Installing Statsmodels takes just a few commands, but the process varies slightly depending on your operating system and Python setup. The library supports Python 3.9 through 3.14, so you’ll need one of these versions installed before starting.

I recommend using pip for most installations. Conda works well if you’re managing complex scientific computing environments. Both methods handle dependencies automatically, installing NumPy, SciPy, Pandas, and Patsy alongside Statsmodels.

What you need before installing

Your system needs Python 3.9 or newer. Check your version by opening a terminal and running:

python --version

You should see something like Python 3.12.3. If your version is older than 3.9, upgrade Python first.
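The same check can be made from inside Python, which helps when several interpreters are installed and you are not sure which one a script runs under. This is a minimal sketch, assuming the 3.9 minimum stated above; `MIN_PYTHON` and `python_is_supported` are illustrative names, not part of Statsmodels.

```python
import sys

# Minimum version taken from the requirement stated above (assumption: 3.9).
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old, upgrade first"
    print(f"Python {current}: {status}")
```

Comparing only the major and minor numbers keeps the check independent of patch releases.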

You also need pip (Python’s package installer) or conda (if you’re using Anaconda). Most Python installations include pip by default. Verify it’s installed:

pip --version

Installing Statsmodels with pip

The simplest installation method uses pip. Open your terminal and run:

pip install statsmodels

This installs the latest stable version (0.14.5 at the time of writing) along with all required dependencies. The process usually takes a minute or two, depending on your internet connection and system speed.

For systems with both Python 2 and Python 3 installed:

Use pip3 instead so the package is installed for Python 3:

pip3 install statsmodels

Installing a specific version:

If you need a particular version for compatibility reasons:

pip install statsmodels==0.14.4

Installing with Conda

Conda users should install from the conda-forge channel, which maintains the most up-to-date builds:

conda install -c conda-forge statsmodels

Conda automatically resolves dependencies and ensures compatibility with your existing packages. I find this approach more reliable when you’re working with multiple scientific computing libraries that might have conflicting requirements.

Platform-specific considerations

Each operating system has quirks that affect installation, particularly when building from source or dealing with compiled components.

Windows Installation

Windows users rarely encounter issues with pip installation since pre-built wheels are available for all recent Python versions. The installation command is the same:

pip install statsmodels

If you’re using Anaconda on Windows:

Anaconda’s integrated environment handles everything:

conda install -c conda-forge statsmodels

Troubleshooting Windows builds:

If you need to build from source (which you usually don’t), you’ll need Microsoft Visual C++ 14.0 or greater. Get it from “Microsoft C++ Build Tools.” Most users never need this because the pre-built wheels ship already compiled.

macOS Installation

macOS users can install directly with pip:

pip install statsmodels

For Apple Silicon (M1/M2/M3) Macs:

Pre-built wheels are available for ARM64 architecture. The standard pip command works:

pip install statsmodels

The library runs natively on Apple Silicon without Rosetta translation.

If you need to build from source:

Install Xcode Command Line Tools first:

xcode-select --install

This provides the C compiler needed for building compiled components. Then install normally with pip.

Linux Installation

Most Linux distributions come with Python and pip pre-installed. Install Statsmodels with:

pip install statsmodels

For Debian/Ubuntu systems:

If pip isn’t installed:

sudo apt update
sudo apt install python3-pip
pip3 install statsmodels

For Red Hat/Fedora/CentOS:

sudo dnf install python3-pip
pip3 install statsmodels

Building from source on Linux:

You’ll need gcc, which is typically already installed. Verify with:

gcc --version

If it’s missing, install the build essentials:

# Debian/Ubuntu
sudo apt install build-essential

# Red Hat/Fedora
sudo dnf groupinstall "Development Tools"

Installing in a virtual environment

I always recommend using virtual environments to avoid dependency conflicts between projects. Here’s how to set one up and install Statsmodels inside it.

Creating and activating a virtual environment:

# Create the environment
python -m venv statsmodels_env

# Activate on Windows
statsmodels_env\Scripts\activate

# Activate on macOS/Linux
source statsmodels_env/bin/activate

Your terminal prompt should change to show (statsmodels_env) indicating you’re in the virtual environment.
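If the prompt is customized or you are working inside an IDE, the prompt may not tell you much. A small sketch of a check from Python itself follows; `in_virtual_environment` is a hypothetical helper name. It relies on the standard behavior of environments created with `python -m venv` and does not detect conda environments.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

The interpreter path printed first should sit inside the statsmodels_env directory when the environment is active.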

Install Statsmodels in the environment:

pip install statsmodels

Deactivate when you’re done:

deactivate

Verifying your installation

After installation, confirm Statsmodels works correctly. Open a Python interpreter:

python

Then run:

import statsmodels.api as sm
print(sm.__version__)

You should see the version number (like 0.14.5) printed. If you get an error, the installation didn’t complete successfully.

Test with a simple regression:

import statsmodels.api as sm
import numpy as np

# Generate sample data
X = np.random.rand(100, 2)
y = np.random.rand(100)

# Add constant and fit model
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()

print(model.summary())

If this runs without errors and produces a regression summary table, your installation is working correctly.

Understanding dependencies

Statsmodels installs several other packages automatically. Here’s what gets included and why:

NumPy (≥ 1.18.0): Provides array operations and numerical computing foundations

SciPy (≥ 1.4.0): Supplies scientific computing functions, optimization algorithms, and special functions

Pandas (≥ 1.0.0): Enables DataFrame support and data manipulation

Patsy (≥ 0.5.0): Handles formula parsing for R-style model specifications

These versions represent minimum requirements, and the exact floors can differ between Statsmodels releases. Your installation will use whatever compatible versions are already installed, or fetch the latest compatible ones if you’re starting fresh.
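To see what is actually installed, and what your copy of Statsmodels declares as its requirements, you can query the package metadata with the standard library. This is a sketch rather than an official tool: `installed_versions` and `declared_requirements` are hypothetical helper names, and the package list simply mirrors the dependencies named above.

```python
from importlib import metadata

# Dependencies named in this section (distribution names as published on PyPI).
PACKAGES = ["statsmodels", "numpy", "scipy", "pandas", "patsy"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def declared_requirements(package="statsmodels"):
    """Return the requirement strings a package declares, or [] if absent."""
    try:
        return metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")

for requirement in declared_requirements():
    print("requires:", requirement)
```

The requirement strings come from the installed distribution itself, so they reflect the release you have rather than any list written down elsewhere.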

Installing optional dependencies

Some Statsmodels features require additional packages that aren’t installed by default.

For plotting and visualization:

pip install matplotlib

Matplotlib is needed for diagnostic plots, regression visualizations, and many examples in the documentation.

For regularized models:

Required if you’re using regularized regression methods like LASSO, Ridge, or Elastic Net.

For Jupyter notebook support:

pip install jupyter ipython

Useful for interactive statistical analysis and following Statsmodels tutorials.

For enhanced optimization:

Improves numerical derivatives in some advanced models.

Installing the development version

The GitHub repository usually contains bug fixes and features before they appear in stable releases. If you need cutting-edge functionality:

pip install git+https://github.com/statsmodels/statsmodels

This requires git to be installed on your system and, because the package is built from source, a working C compiler. The installation also takes longer than installing a pre-built wheel.

For development work:

Clone the repository and install in editable mode:

git clone https://github.com/statsmodels/statsmodels.git
cd statsmodels
pip install -e .

This links your local repository to your Python environment. Changes you make to the source code immediately affect imports without reinstalling.

Troubleshooting common installation issues

“ModuleNotFoundError: No module named ‘statsmodels’”

This usually means you installed Statsmodels in one Python environment but are running code in a different one. Check which Python your IDE is using:

import sys
print(sys.executable)

Install Statsmodels using the pip associated with that Python:

/path/to/python -m pip install statsmodels
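The two steps can be combined into one short diagnostic that you run from the environment that raises the error. A minimal sketch using only the standard library; `locate_package` is a hypothetical helper name.

```python
import importlib.util
import sys


def locate_package(name="statsmodels"):
    """Return the file the current interpreter would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


location = locate_package()
print("Interpreter:", sys.executable)
if location is None:
    print("statsmodels is not importable from this interpreter.")
    print(f"Try: {sys.executable} -m pip install statsmodels")
else:
    print("statsmodels found at:", location)
```

Because the suggested command uses the exact interpreter path that ran the script, the package ends up in the environment that was missing it.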

“Microsoft Visual C++ 14.0 is required” (Windows)

You’re trying to build from source but lack the compiler. Install pre-built wheels instead:

pip install --only-binary :all: statsmodels

Or get the Visual C++ Build Tools if you specifically need to build from source.

“gcc: command not found” (Linux)

Your system needs a C compiler. Install build tools:

# Debian/Ubuntu
sudo apt install build-essential

# Red Hat/Fedora
sudo dnf groupinstall "Development Tools"

ImportError with NumPy or SciPy

Version conflicts between dependencies can cause import errors. Update everything:

pip install --upgrade statsmodels numpy scipy pandas

Installation hangs or times out

Try increasing the timeout or using a different mirror:

pip install --timeout 300 statsmodels

Upgrading Statsmodels

Keep your installation current to get bug fixes and new features:

pip install --upgrade statsmodels

With conda:

conda update -c conda-forge statsmodels

Check for updates regularly, especially if you encounter bugs that might already be fixed in newer versions.

Uninstalling Statsmodels

Remove the library when you no longer need it:

pip uninstall statsmodels

With conda:

conda remove statsmodels

Uninstalling removes only Statsmodels itself; dependencies such as NumPy and Pandas stay in your environment.

What comes next

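Now that Statsmodels is installed, you’re ready to start building statistical models. The library includes dozens of built-in datasets for practice, and it can also download datasets from the Rdatasets collection. The example below fetches mtcars that way, so it needs an internet connection:

import statsmodels.api as sm

# Download the mtcars dataset from Rdatasets (requires internet access)
data = sm.datasets.get_rdataset("mtcars", "datasets").data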

# Fit a simple linear model
X = sm.add_constant(data['wt'])
y = data['mpg']
model = sm.OLS(y, X).fit()

print(model.summary())

This demonstrates the basic workflow: load data, specify a model, fit it, and examine results. From here, you can explore more complex models, time series analysis, and advanced statistical techniques that Statsmodels offers.
