Every tutorial you read shows a different way to import Statsmodels. One guide starts with import statsmodels.api as sm. Another uses from statsmodels.formula.api import ols. A third imports directly from submodules like from statsmodels.regression.linear_model import OLS. Which approach should you use?
The confusion stems from a deliberate design choice. Statsmodels offers multiple import paths because different users need different things. Researchers writing academic papers want one workflow. Data scientists doing quick exploratory analysis want another. Understanding these three approaches will save you from blindly copying code that doesn’t match your actual needs.
Understanding the Three Import Approaches
Approach 1: statsmodels.api
import statsmodels.api as sm
model = sm.OLS(endog, exog)
results = model.fit()
The statsmodels.api module serves as your main gateway to the library. When you import it as sm, you get access to the most commonly used models and functions through a clean namespace. Ordinary Least Squares becomes sm.OLS. Logistic regression becomes sm.Logit. The add_constant function becomes sm.add_constant. In Statsmodels terminology, endog is the response (dependent) variable and exog is the matrix of predictors.
Approach 2: statsmodels.formula.api
import statsmodels.formula.api as smf
model = smf.ols('salary ~ experience + education', data=df)
results = model.fit()
The statsmodels.formula.api module gives you R-style formula syntax. Instead of manually separating your endog and exog variables, you write a formula string that describes the relationship. The lowercase function names (ols instead of OLS) signal that you’re using the formula interface.
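To see what the formula string actually builds, you can inspect the model before fitting it. The sketch below uses synthetic data; the column names mirror the formula above, but the values and the coefficients used to generate them are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: names follow the formula above, values are invented.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "experience": rng.uniform(0, 20, size=50),
    "education": rng.integers(12, 21, size=50),
})
df["salary"] = (
    30_000
    + 2_000 * df["experience"]
    + 1_500 * df["education"]
    + rng.normal(0, 5_000, size=50)
)

model = smf.ols("salary ~ experience + education", data=df)

# The formula parser split the DataFrame into a response and a design matrix.
print(model.endog_names)  # response name taken from the left side of ~
print(model.exog_names)   # design-matrix columns, including the automatic intercept
print(model.exog.shape)   # (rows, columns) of the design matrix
```

The design matrix has one more column than the formula has predictors, because the formula interface adds an intercept column on its own.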
Approach 3: Direct submodule imports
from statsmodels.regression.linear_model import OLS
from statsmodels.tools.tools import add_constant
model = OLS(endog, exog)
results = model.fit()
Direct imports pull specific classes or functions from their exact location in the library structure. You import only what you need, nothing more.
When to Use statsmodels.api for Production Code
The statsmodels.api approach works best when you’re writing reusable functions or production code that needs explicit control over data preparation.
Production data pipeline example:
import statsmodels.api as sm
import pandas as pd
def fit_salary_model(data, predictor_columns, target_column):
    """
    Fits OLS regression for salary prediction.
    Handles data preparation and model fitting.
    """
    endog = data[target_column]
    exog = data[predictor_columns]
    exog = sm.add_constant(exog)
    model = sm.OLS(endog, exog)
    return model.fit()
Your function receives clean DataFrames, explicitly constructs endog and exog, adds the constant manually, and fits the model. Every step is visible and controllable. When something breaks in production, you can trace exactly where the problem occurred.
Key advantages for production work:
- Explicit data handling makes debugging straightforward
- Works seamlessly with existing data preprocessing pipelines
- Gives you precise control over what goes into the model
- Easier to write unit tests for each step
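- Compatible with scikit-learn patterns if you’re mixing libraries (watch the argument order: Statsmodels takes the response first, as in OLS(endog, exog), while scikit-learn’s fit takes the feature matrix first)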
When to Use statsmodels.formula.api for Exploratory Analysis
The formula interface shines during exploratory data analysis when you want to test relationships quickly without writing data preparation code.
Exploratory analysis example:
import statsmodels.formula.api as smf
import pandas as pd
# Test different model specifications rapidly
model1 = smf.ols('price ~ square_feet', data=housing_df).fit()
model2 = smf.ols('price ~ square_feet + bedrooms', data=housing_df).fit()
model3 = smf.ols('price ~ square_feet + bedrooms + square_feet:bedrooms', data=housing_df).fit()
print(model1.summary())
print(model2.summary())
print(model3.summary())
You iterate through different specifications by changing one string. No need to recreate DataFrames or add constants. The formula syntax handles transformations and interactions directly in the formula string.
Key advantages for exploration:
- Rapid model iteration without data manipulation
- Automatic handling of categorical variables
- Built-in support for interactions and transformations
- Formula strings serve as documentation of model specification
- Closer to how you’d write models in academic papers
Important gotcha: Formula syntax automatically adds a constant unless you explicitly exclude it with - 1 or + 0 in the formula. This is the opposite of the standard api, where you must add the constant yourself.
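A short sketch of the intercept behavior and the automatic categorical handling, using a synthetic housing_df. The neighborhood column and all the values are assumptions made up for this example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic housing data; "neighborhood" is a hypothetical string column.
rng = np.random.default_rng(1)
housing_df = pd.DataFrame({
    "square_feet": rng.uniform(600, 3000, size=60),
    "neighborhood": np.tile(["north", "south", "west"], 20),
})
housing_df["price"] = (
    50_000 + 150 * housing_df["square_feet"] + rng.normal(0, 10_000, size=60)
)

# Default: the formula interface adds an intercept for you.
with_intercept = smf.ols("price ~ square_feet", data=housing_df).fit()

# "- 1" removes it again.
no_intercept = smf.ols("price ~ square_feet - 1", data=housing_df).fit()

# A transformation and a string column in one formula: the string column
# is dummy-encoded automatically, with one level used as the baseline.
categorical = smf.ols(
    "price ~ np.log(square_feet) + neighborhood", data=housing_df
).fit()

print(list(with_intercept.params.index))
print(list(no_intercept.params.index))
print(list(categorical.params.index))
```

Printing the parameter names is a cheap way to confirm which terms the formula actually produced before you read anything into the coefficients.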
When to Use Direct Imports for Specialized Models
Direct imports make sense when you’re working with less common models or need access to specific functionality that isn’t exposed through the main api.
Time series analysis example:
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
# Testing for stationarity
result = adfuller(time_series_data)
# Fitting ARIMA model
model = ARIMA(time_series_data, order=(1,1,1))
fitted_model = model.fit()
Some specialized functions live deep in the module hierarchy. Importing them directly avoids typing long chains like sm.tsa.stattools.adfuller every time you use them.
Key advantages for specialized work:
- Access to functions not in the main api
- Clearer code when using many functions from one submodule
- Potentially faster imports, since you skip the many submodules that statsmodels.api loads up front
- Better IDE autocomplete for specific model classes
Practical Decision Framework for Import Selection
Choose statsmodels.api when:
- Building production pipelines or reusable functions
- Working with preprocessed NumPy arrays or cleaned DataFrames
- Mixing Statsmodels with scikit-learn in the same workflow
- Writing code that others will maintain and debug
- Needing explicit control over every transformation
Choose statsmodels.formula.api when:
- Doing exploratory data analysis with raw DataFrames
- Testing multiple model specifications quickly
- Working with categorical variables that need automatic encoding
- Writing model specifications that mirror academic or research papers
- Speed of iteration matters more than explicit control
Choose direct imports when:
- Working with specialized models not in the main api
- Using multiple functions from a single submodule
- Writing focused scripts with narrow scope
- Optimizing import time for large codebases
- Following existing codebase conventions
Common Import Patterns and Best Practices
Pattern 1: Mixed approach for flexibility
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Use formula for quick tests
quick_model = smf.ols('sales ~ marketing_spend', data=df).fit()
# Use standard api for production version
endog = df['sales']
exog = sm.add_constant(df[['marketing_spend']])
production_model = sm.OLS(endog, exog).fit()
Nothing stops you from using both approaches in the same project. Use formulas for exploration, then rewrite using the standard api once you know what works.
Pattern 2: Namespace clarity
import statsmodels.api as sm
from statsmodels.graphics import tsaplots
# Clear which module each function comes from
model = sm.OLS(endog, exog).fit()
tsaplots.plot_acf(residuals)
Using sm for the main api and explicit names for specialized modules keeps your code readable.
Pattern 3: Selective exposure in modules
# In your analysis_utils.py module
from statsmodels.api import OLS, add_constant, Logit
def fit_linear_model(data, target, features):
    endog = data[target]
    exog = add_constant(data[features])
    return OLS(endog, exog).fit()
Your utility module imports specific classes, making them available without the sm prefix. Users of your module don’t need to know where these came from.
Handling Import Compatibility Across Versions
Statsmodels maintains backward compatibility carefully, but the import paths you choose affect how future-proof your code is.
The main statsmodels.api interface is the most stable. Functions and classes exposed here rarely change or move. Direct imports from submodules are more likely to shift between versions as the library refactors internal organization.
Safer for long-term maintenance:
import statsmodels.api as sm
model = sm.OLS(endog, exog)
More fragile across versions:
from statsmodels.regression.linear_model import OLS
The second approach works perfectly fine, but you might need to update import statements when upgrading Statsmodels versions. The first approach is more likely to work without changes.
Avoiding Common Import Mistakes
Mistake 1: Mixing lowercase and uppercase
# Wrong - mixing formula and standard api
import statsmodels.api as sm
model = sm.ols('y ~ x', data=df)  # AttributeError: statsmodels.api has no lowercase ols
Lowercase function names belong to formula.api. Standard api uses uppercase class names.
Mistake 2: Forgetting the constant with standard api
# Missing constant term
import statsmodels.api as sm
model = sm.OLS(endog, exog).fit()
Formula api adds constants automatically. Standard api requires explicit sm.add_constant() calls.
Mistake 3: Importing everything
# Avoid this - clutters namespace
from statsmodels.api import *
Explicit imports make code more maintainable. Future readers (including you) can see exactly what’s being used.
Practical Import Strategy for Real Projects
Here’s a realistic workflow that balances convenience with maintainability:
Stage 1: Exploratory notebooks
import statsmodels.formula.api as smf
# Use formula syntax for rapid experimentation
Stage 2: Prototype scripts
import statsmodels.api as sm
# Convert promising models to standard api
Stage 3: Production code
from statsmodels.api import OLS, add_constant
# Import specific classes for clean, focused modules
Your import strategy evolves as code matures from exploration to production. Starting with formulas speeds up discovery. Ending with explicit imports creates maintainable systems.
Making Your Choice
No single import approach is universally better. The right choice depends on your context: Are you exploring data or building production systems? Are you working alone or with a team? Are you writing throwaway scripts or long-lived applications?
Understanding all three approaches gives you flexibility. You can match your import style to your actual needs rather than copying whatever pattern you saw first. The goal isn’t to pick one approach and use it forever. The goal is to understand when each approach makes your work easier and your code clearer.

