Q: [Statsmodels]: How do I get statsmodels to return the p-values of an OLS object?

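Stack Overflow user

Asked 2017-08-18 00:50:54

2 answers · 5.5K views · 0 followers · 2 votes

I'm very new to programming, and I'm using Python to get familiar with data analysis and machine learning.

I'm following a tutorial on backward elimination for multiple linear regression. Here is the code so far: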

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# Taking care of missing data
# np.set_printoptions(threshold=100)
# (Imputer was removed in scikit-learn 0.22; SimpleImputer is its replacement)
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean')
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])

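# Encoding categorical data
# (OneHotEncoder's categorical_features argument was removed in scikit-learn 0.22;
# a ColumnTransformer selects column 3 instead, so LabelEncoder is no longer needed)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
onehotencoder = ColumnTransformer([('state', OneHotEncoder(), [3])], remainder = 'passthrough', sparse_threshold = 0)
X = np.array(onehotencoder.fit_transform(X), dtype = float)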

#Avoid the Dummy Variables Trap
X = X[:, 1:]

#Splitting data in train and test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

#Fitting multiple Linear Regression to Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict Test set (store predictions separately so the fitted regressor is not overwritten)
y_pred = regressor.predict(X_test)

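# Building the optimal model using Backward Elimination
# sm.OLS is exposed by statsmodels.api (recent releases no longer export it from statsmodels.formula.api)
import statsmodels.api as sm
a, b = X.shape
# Prepend a column of ones so the model has an intercept term
X = np.append(arr = np.ones((a, 1)).astype(int), values = X, axis = 1)
print(X.shape)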

X_optimal = X[:,[0,1,2,3,4,5]]
regressor_OLS = sm.OLS(endog = y, exog = X_optimal).fit()
regressor_OLS.summary()
X_optimal = X[:,[0,1,3,4,5]]
regressor_OLS = sm.OLS(endog = y, exog = X_optimal).fit()
regressor_OLS.summary()
X_optimal = X[:,[0,3,4,5]]
regressor_OLS = sm.OLS(endog = y, exog = X_optimal).fit()
regressor_OLS.summary()
X_optimal = X[:,[0,3,5]]
regressor_OLS = sm.OLS(endog = y, exog = X_optimal).fit()
regressor_OLS.summary()
X_optimal = X[:,[0,3]]
regressor_OLS = sm.OLS(endog = y, exog = X_optimal).fit()
regressor_OLS.summary()

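The way I'm doing the elimination feels very manual, and I'd like to automate it. To do that, I'd like to know whether there is a way to get the p-values of the regressors returned (for example, does statsmodels have a method for this?). That way, I think I could loop over the features in the X_optimal array, check whether each p-value is greater than my significance level (SL), and eliminate the feature if it is.

Thanks!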

python

machine-learning

data-analysis

statsmodels

2 Answers

Stack Overflow user

Answered 2018-02-02 07:10:14

I ran into the same problem.

You can access the p-values via

regressor_OLS.pvalues

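They are stored as an array of float64 values and are displayed in scientific notation. I'm still new to Python and I'm sure there are cleaner, more elegant solutions, but here is mine: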

sigLevel = 0.05

X_opt = X[:,[0,1,2,3,4,5]]
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
regressor_OLS.summary()
pVals = regressor_OLS.pvalues

# Loop while the largest p-value exceeds the significance level
# (np.max gives the value; np.argmax gives only its index)
while np.max(pVals) > sigLevel:
    droppedDimIndex = np.argmax(pVals)
    keptDims = list(range(X_opt.shape[1]))
    keptDims.pop(droppedDimIndex)
    print("pval of dim removed: " + str(pVals[droppedDimIndex]))
    X_opt = X_opt[:,keptDims]
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
    pVals = regressor_OLS.pvalues
    print(str(len(pVals)-1) + " dimensions remaining...")
    print(pVals)

regressor_OLS.summary()

4 votes

Stack Overflow user

Answered 2018-07-17 10:26:03

Thanks to Keith for the answer. Here are a few small fixes to Keith's loop that make it more efficient:

sigLevel = 0.05
X_opt = X[:,[0,1,2,3,4,5]]
regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
pVals = regressor_OLS.pvalues

while pVals[np.argmax(pVals)] > sigLevel:
    worstIndex = np.argmax(pVals)
    # Report the p-value itself, not its index
    print("pval of dim removed: " + str(pVals[worstIndex]))
    X_opt = np.delete(X_opt, worstIndex, axis = 1)
    print(str(X_opt.shape[1]) + " dimensions remaining...")
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
    pVals = regressor_OLS.pvalues

regressor_OLS.summary()

3 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/45740920
