Q: R - PCA using 7 features

Stack Overflow user

Asked 2015-09-22 18:03:30

Answers: 1 | Views: 868 | Followers: 0 | Votes: 1

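I have a simple prediction problem with 12 candidate features. After finding, with preProcess from the caret package, that most of the variance is captured by 7 of them, I want to build a linear model (lm) using only those 7.

I ran preProcess: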

pp <- preProcess(tr_1,thresh = 0.8,method = "pca")

The result was: PCA needed 7 components to capture 80 percent of the variance

My question is how to run the model/prediction using only these 7 features.

Thanks

preprocessor

r-caret

1 Answer

Stack Overflow user

Answered 2015-09-23 09:12:39

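Below is a complete example of how to select a specific number of PCA components. You need to set pcaComp = 7 or use thresh = 0.8 in preProcess, then apply the processing to both the training and the test data, as in the Boston example below. See ?preProcess for more details. If you want to use PCA together with train to tune a model, read my answer to a similar question in this post. Keep in mind that if you have categorical variables (factors), you must first convert them to dummy variables before you apply any processing (centering, scaling, PCA, etc.). For more details on creating dummy variables, read this page on the caret website.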
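As a side note on the dummy-variable step mentioned above, here is a minimal sketch on made-up data. The `toy` data frame and its column names are hypothetical and not from the question; the sketch assumes caret's dummyVars with fullRank = TRUE is an acceptable encoding for your model.

```r
library(caret)

# Hypothetical toy data: two numeric predictors, one factor, one response
set.seed(1)
toy <- data.frame(
  x1  = rnorm(20),
  x2  = rnorm(20),
  grp = factor(rep(c("a", "b", "c"), length.out = 20)),
  y   = rnorm(20)
)

# Build the encoder from the predictors only (the response y is left out)
dv <- dummyVars(~ x1 + x2 + grp, data = toy, fullRank = TRUE)
toy_x <- as.data.frame(predict(dv, newdata = toy))

# toy_x is now all numeric, so center/scale/PCA can be applied to it
pp_toy <- preProcess(toy_x, method = c("center", "scale", "pca"), pcaComp = 2)
toy_pca <- predict(pp_toy, toy_x)
toy_pca$y <- toy$y
```

With fullRank = TRUE one level of the factor is dropped, so the three-level `grp` becomes two indicator columns rather than three.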

library(caret)
library(MASS)#for the Boston dataset
data(Boston)

#number of samples and predictors (including the outcome)
dim(Boston)
#predictors names (medv is the response)
names(Boston)

#you can find more about the Boston Dataset
?Boston

#Let's split the data into train and test sets
set.seed(10457)
train_idx <- createDataPartition(Boston$medv, p = 0.75, list = FALSE)

train <- Boston[train_idx,]
test <- Boston[-train_idx,]

#Now using preProcess, you need to set the pcaComp = 7, or thresh = 0.8
#you may need to center and scale first and then apply PCA
#or just use method = c("pca")

#create the preProc object, remember to exclude the response (medv)
preProc  <- preProcess(train[,-14], 
                       method = c("center", "scale", "pca"),
                       pcaComp = 7) # or thresh = 0.8
#Apply the processing to the train and test data, and add the response 
#to the dataframes
train_pca <- predict(preProc, train[,-14])
train_pca$medv <- train$medv
test_pca <- predict(preProc, test[,-14])
test_pca$medv <- test$medv

#you can verify the 7 components (console output shown as comments)
head(train_pca)
#         PC1          PC2         PC3        PC4        PC5        PC6       PC7 medv
# 1 -2.063576  0.784975586  0.42188132 -0.4674029 -0.9208095 -0.1561148 0.2940533 24.0
# 2 -1.411319  0.605782852 -0.62260611  0.2258748 -0.4840448  0.3235172 0.5061220 21.6
# 3 -2.052144  0.514495591  0.18221545  0.9539644 -0.8148428  0.4832016 0.3699110 34.7
# 4 -2.596799 -0.068710981 -0.10115928  1.1308079 -0.4056899  0.6759937 0.4954385 33.4
# 5 -2.435048  0.032030728 -0.06201039  1.1046487 -0.5043492  0.6176695 0.5808873 36.2
# 6 -2.187428 -0.007289459 -0.63593163  0.6597568 -0.1828520  0.6043359 0.5659098 28.7

#Now fit your lm model, something like
fit <- lm(medv ~ ., data = train_pca)

fit$coefficients
# (Intercept)         PC1         PC2         PC3         PC4         PC5         PC6         PC7
#  22.3524934  -2.2357451   1.5531484   3.2346456   2.3612132  -1.7321590  -0.4438279  -0.2850688
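
To get actual predictions, which is what the question asks for, the fitted model can be applied to the transformed test set. The following is a sketch that rebuilds the pipeline above so it runs on its own. Looking up medv by name (instead of column 14) and summarizing the error with postResample are my additions, not part of the original answer.

```r
library(caret)
library(MASS)  # for the Boston dataset
data(Boston)

set.seed(10457)
train_idx <- createDataPartition(Boston$medv, p = 0.75, list = FALSE)
train <- Boston[train_idx, ]
test  <- Boston[-train_idx, ]

# Locate the response by name rather than by hard-coded position
resp <- which(names(Boston) == "medv")

# Learn centering, scaling and the PCA rotation on the training predictors only
preProc <- preProcess(train[, -resp],
                      method = c("center", "scale", "pca"),
                      pcaComp = 7)

train_pca <- predict(preProc, train[, -resp])
train_pca$medv <- train$medv
test_pca <- predict(preProc, test[, -resp])
test_pca$medv <- test$medv

fit <- lm(medv ~ ., data = train_pca)

# Predict on the held-out data and summarize the error
pred <- predict(fit, newdata = test_pca)
perf <- postResample(pred = pred, obs = test_pca$medv)
print(perf)

# One row per original predictor, one column per retained component
dim(preProc$rotation)
```

Note that each principal component is a linear combination of all the original predictors, as the rotation matrix shows. "Using 7 features" here therefore means using 7 components, not 7 of the original columns.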

By the way, next time you ask a question, try to post a reproducible example (code + data) so that people can understand the problem and help you.

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/32714063
