文章/答案/技术大牛

发布

社区首页 >问答首页 >如何在R中使用lapply多次运行包含来自不同数据帧的变量的模型

问如何在R中使用lapply多次运行包含来自不同数据帧的变量的模型
EN

Stack Overflow用户

提问于 2021-04-22 07:25:41

回答 1查看 31关注 0票数 0

我有两个数据帧

#dummy df for examples:
set.seed(1)
df1 <- data.frame(t = (1:16), 
                 A = sample(20, 16),
                 B = sample(30, 16),
                 C = sample(30, 16))

df2 <- data.frame(t = (1:16),
                  A = sample(20, 16),
                  B = sample(30, 16),
                  C = sample(30, 16))

我想对两个数据帧中的每一列(除了t列)执行此操作：

model <- lm(df2$A ~ df1$A, data = NULL)

我尝试过这样的东西：

model <- function(yvar, xvar){
  lm(df1$as.name(yvar) ~ df2$as.name(xvar), data = NULL)
}
lapply(names(data), model)

但它显然不起作用。我做错了什么？

最后，我真正想要的是从模型中获得系数和其他东西。但是阻止我的是如何使用来自不同数据帧的变量多次运行线性模型。

我想要的输出我猜应该是这样的：

# [[1]]
# Call:
#   lm(df1$as.name(yvar) ~ df2$as.name(xvar), data = NULL)
# 
# Residuals:
#   Min      1Q  Median      3Q     Max 
# -0.8809 -0.2318  0.1657  0.3787  0.5533 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# (Intercept)    -0.013981   0.169805  -0.082    0.936    
# predmodex[, 2]  1.000143   0.002357 424.351   <2e-16 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.4584 on 14 degrees of freedom
# Multiple R-squared:  0.9999,  Adjusted R-squared:  0.9999 
# F-statistic: 1.801e+05 on 1 and 14 DF,  p-value: < 2.2e-16
# 
# [[2]]
# Call:
#   lm(df1$as.name(yvar) ~ df2$as.name(xvar), data = NULL)
# 
# Residuals:
#   Min      1Q  Median      3Q     Max 
# -0.8809 -0.2318  0.1657  0.3787  0.5533 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# (Intercept)    -0.013981   0.169805  -0.082    0.936    
# predmodex[, 2]  1.000143   0.002357 424.351   <2e-16 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.4584 on 14 degrees of freedom
# Multiple R-squared:  0.9999,  Adjusted R-squared:  0.9999 
# F-statistic: 1.801e+05 on 1 and 14 DF,  p-value: < 2.2e-16
# 
# [[3]]
# Call:
#   lm(df1$as.name(yvar) ~ df2$as.name(xvar), data = NULL)
# 
# Residuals:
#   Min      1Q  Median      3Q     Max 
# -0.8809 -0.2318  0.1657  0.3787  0.5533 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# (Intercept)    -0.013981   0.169805  -0.082    0.936    
# predmodex[, 2]  1.000143   0.002357 424.351   <2e-16 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.4584 on 14 degrees of freedom
# Multiple R-squared:  0.9999,  Adjusted R-squared:  0.9999 
# F-statistic: 1.801e+05 on 1 and 14 DF,  p-value: < 2.2e-16

loops

lapply

回答 1

Stack Overflow用户

回答已采纳

发布于 2021-04-22 09:33:12

由于df1和df2具有相同的名称，因此您可以这样做：

model <- function(var){
  lm(df1[[var]] ~ df2[[var]])
}
result <- lapply(names(df1)[-1], model)
result

#[[1]]

#Call:
#lm(formula = df1[[var]] ~ df2[[var]])

#Coefficients:
#(Intercept)   df2[[var]]  
#    15.1504      -0.4763  


#[[2]]

#Call:
#lm(formula = df1[[var]] ~ df2[[var]])

#Coefficients:
#(Intercept)   df2[[var]]  
#     3.0227       0.6374  


#[[3]]

#Call:
#lm(formula = df1[[var]] ~ df2[[var]])

#Coefficients:
#(Intercept)   df2[[var]]  
#    15.4240       0.2411

要从模型中获取汇总统计信息，您可以使用broom::tidy：

purrr::map_df(result, broom::tidy, .id = 'model_num')

#  model_num term        estimate std.error statistic  p.value
#  <chr>     <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#1 1         (Intercept)   15.2       3.03      5.00  0.000194
#2 1         df2[[var]]    -0.476     0.248    -1.92  0.0754  
#3 2         (Intercept)    3.02      4.09      0.739 0.472   
#4 2         df2[[var]]     0.637     0.227     2.81  0.0139  
#5 3         (Intercept)   15.4       4.40      3.50  0.00351 
#6 3         df2[[var]]     0.241     0.272     0.888 0.390

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/67204706

复制

相似问题

问如何在R中使用lapply多次运行包含来自不同数据帧的变量的模型
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何在R中使用lapply多次运行包含来自不同数据帧的变量的模型EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何在R中使用lapply多次运行包含来自不同数据帧的变量的模型
EN