
Q: Looping a (Pearson) correlation over a data frame

Stack Overflow user

Asked 2020-03-22 00:54:35

Answers: 1 · Views: 486 · Followers: 0 · Votes: 0

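I have a data frame with 159 observations and 27 variables, and I want to correlate all 159 observations in column 4 (variable 4) with each of the following columns (variables), i.e., correlate column 4 with 5, then column 4 with 6, and so on. I have been trying to write a loop without success; since I am a beginner in R, it has turned out to be harder than I thought. The reason I want to simplify this is that I need to do the same for several more data frames, and a function that does it would make the work easier and less time-consuming. So it would be great if someone could help me.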

df <- ZEB1_23genes # Replace ZEB1_23genes with your own data frame

# Check the class of each variable
for (i in colnames(df)) {
  print(class(df[[i]]))
}

print(df)

# Correlate ZEB1 with each of the 23 genes according to Pearson's method

cor.test(df$ZEB1, df$PITPNC1, method = "pearson")
### OR ###
cor.test(df[, 4], df[, 5])

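So I can run each correlation individually, but I cannot write a loop that goes back to column 4 and correlates it with the next column (5, 6, ..., 27).

Thanks!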

loops

dataframe

pearson-correlation

pearson

1 Answer

Stack Overflow user

Answered 2020-03-22 02:27:51

If I have understood your question correctly, the solution below should work well.

# Sample data
df <- data.frame(matrix(data = sample(runif(100000), 4293), nrow = 159, ncol = 27))

# Correlation function
# Takes a data.frame containing the columns to be correlated as input.
# The column against which the other columns are correlated can be specified (start_col; default is 4).
# The last column to correlate against start_col can also be specified (end_col; the default 0 means the last column of the data.frame).
# Returns a data.frame with one row per pair, holding start_col, end_col, and the correlation value.

my_correlator <- function(mydf, start_col = 4, end_col = 0){
  if(end_col == 0){
    end_col <- ncol(mydf)
  }
  # Guard: if start_col were the last column, (start_col+1):end_col would count downwards
  stopifnot(start_col < end_col)
  out_corr <- list()
  for(i in (start_col+1):end_col){
    # Store results at positions 1, 2, ... so the list has no empty leading slots
    out_corr[[i - start_col]] <- data.frame(start_col = start_col, end_col = i, corr_val = as.numeric(cor.test(mydf[, start_col], mydf[, i])$estimate))
  }
  return(do.call("rbind", out_corr))
}

test_run <- my_correlator(df, 4)

head(test_run)

#   start_col end_col     corr_val
# 1         4       5 -0.027508521
# 2         4       6  0.100414199
# 3         4       7  0.036648608
# 4         4       8 -0.050845418
# 5         4       9 -0.003625019
# 6         4      10 -0.058172227

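The function basically takes a data.frame as input and returns another data.frame containing the correlations between a given column of the original data.frame and all subsequent columns. I don't know the structure of your data, and obviously this function will fail if it encounters something unexpected (for example, a character column among the columns being correlated).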
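As one possible way around the character-column failure mentioned above, the sketch below skips non-numeric columns and also keeps the p-value from `cor.test`. It is not part of the original answer: the names `correlate_against`, `ref_col`, and `demo_df` are made up for illustration, and the demo data is random, so the numbers themselves carry no meaning.

```r
# Hypothetical helper (illustrative only): correlate one reference column
# against every later *numeric* column and keep the p-value as well.
correlate_against <- function(mydf, ref_col = 4, method = "pearson") {
  stopifnot(is.numeric(mydf[[ref_col]]))

  # Columns positioned after ref_col that are numeric; everything else is skipped
  is_later   <- seq_len(ncol(mydf)) > ref_col
  is_num     <- vapply(mydf, is.numeric, logical(1))
  target_idx <- unname(which(is_later & is_num))

  rows <- lapply(target_idx, function(j) {
    ct <- cor.test(mydf[[ref_col]], mydf[[j]], method = method)
    data.frame(
      ref_var   = names(mydf)[ref_col],
      other_var = names(mydf)[j],
      corr_val  = unname(ct$estimate),
      p_value   = ct$p.value
    )
  })

  # One row per correlated pair (NULL if there were no numeric columns to use)
  do.call(rbind, rows)
}

# Demo data with the same shape as in the question (159 x 27), made up here
set.seed(1)
demo_df <- data.frame(matrix(runif(159 * 27), nrow = 159, ncol = 27))
demo_df$X10 <- as.character(demo_df$X10)  # simulate an unexpected character column

res <- correlate_against(demo_df, ref_col = 4)
head(res)

# The same helper can be applied to several data frames at once via a named list
all_res <- lapply(list(full = demo_df, first100 = demo_df[1:100, ]),
                  correlate_against, ref_col = 4)
```

Because the helper takes the data frame as its first argument, `lapply` over a named list covers the "same thing for more data frames" case from the question; each element of `all_res` is a result table for one data frame.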

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/60790599
