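I need to take a list of data frames, apply cor() to the same two columns in each of them, and return a list of correlation values. So far, my function looks like this: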
corr <- function(directory, threshold = 0){
  #reads directory of files
  file_list <- list.files(path = "C:/Users/jonah/Documents/R work/R Coursera Course/specdata")
  # takes file_list and makes each file into dataframe
  dflist <- lapply(file_list, read.csv)
  # returns list of files, na rows stripped
  nolist <- lapply(dflist, na.omit)
  # removes all with nrows < threshold
  abovelist <- nolist[sapply(nolist, function(x) nrow(x) > threshold)]
  # runs correlation of nitrate, sulfate on remaining
  correlations <- lapply(abovelist, cor(abovelist$sulfate, abovelist$nitrate))
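}
Each data frame has four columns: the date, the sulfate amount, the nitrate amount, and an ID column. I only care about sulfate and nitrate (and the correlation between them). How do I set up the lapply call so that it works on those columns?
Thanks in advance.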
Posted on 2020-09-22 23:55:01
You can use an anonymous function in lapply to refer to each data frame, just as you did with sapply.
Try this:
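corr <- function(directory, threshold = 0){
  # full.names = TRUE returns paths that read.csv can open from any working directory
  file_list <- list.files(path = directory, full.names = TRUE)
  # read each file and drop rows containing NA
  dflist <- lapply(file_list, function(x) na.omit(read.csv(x)))
  # keep data frames with more than `threshold` complete rows
  abovelist <- dflist[sapply(dflist, nrow) > threshold]
  # correlate sulfate with nitrate within each remaining data frame
  correlations <- lapply(abovelist, function(x) cor(x$sulfate, x$nitrate))
  return(correlations)
}
And call it like this: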
corr("C:/Users/jonah/Documents/R work/R Coursera Course/specdata")
https://stackoverflow.com/questions/64019107
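To see why the anonymous function matters without touching any files, here is a minimal sketch on two made-up data frames. The names df_a, df_b, and cor_vector and all the values are hypothetical and only stand in for the real monitor files. It also shows vapply as one possible way to get a plain numeric vector instead of a list, in case that is more convenient.

```r
# Two small made-up data frames standing in for the CSV files (hypothetical data).
df_a <- data.frame(sulfate = c(1, 2, 3, 4), nitrate = c(2, 4, 6, 8), ID = 1)
df_b <- data.frame(sulfate = c(1, 2, 3, NA), nitrate = c(3, 2, 1, 5), ID = 2)

# Drop incomplete rows from each data frame, as in the function above.
dflist <- lapply(list(df_a, df_b), na.omit)

# The second argument of lapply must be a function. In the question's version,
# cor(abovelist$sulfate, abovelist$nitrate) is evaluated once, before lapply runs,
# and abovelist$sulfate is NULL because the list itself has no sulfate element.
# The anonymous function receives one data frame at a time as x.
correlations <- lapply(dflist, function(x) cor(x$sulfate, x$nitrate))

# vapply does the same work but returns a numeric vector and checks that
# every result is a single number.
cor_vector <- vapply(dflist, function(x) cor(x$sulfate, x$nitrate), numeric(1))
print(cor_vector)
```

The same anonymous function works inside corr unchanged, since each element of abovelist is an ordinary data frame.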