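I am working with 10 lists in R (files1, files2, files3, ... files10). Each list contains several data frames.
Now I want to extract some values from each data frame in each list.
I was planning to use a for loop.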
nt = c("A", "C", "G", "T")
for (i in files1) {
for (j in nt) {
name = paste(j, i, sep = "-") # here I want as output name = "files1-A". However this doesn't work. How can I get the name of the list "files1"?
colname = paste("percentage", j, sep = "") # here I want as output colname = percentageA. This works
assign(name, unlist(lapply(i, function(x) x[here I want to use the column with the name "percentageA", so 'colname'][x$position==1000])))
}
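}
So I am having trouble using the names of the lists and assigning values to variables.
I know this only iterates over the first list, but is it also possible to iterate over all my lists at once?
In other words: how can I put the code below into a for loop?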
A_files1 = unlist(lapply(files1, function(x) x$percentageA[x$position==1000]))
C_files1 = unlist(lapply(files1, function(x) x$percentageC[x$position==1000]))
G_files1 = unlist(lapply(files1, function(x) x$percentageG[x$position==1000]))
T_files1 = unlist(lapply(files1, function(x) x$percentageT[x$position==1000]))
A_files2 = unlist(lapply(files2, function(x) x$percentageA[x$position==1000]))
C_files2 = unlist(lapply(files2, function(x) x$percentageC[x$position==1000]))
G_files2 = unlist(lapply(files2, function(x) x$percentageG[x$position==1000]))
T_files2 = unlist(lapply(files2, function(x) x$percentageT[x$position==1000]))
....
A_files10 = unlist(lapply(files10, function(x) x$percentageA[x$position==1000]))
C_files10 = unlist(lapply(files10, function(x) x$percentageC[x$position==1000]))
G_files10 = unlist(lapply(files10, function(x) x$percentageG[x$position==1000]))
T_files10 = unlist(lapply(files10, function(x) x$percentageT[x$position==1000]))
Answered on 2016-12-29 19:39:47
To answer your question, I created a fake list containing data frames:
n = data.frame(andrea=c(1983, 11, 8),paja=c(1985, 4, 3))
s = data.frame(col1=c("aa", "bb", "cc", "dd", "ee"))
b = data.frame(col1=c(TRUE, FALSE, TRUE, FALSE, FALSE))
x = list(n, s, b, 3) # x contains copies of n, s, b and the number 3
names(x) <- c("dataframe1","dataframe2","dataframe3","dataframe4")
files1 = x
Now, reproduce what happens inside your loop:
i = files1
j = "A"
If you want the names of the data frames with the suffix taken from nt (in this example "A"), you have to use names(i):
name_wrong = paste(j, i, sep = "-")
name = paste(names(i), j, sep = "-")
So you get:
> name
[1] "dataframe1-A" "dataframe2-A" "dataframe3-A" "dataframe4-A"
I hope this is what you need.
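Building on names(), the full set of assignments from the question can be sketched as a nested loop. This is only a sketch under assumptions: the toy lists files1 and files2 and the helper make_df() are invented for illustration, position 9 is used instead of 1000 so the toy data stay small, and the results are collected in a named list called results rather than created with assign(). The two points it illustrates are looking up the lists by name with mget() and selecting a column whose name is stored in a variable with [[colname]] instead of $.

```r
# Toy data, invented for illustration: two lists of data frames
make_df <- function(n, scale) {
  data.frame(position = 1:n,
             percentageA = 1:n / scale, percentageC = 1:n / scale,
             percentageG = 1:n / scale, percentageT = 1:n / scale)
}
files1 <- list(item1 = make_df(10, 10), item2 = make_df(11, 20))
files2 <- list(item1 = make_df(10, 40), item2 = make_df(11, 80))

nt <- c("A", "C", "G", "T")
list_names <- paste0("files", 1:2)  # use 1:10 when all ten lists exist
all_files <- mget(list_names)       # named list of lists, looked up by name

results <- list()
for (list_name in names(all_files)) {
  for (j in nt) {
    colname <- paste0("percentage", j)
    # [[colname]] selects the column whose name is held in `colname`
    results[[paste(j, list_name, sep = "_")]] <- unlist(lapply(
      all_files[[list_name]],
      function(x) x[[colname]][x$position == 9]))
  }
}

names(results)    # "A_files1", "C_files1", ..., "T_files2"
results$A_files1  # values at position 9 for each data frame in files1
```

Keeping the vectors in one list avoids creating forty separate variables; if separate variables are really needed, assign(paste(j, list_name, sep = "_"), value) inside the inner loop would create them.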
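Answered on 2016-12-30 07:58:13
I think these data would be easier to work with if the data structure were flattened. Instead of 10 lists of data frames, you could use a single data frame in which every observation is indexed by its item name and its file name.
Generate sample data and use the code from the question
Simplified data with only 10 or 11 points per item. I assume the items in a list have different numbers of rows?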
files1 <- list(item1 = data.frame(position = 1:10,
                                  percentageA = 1:10/10,
                                  percentageC = 1:10/10,
                                  percentageG = 1:10/10,
                                  percentageT = 1:10/10),
               item2 = data.frame(position = 1:11,
                                  percentageA = 1:11/20,
                                  percentageC = 1:11/20,
                                  percentageG = 1:11/20,
                                  percentageT = 1:11/20))
str(files1) # look at the structure of the list
# Select the 9th position using your code
A_files1 = unlist(lapply(files1, function(x) x$percentageA[x$position==9]))
C_files1 = unlist(lapply(files1, function(x) x$percentageC[x$position==9]))
G_files1 = unlist(lapply(files1, function(x) x$percentageG[x$position==9]))
T_files1 = unlist(lapply(files1, function(x) x$percentageT[x$position==9]))
Flatten the list of data frames into a single data frame
# Add name to each data frame
# Inspired by this answer
# http://stackoverflow.com/a/18434780/2641825
# For information l[1] creates a single list item
# l[[1]] extracts the data frame from the list
#' @param i index
#' @param listoffiles list of data frames
addname <- function(i, listoffiles){
    dtf <- listoffiles[[i]] # Extract the dataframe from the list
    dtf$name <- names(listoffiles[i]) # Add the name inside the data frame
    return(dtf)
}
# Add the name inside each data frame
files1 <- lapply(seq_along(files1), addname, files1)
str(files1) # look at the structure of the list
files1table <- Reduce(rbind,files1)
# Get the values of interest with
files1table$percentageA[files1table$position == 9]
# [1] 0.90 0.45
# Get all Letters of interest with
subset(files1table,position==9)
#    position percentageA percentageC percentageG percentageT  name
# 9         9        0.90        0.90        0.90        0.90 item1
# 19        9        0.45        0.45        0.45        0.45 item2
Flatten all the lists of data frames into a single data frame
# Now create another list, files2, duplicated just for the sake of the example
files2 <- files1
# files1 and files2 both already have a name column inside their data frames
# Create a list of list of dataframes
lolod <- list(files1 = files1, files2 = files2)
str(lolod) # a list of lists
# Flatten to a list of dataframes
# Use sapply to keep names based on this answer http://stackoverflow.com/a/9469981/2641825
lod <- sapply(lolod, Reduce, f=rbind, simplify = FALSE, USE.NAMES = TRUE)
# Add the name inside each data frame again
addfilename <- function(i, listoffiles){
    dtf <- listoffiles[[i]] # Extract the dataframe from the list
    dtf$filename <- names(listoffiles[i]) # Add the name inside the data frame
    return(dtf)
}
lod <- lapply(seq_along(lod), addfilename, lod)
# Flatten to a dataframe
d <- Reduce(rbind, lod)
# Now the data structure is flattened and much easier to deal with
subset(d,position==9)
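#    position percentageA percentageC percentageG percentageT  name filename
# 9         9        0.90        0.90        0.90        0.90 item1   files1
# 19        9        0.45        0.45        0.45        0.45 item2   files1
# 30        9        0.90        0.90        0.90        0.90 item1   files2
# 40        9        0.45        0.45        0.45        0.45 item2   files2
This answer turned out much longer than I expected. I hope I did not scare you off. Inspired by tidy data, simplifying the data structure will make your later work easier. If the names had been stored in the original data, this complicated list-renaming operation might not be necessary.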
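The same flattening can probably be written more compactly in base R. The sketch below is an alternative, not the code used above: the helpers make_df() and bind_with_name() and the toy lists are hypothetical, invented for illustration. It uses Map() to attach each list element's name as a column and do.call(rbind, ...) to stack the pieces, once for the inner lists and once for the outer list.

```r
# Toy data, invented for illustration
make_df <- function(n, scale) {
  data.frame(position = 1:n,
             percentageA = 1:n / scale, percentageC = 1:n / scale,
             percentageG = 1:n / scale, percentageT = 1:n / scale)
}
files1 <- list(item1 = make_df(10, 10), item2 = make_df(11, 20))
files2 <- files1  # duplicate, just for the sake of the example

# Hypothetical helper: stack a named list of data frames into one data
# frame, storing each element's name in the column called `column`
bind_with_name <- function(dfs, column) {
  tagged <- Map(function(dtf, nm) { dtf[[column]] <- nm; dtf },
                dfs, names(dfs))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL  # drop the "item1.1"-style row names from rbind
  out
}

lolod <- list(files1 = files1, files2 = files2)
# Inner step: one data frame per file list; outer step: one data frame overall
d <- bind_with_name(lapply(lolod, bind_with_name, column = "name"),
                    column = "filename")

subset(d, position == 9)
```

With the data in this shape, a single condition such as d$percentageA[d$position == 9 & d$filename == "files1"] replaces one of the per-list, per-letter assignments from the question.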
https://stackoverflow.com/questions/41378161