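I know this is an absolute mess, and I apologize. In R, I want to add a column to a set of large txt files and then write new txt files that include the new column. I get no error message, but the run takes a very long time and is eventually killed. Is there a better way to run the program below, or is there still a bug in it? I am new to R, so I don't yet know what its normal behavior looks like.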
files <- dir(".../rsid", pattern= glob2rx("final_rsid_chrom*.txt"))
rsid.db <- list()
for (i in files){
rsid.db[[i]] <- lapply(i, read.table, header = T, sep=" ", stringsAsFactors = F)
rsid.db[[i]]$linkage.new <- apply(rsid_db[[i]], 1, create.alleles.col, "REF", "ALT")
write.table(rsid.db[[i]], "rsid_linkage_new[[i]].txt", quote = FALSE,row.names=FALSE)
}

Roland - I also ran the following on its own and hit the same problem (that is, it took a very long time).
files <- dir("/home/sjk98/rds/rds-jmmh2-projects/iron/novel/novel_iron/cleaned/3_columns_std/rsid", pattern= glob2rx("final_rsid_chrom*.txt"))
rsid.db <- list()
rsid.db <- lapply(files, read.table, header = T, sep=" ", stringsAsFactors = F)
quit()

Posted on 2021-07-16 14:12:46
I think the code below fixes a few bugs. However, if create.alleles.col is the time-consuming part, we can't help without knowing more about that function.
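Before the revised loop, one way to find out which step dominates is to time reading, the row-wise apply, and writing separately on a single file. The following is only a minimal sketch: the data are made up, and create.alleles.col here is a hypothetical stand-in, since the real function is not shown in the question.

```r
# Hypothetical stand-in for create.alleles.col(); the real function is not
# shown in the question, so this only mimics the way it is called.
create.alleles.col <- function(row, ref, alt) paste(row[ref], row[alt], sep = "/")

# Small synthetic file so the sketch runs on its own (made-up data).
n <- 20000
demo <- data.frame(
  RSID = paste0("rs", seq_len(n)),
  REF  = sample(c("A", "C", "G", "T"), n, replace = TRUE),
  ALT  = sample(c("A", "C", "G", "T"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
in_file  <- tempfile(fileext = ".txt")
out_file <- tempfile(fileext = ".txt")
write.table(demo, in_file, quote = FALSE, row.names = FALSE)

# Time each stage separately; "elapsed" is wall-clock time in seconds.
t_read <- system.time(
  rsid.tmp <- read.table(in_file, header = TRUE, sep = " ", stringsAsFactors = FALSE)
)
t_apply <- system.time(
  rsid.tmp$linkage.new <- apply(rsid.tmp, 1, create.alleles.col, "REF", "ALT")
)
t_write <- system.time(
  write.table(rsid.tmp, out_file, quote = FALSE, row.names = FALSE)
)

timings <- c(
  read  = t_read[["elapsed"]],
  apply = t_apply[["elapsed"]],
  write = t_write[["elapsed"]]
)
print(timings)
```

Running the same three timings on one of the real files, with the real create.alleles.col, should show whether the slowdown comes from reading, from the function, or from writing.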
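# someone correct me if I'm wrong, but I don't believe ".../" is a legit path;
# "../rsid" looks for rsid in the parent dir of the current working directory.
# The pattern is written as a plain regex here: ".*" matches any run of
# characters, whereas a bare "m*" in a regex matches zero or more m characters
# (glob2rx() in the original performs that glob-to-regex conversion).
# full.names = TRUE returns paths that read.table can open from any working directory.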
files <- dir("../rsid", pattern = "final_rsid_chrom.*\\.txt", full.names = TRUE)
for (i in files){
  # no need to call lapply on a single item; using a temporary variable
  # instead of a list releases the memory from each file instead of
  # letting it build up in RAM as it would if saved to a list
  rsid.tmp <- read.table(i, header = TRUE, sep = " ", stringsAsFactors = FALSE)
  # if create.alleles.col is the time-consuming part, we can't help without
  # knowing more about it
  rsid.tmp$linkage.new <- apply(rsid.tmp, 1, create.alleles.col, "REF", "ALT")
  # create a new file name with _updated inserted before the extension,
  # i.e., oldname_updated.txt
  new_file <- sub("(\\.txt)$", "_updated\\1", i)
  write.table(rsid.tmp, new_file, quote = FALSE, row.names = FALSE)
}

https://stackoverflow.com/questions/68409041
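If create.alleles.col turns out to only combine the REF and ALT values of each row, a vectorised call on whole columns is usually much faster than apply(..., 1, ...), which converts the data frame to a matrix and then calls the function once per row. The sketch below assumes a hypothetical create.alleles.col that pastes the two alleles together; the real function is not shown in the question, so this may not carry over directly.

```r
# Hypothetical row-wise function (assumption: the real create.alleles.col
# is unknown and may do something different).
create.alleles.col <- function(row, ref, alt) paste(row[ref], row[alt], sep = "/")

# Made-up example data.
rsid.tmp <- data.frame(
  RSID = c("rs1", "rs2", "rs3"),
  REF  = c("A", "C", "G"),
  ALT  = c("G", "T", "A"),
  stringsAsFactors = FALSE
)

# Row-wise version, as in the loop above: one function call per row.
by_row <- apply(rsid.tmp, 1, create.alleles.col, "REF", "ALT")

# Vectorised version: paste() operates on whole columns at once.
vectorised <- paste(rsid.tmp$REF, rsid.tmp$ALT, sep = "/")

# Both approaches give the same values for this hypothetical function.
stopifnot(identical(unname(by_row), vectorised))

rsid.tmp$linkage.new <- vectorised
```

Whether such a rewrite is possible depends entirely on what the real function does, so comparing the two results on a small sample first, as the stopifnot() call does here, is a reasonable safeguard.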