I have a dataframe of DNA sequencing data, aSNPst.df, in which some sites were not read during sequencing.
|Vol|Sex|Ethnicity|SNP1|SNP2|SNP3|SNP4|SNP5|
|001| M | European| AA | GC | | TT | GG |
|002| M | European| AA | CC | | TT | GG |
|003| F | Mixed | AT | GC | | AT | GG |
|004| F | European| AA | GC | | TT | GG |
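|005| M | European| TT | GG | | AT | GG |
I think these blank columns mean that my code did not recognize the separator sep="".
How can I remove such columns from the data frame and keep a record of which SNPs are being removed?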
Posted on 2021-02-07 10:58:01
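If the data contain empty values (""), you can use:
result <- df[colSums(df != '') > 0]
If by empty values you actually mean NA, do the following instead:
result <- df[colSums(!is.na(df)) > 0]
To get the names of the removed columns:
removed_columns <- setdiff(names(df), names(result))
Posted on 2021-02-07 10:56:00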
allna <- sapply(dat, function(z) all(is.na(z)))
allna
# Vol Sex Ethnicity SNP1 SNP2 SNP3 SNP4 SNP5
# FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
which(allna)
# SNP3
# 6
names(which(allna))
# [1] "SNP3"
### remove SNP3 from the original data
dat <- dat[, !allna]
If you want to find columns that are entirely NA or empty strings (""), just adjust the inner function:
allna <- sapply(dat, function(z) all(is.na(z) | !nzchar(z)))
Data
structure(list(Vol = 1:5,
               Sex = c("M", "M", "F", "F", "M"),
               Ethnicity = c("European", "European", "Mixed", "European", "European"),
               SNP1 = c("AA", "AA", "AT", "AA", "TT"),
               SNP2 = c("GC", "CC", "GC", "GC", "GG"),
               SNP4 = c("TT", "TT", "AT", "TT", "AT"),
               SNP5 = c("GG", "GG", "GG", "GG", "GG")),
          class = "data.frame", row.names = c(NA, -5L))
https://stackoverflow.com/questions/66084099
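The two answers above can be combined into one reusable step that drops the empty columns and reports their names at the same time. What follows is only a minimal sketch, not code from either answer: the helper `drop_empty_columns` and the example data frame `snps` are hypothetical names introduced here, and the sketch assumes a column counts as empty when every value is either NA or an empty string.

```r
# Hypothetical helper (not from the original answers): drops columns whose
# values are all NA or empty strings and reports which columns were dropped.
drop_empty_columns <- function(df) {
  # One TRUE/FALSE per column: TRUE when every value is NA or ""
  is_empty <- vapply(
    df,
    function(col) all(is.na(col) | !nzchar(as.character(col))),
    logical(1)
  )
  list(
    data    = df[, !is_empty, drop = FALSE],  # drop = FALSE keeps a data frame
    removed = names(df)[is_empty]             # record of the dropped columns
  )
}

# Illustrative data modelled on the question, with SNP3 never read
snps <- data.frame(
  Vol  = c("001", "002", "003"),
  Sex  = c("M", "M", "F"),
  SNP1 = c("AA", "AA", "AT"),
  SNP3 = c("", NA, ""),
  SNP4 = c("TT", "TT", "AT"),
  stringsAsFactors = FALSE
)

res <- drop_empty_columns(snps)
res$removed      # names of the dropped SNP columns
names(res$data)  # columns that remain
```

Returning the cleaned data and the removed names together means the record of dropped SNPs cannot get out of sync with the data frame it describes. Using `vapply` with `logical(1)` instead of `sapply` is a design choice that guarantees a logical vector even for unusual inputs.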