I've noticed that when I read a large CSV file with
output <- read.table( ..., header = TRUE, sep = ",")
the resulting data frame has some empty columns. These columns follow a naming pattern.
colnames(output)
"Factor.1" "Factor.2" "etc" "Stuff" "X" "X.1" "X.2" "X.3" "X.4" "X.5"
"X.6" "X.7" "X.8" "X.9" "X.10" "X.11" "X.12" "X.13"
"X.14" "X.15" "X.16" "X.17" "X.18" "X.19" "X.20" "X.21"
"X.22" "X.23" "X.24" "X.25" "X.26" "X.27" "X.28" "X.29"
"X.30" "X.31" "X.32" "X.33"
I noticed that ?read.table states:
col.names: a vector of optional names for the variables. The default is to use "V" followed by the column number.
Why does it use X instead of V?
Edit: this is what the CSV file looks like
Date,Duration,Count,Factor 1,Factor 2,Factor 3,Hour,Day,Month,Year,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 0:00,9.99,10,GC,LS,FT,0,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 1:00,9.63125,8,GC,LS,FT,1,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 2:00,7.388888889,3,GC,LS,FT,2,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 3:00,7.087037037,9,GC,LS,FT,3,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
...
Posted on 2015-02-20 02:08:37
Here is the relevant snippet from the read.table() source:
if (header) {
    .External(C_readtablehead, file, 1L, comment.char,
              blank.lines.skip, quote, sep, skipNul)
    if (missing(col.names))
        col.names <- first
    else if (length(first) != length(col.names))
        warning("header and 'col.names' are of different lengths")
}
The important part is if (missing(col.names)) col.names <- first. From there, we can go back and find first, which for this data amounts to
first <- scan(textConnection(file), what = "", sep = ",",
              nlines = 1, quiet = TRUE, skip = 0, strip.white = TRUE)
This results in
# [1] "Date" "Duration" "Count" "Factor 1" "Factor 2" "Factor 3" "Hour" "Day" "Month"
# [10] "Year" "" "" "" "" "" "" "" ""
# [19] "" "" "" "" "" "" "" "" ""
# [28] "" "" "" "" "" "" "" "" ""
# [37] "" "" "" "" "" "" "" ""
Then make.names() is called on col.names, which produces the names you are seeing:
make.names(first, unique = TRUE)
# [1] "Date" "Duration" "Count" "Factor.1" "Factor.2" "Factor.3" "Hour" "Day" "Month"
# [10] "Year" "X" "X.1" "X.2" "X.3" "X.4" "X.5" "X.6" "X.7"
# [19] "X.8" "X.9" "X.10" "X.11" "X.12" "X.13" "X.14" "X.15" "X.16"
# [28] "X.17" "X.18" "X.19" "X.20" "X.21" "X.22" "X.23" "X.24" "X.25"
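# [37] "X.26" "X.27" "X.28" "X.29" "X.30" "X.31" "X.32" "X.33"
The reason we get X rather than the V mentioned in the documentation is that the V default only appears in the next condition after if (header), which is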
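else if (missing(col.names))
    col.names <- paste0("V", 1L:cols)
But we never reach that branch, because header = TRUE takes the first one. The names come from the header line instead, and make.names() prepends X to any name that is not syntactically valid (an empty string included), appending .1, .2, ... to keep the names unique. There is more to it than this explanation; the best way to see the whole picture is to walk through the read.table source code (it is complicated).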
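To see both branches side by side, here is a minimal sketch. The csv string and the variable names (fixed, with_header, no_header) are made up for illustration and are not the original file; only make.names() and read.table() from base R are used.

```r
# Empty header fields are not valid names, so make.names() turns each one
# into "X"; unique = TRUE then appends .1, .2, ... to the duplicates.
fixed <- make.names(c("Date", "Factor 1", "", "", ""), unique = TRUE)
print(fixed)
# [1] "Date"     "Factor.1" "X"        "X.1"      "X.2"

# A tiny CSV with two trailing empty fields on every line (hypothetical data).
csv <- "a,b,,\n1,2,,\n3,4,,"

# header = TRUE: names are taken from the header line and then passed
# through make.names(), so the empty ones become X, X.1, ...
with_header <- read.table(text = csv, header = TRUE, sep = ",")
print(names(with_header))
# [1] "a"   "b"   "X"   "X.1"

# header = FALSE: there is no header to take names from, so the documented
# "V" followed by the column number default applies.
no_header <- read.table(text = csv, header = FALSE, sep = ",")
print(names(no_header))
# [1] "V1" "V2" "V3" "V4"
```

So the V names from the documentation only show up when no header line supplies the names; with a header, empty fields always end up as X, X.1, and so on.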
Data:
file <- "Date,Duration,Count,Factor 1,Factor 2,Factor 3,Hour,Day,Month,Year,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 0:00,9.99,10,GC,LS,FT,0,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 1:00,9.63125,8,GC,LS,FT,1,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 2:00,7.388888889,3,GC,LS,FT,2,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 3:00,7.087037037,9,GC,LS,FT,3,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"
https://stackoverflow.com/questions/28619973