
read.table automatic column names

Stack Overflow user
Asked 2015-02-20 01:02:15

I noticed that when reading a large csv file with

output <- read.table( ..., header = TRUE, sep = ",")

the resulting data frame has some empty columns. These columns follow a naming pattern.

 colnames(output)
     "Factor.1"   "Factor.2"   "etc"        "Stuff"      "X"          "X.1"        "X.2"        "X.3"        "X.4"        "X.5"       
     "X.6"        "X.7"        "X.8"        "X.9"        "X.10"       "X.11"       "X.12"       "X.13"      
     "X.14"       "X.15"       "X.16"       "X.17"       "X.18"       "X.19"       "X.20"       "X.21"      
     "X.22"       "X.23"       "X.24"       "X.25"       "X.26"       "X.27"       "X.28"       "X.29"      
     "X.30"       "X.31"       "X.32"       "X.33"

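I noticed that ?read.table states

col.names: a vector of optional names for the variables. The default is to use "V" followed by the column number.

Why does it use X instead of V?

Edit: this is what the csv file looks like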

Date,Duration,Count,Factor 1,Factor 2,Factor 3,Hour,Day,Month,Year,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 0:00,9.99,10,GC,LS,FT,0,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 1:00,9.63125,8,GC,LS,FT,1,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 2:00,7.388888889,3,GC,LS,FT,2,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 3:00,7.087037037,9,GC,LS,FT,3,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

...


1 Answer

Stack Overflow user

Accepted answer

Posted 2015-02-20 02:08:37

Here is the relevant code snippet from read.table()

if (header) {
    .External(C_readtablehead, file, 1L, comment.char, 
              blank.lines.skip, quote, sep, skipNul)
    if (missing(col.names)) 
        col.names <- first
    else if (length(first) != length(col.names)) 
        warning("header and 'col.names' are of different lengths")
}

The important part is if (missing(col.names)) col.names <- first. From there, we can go back and find first, which in this case is defined as

first <- scan(textConnection(file), what = "", sep = ",", 
    nlines = 1, quiet = TRUE, skip = 0, strip.white = TRUE)

This results in

#  [1] "Date"     "Duration" "Count"    "Factor 1" "Factor 2" "Factor 3" "Hour"     "Day"      "Month"   
# [10] "Year"     ""         ""         ""         ""         ""         ""         ""         ""        
# [19] ""         ""         ""         ""         ""         ""         ""         ""         ""        
# [28] ""         ""         ""         ""         ""         ""         ""         ""         ""        
# [37] ""         ""         ""         ""         ""         ""         ""         ""        

Then make.names() is called on col.names, which produces the names you are seeing

make.names(first, unique = TRUE)
#  [1] "Date"     "Duration" "Count"    "Factor.1" "Factor.2" "Factor.3" "Hour"     "Day"      "Month"   
# [10] "Year"     "X"        "X.1"      "X.2"      "X.3"      "X.4"      "X.5"      "X.6"      "X.7"     
# [19] "X.8"      "X.9"      "X.10"     "X.11"     "X.12"     "X.13"     "X.14"     "X.15"     "X.16"    
# [28] "X.17"     "X.18"     "X.19"     "X.20"     "X.21"     "X.22"     "X.23"     "X.24"     "X.25"    
# [37] "X.26"     "X.27"     "X.28"     "X.29"     "X.30"     "X.31"     "X.32"     "X.33"    
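
To see where the X and the .1, .2 suffixes come from, here is a minimal sketch on a made-up three-element vector (the header_fields values are hypothetical, not taken from the file above). It compares make.names() with and without unique = TRUE.

```r
# Hypothetical header fields: two empty names and one containing a space
header_fields <- c("", "", "Factor 1")

# Without unique = TRUE, each name is only made syntactically valid
valid_only <- make.names(header_fields)

# With unique = TRUE, duplicates additionally get numeric suffixes
valid_unique <- make.names(header_fields, unique = TRUE)

print(valid_only)    # "X" "X" "Factor.1"
print(valid_unique)  # "X" "X.1" "Factor.1"
```

An empty string is not a valid name, so it becomes X, and the deduplication step is what adds .1, .2 and so on.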

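The reason we get X rather than the V mentioned in the documentation is that the V names are only assigned in the next condition after if (header), which is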

else if (missing(col.names)) 
    col.names <- paste0("V", 1L:cols) 

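But with header = TRUE we never reach that branch, and make.names() prepends X to invalid names by default, so each empty header field becomes X. There is more to it than this explanation. The best way to see the details is to step through the read.table source code (it is complicated).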
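
The two branches can be checked side by side with a small sketch. The csv string below is a hypothetical miniature of the file in the question (two named columns followed by two unnamed ones), and the variable names are only for illustration.

```r
# Hypothetical miniature of the question's file: two named columns
# followed by two empty, unnamed columns.
csv <- "a,b,,
1,2,,
3,4,,"

# header = TRUE: names are taken from the first line, and the empty
# ones are then repaired by make.names(), giving X, X.1, ...
with_header <- read.table(text = csv, header = TRUE, sep = ",")
print(names(with_header))  # "a" "b" "X" "X.1"

# header = FALSE: no header line is consumed, so the documented
# default of "V" followed by the column number applies.
no_header <- read.table(text = csv, header = FALSE, sep = ",")
print(names(no_header))    # "V1" "V2" "V3" "V4"
```

Note that with header = FALSE the first line is treated as data, so this is only a way to observe the V default, not a fix for the empty columns.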

Data:

file <- "Date,Duration,Count,Factor 1,Factor 2,Factor 3,Hour,Day,Month,Year,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 0:00,9.99,10,GC,LS,FT,0,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 1:00,9.63125,8,GC,LS,FT,1,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 2:00,7.388888889,3,GC,LS,FT,2,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 3:00,7.087037037,9,GC,LS,FT,3,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/28619973
