首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >R: read.fwf将整数定义为数字

R: read.fwf将整数定义为数字
EN

Stack Overflow用户
提问于 2021-05-30 17:43:26
回答 2查看 183关注 0票数 0

我有一个.txt文件,并且正在使用Rstudio。

代码语言:javascript
复制
200416657210340 1665721 20040608 20090930 20060910 20070910 20080827 20090804
200416657210345 1665721 20040907 20090203 20070331 20080719                  
200416657210347 1665721 20040914 20091026 20070213 20080114 20090302         
200416657210352 1665721 20041111 20100315 20070123 20071205          20081202

我正在尝试使用.txt读取read.fwf文件:

代码语言:javascript
复制
gripalisti <- read.fwf(file = "gripalisti.txt",
                         widths = c(15,8,9,9,9,9,9,9),
                         header = FALSE,
                         #stringsAsFactors = FALSE, 
                       col.names = c("einst","bu","faeding","forgun","burdur1",
                                     "burdur2","burdur3","burdur4"))

这工作和列是正确的长度。然而,"einst“和"bu”应该是整数值,其余的应该是日期。

当导入第一列(ID变量)中的所有值时,如下所示:

代码语言:javascript
复制
2.003140e+14

我一直在寻找一种将导入的列更改为整数(或字符?)的方法。值,我没有发现任何不会导致错误的东西。举个例子,我在google之后尝试过:

代码语言:javascript
复制
gripalisti <- read.fwf(file = "gripalisti.txt",
                         widths = c(15,8,9,9,9,9,9,9),
                         header = FALSE,
                         #stringsAsFactors = FALSE, 
                       col.names = c("einst","bu","faeding","forgun","burdur1",
                                     "burdur2","burdur3","burdur4"),
                       colclasses = c("integer", "integer", "Date", "Date",
                                      "Date", "Date", "Date", "Date"))

结果出现错误:

代码语言:javascript
复制
Error in read.table(file = FILE, header = header, sep = sep, row.names = row.names,  : 
  unused argument (colclasses = c("integer", "integer", "Date", "Date", "Date", "Date", "Date", "Date"))

dataset中有许多缺失值,超过100.000行。所以其他的进口方式对我来说没有用。数据集不是以制表符分隔的。

抱歉,如果这是显而易见的,我是一个非常新的R用户。

编辑:

谢谢你的帮助,我把它改成:

代码语言:javascript
复制
 colClasses = c("character", 

现在看上去不错。

EN

回答 2

Stack Overflow用户

发布于 2021-05-30 18:37:15

正如评论中所建议的:

第一个字段不能作为"character";

  • (additionally)存储为"integer",它必须是"numeric"

,如果日期不是%Y-%m-%d的默认格式,则需要在读取数据后转换它们。H 212G 213

准备:

代码语言:javascript
复制
writeLines("200416657210340 1665721 20040608 20090930 20060910 20070910 20080827 20090804\n200416657210345 1665721 20040907 20090203 20070331 20080719                  \n200416657210347 1665721 20040914 20091026 20070213 20080114 20090302         \n200416657210352 1665721 20041111 20100315 20070123 20071205          20081202",
           con = "gripalisti.txt")

处决:

代码语言:javascript
复制
dat <- read.fwf("gripalisti.txt", widths = c(15,8,9,9,9,9,9,9), header = FALSE,
                col.names = c("einst","bu","faeding","forgun","burdur1", "burdur2","burdur3","burdur4"),
                colClasses = c("character", "integer", "character", "character", "character", "character", "character", "character"))
str(dat)
# 'data.frame': 4 obs. of  8 variables:
#  $ einst  : chr  "200416657210340" "200416657210345" "200416657210347" "200416657210352"
#  $ bu     : int  1665721 1665721 1665721 1665721
#  $ faeding: chr  " 20040608" " 20040907" " 20040914" " 20041111"
#  $ forgun : chr  " 20090930" " 20090203" " 20091026" " 20100315"
#  $ burdur1: chr  " 20060910" " 20070331" " 20070213" " 20070123"
#  $ burdur2: chr  " 20070910" " 20080719" " 20080114" " 20071205"
#  $ burdur3: chr  " 20080827" "         " " 20090302" "         "
#  $ burdur4: chr  " 20090804" "         " "         " " 20081202"

dat[,3:8] <- lapply(dat[,3:8], as.Date, format = "%Y%m%d")
dat
#             einst      bu    faeding     forgun    burdur1    burdur2    burdur3    burdur4
# 1 200416657210340 1665721 2004-06-08 2009-09-30 2006-09-10 2007-09-10 2008-08-27 2009-08-04
# 2 200416657210345 1665721 2004-09-07 2009-02-03 2007-03-31 2008-07-19       <NA>       <NA>
# 3 200416657210347 1665721 2004-09-14 2009-10-26 2007-02-13 2008-01-14 2009-03-02       <NA>
# 4 200416657210352 1665721 2004-11-11 2010-03-15 2007-01-23 2007-12-05       <NA> 2008-12-02

str(dat)
# 'data.frame': 4 obs. of  8 variables:
#  $ einst  : chr  "200416657210340" "200416657210345" "200416657210347" "200416657210352"
#  $ bu     : int  1665721 1665721 1665721 1665721
#  $ faeding: Date, format: "2004-06-08" "2004-09-07" "2004-09-14" "2004-11-11"
#  $ forgun : Date, format: "2009-09-30" "2009-02-03" "2009-10-26" "2010-03-15"
#  $ burdur1: Date, format: "2006-09-10" "2007-03-31" "2007-02-13" "2007-01-23"
#  $ burdur2: Date, format: "2007-09-10" "2008-07-19" "2008-01-14" "2007-12-05"
#  $ burdur3: Date, format: "2008-08-27" NA "2009-03-02" NA
#  $ burdur4: Date, format: "2009-08-04" NA NA "2008-12-02"
票数 1
EN

Stack Overflow用户

发布于 2021-05-30 17:56:07

在这里,第一列中的数字是非常大的数字,如果以整数或数字的形式导入它,它将自动以指数格式显示。解决这个问题的方法是在读取文件之前设置“枕”。使用以下代码:

选项(scipen= 999)

我觉得这应该能解决你的问题。

下面是我运行的代码,当然是您需要使用的日期列的代码。为此,您可以使用像as.Date这样的简单命令(gripalisti$burdur1,format = "%Y%m%d")

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/67764139

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档