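I have a text file with more than 100,000 lines that I download from SAP every week. It is exported as pages, and every page contains the same header and dashed lines. Below is a minimal example with two pages, each containing only two items.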
------------------------------------------------------------
|date |Material |Description |
|----------------------------------------------------------|
|10/04/2013 |WM.5597394 |PNEUMATIC |
|11/07/2013 |GB.D040790 |RING |
------------------------------------------------------------
------------------------------------------------------------
|date |Material |Description |
|----------------------------------------------------------|
|08/06/2013 |WM.4M01004A05 |TOUCHEUR |
|08/06/2013 |WM.4M010108-1 |LEVER |
------------------------------------------------------------
What I want is to import this file into R with a single header and no dashed lines. I tried:
read.table( "myfile.txt", sep = "|", fill=TRUE)
Thanks a lot.
Answered 2014-01-14 13:54:55
Another readLines approach:
l <- readLines("myfile.txt")
# remove unnecessary lines
l <- grep("^\\|?-+\\|?$|^$", l, value = TRUE, invert = TRUE)
# remove duplicated headers
l2 <- c(l[1], l[-1][l[-1] != l[1]])
# split
lsplit <- strsplit(l2, "\\s*\\|")
# create data frame
dat <- setNames(data.frame(do.call(rbind, lsplit[-1])[ , -1]), lsplit[[1]][-1])
date Material Description
1 10/04/2013 WM.5597394 PNEUMATIC
2 11/07/2013 GB.D040790 RING
3 08/06/2013 WM.4M01004A05 TOUCHEUR
4 08/06/2013 WM.4M010108-1 LEVER
Answered 2014-01-14 13:38:47
You can preprocess the file as text, then use read.table.
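lines <- readLines("myfile.txt")
# strip dashed runs and pipe characters (gsub is vectorized, so sapply is not needed)
lines <- gsub("[-]{2,}|[|]", "", lines)
# keep the header once, then every non-empty line that is not a repeated header
lines <- c(lines[2], lines[lines!="" & lines!=lines[2]])
read.table(text=lines, header=TRUE)
gives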
date Material Description
1 10/04/2013 WM.5597394 PNEUMATIC
2 11/07/2013 GB.D040790 RING
3 08/06/2013 WM.4M01004A05 TOUCHEUR
4 08/06/2013 WM.4M010108-1 LEVER
Answered 2014-01-14 13:38:44
You can use readLines and read.table (probably not very efficient):
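# txt is assumed to hold the file contents as a character string;
# for a file on disk, readLines("myfile.txt") does the same job
ll <- readLines(textConnection(txt))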
dat <- read.table(text=ll[!grepl('--',ll)],sep='|',header=TRUE)[,-c(1,5)]
dat[!grepl('date',dat$date),]
date Material Description
1 10/04/2013 WM.5597394 PNEUMATIC
2 11/07/2013 GB.D040790 RING
4 08/06/2013 WM.4M01004A05 TOUCHEUR
5 08/06/2013 WM.4M010108-1 LEVER
https://stackoverflow.com/questions/21114598
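For reuse on the weekly download, the steps from the answers above (drop the dashed lines, keep the header once, split on "|") could be wrapped in one function. This is only a sketch: the name read_sap_pages is made up for illustration, and it assumes every row starts and ends with "|" as in the sample, and that no data field itself contains "|".

```r
# Hypothetical helper (not part of any package): parse a paged SAP text
# export, given as a character vector of lines, into one data frame.
read_sap_pages <- function(lines) {
  # Drop blank lines and separator lines made only of dashes,
  # optionally wrapped in pipes.
  lines <- lines[!grepl("^\\s*\\|?-+\\|?\\s*$|^\\s*$", lines)]
  # Keep the first header and drop its repeats from later pages.
  header <- lines[1]
  lines <- c(header, lines[-1][lines[-1] != header])
  # Leading and trailing "|" give an empty first and last column;
  # read them as text, trim padding, then drop those two columns.
  dat <- read.table(text = lines, sep = "|", header = TRUE,
                    strip.white = TRUE, quote = "", comment.char = "",
                    colClasses = "character")
  dat[, -c(1, ncol(dat)), drop = FALSE]
}

# The two-page sample from the question, inlined so the sketch runs as is.
sample_lines <- c(
  "------------------------------------------------------------",
  "|date |Material |Description |",
  "|----------------------------------------------------------|",
  "|10/04/2013 |WM.5597394 |PNEUMATIC |",
  "|11/07/2013 |GB.D040790 |RING |",
  "------------------------------------------------------------",
  "------------------------------------------------------------",
  "|date |Material |Description |",
  "|----------------------------------------------------------|",
  "|08/06/2013 |WM.4M01004A05 |TOUCHEUR |",
  "|08/06/2013 |WM.4M010108-1 |LEVER |",
  "------------------------------------------------------------"
)
dat <- read_sap_pages(sample_lines)
print(dat)
```

For the real file, read_sap_pages(readLines("myfile.txt")) should work the same way. All columns come back as character, so the date column still needs converting (for example with as.Date and an explicit format) once it is clear whether the export uses day/month or month/day order.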