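I am using the read.delim function, but the text lines I am reading contain user comments that themselves use commas (","), so a comment gets split into two or more columns.
Here are two lines from the dataset: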
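@Zillaman u just aite all types of food at Zina crib and didnt even think about me!!!!,0
I must have been only 11 when Mr Peepers started. It was a must see for the whole family, I believe on Sun...,1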
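The first line is read correctly, with "0" read into the next column. The second line is split into three columns, and the last column contains "1".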
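The split described here can be confirmed before loading anything into a data frame. The following is a minimal sketch, assuming the two sample lines are held in a character vector (the name `sample_lines` is hypothetical); it uses base R's `count.fields()` to report how many comma-separated fields each line yields.

```r
# Two sample lines; the second has an extra comma inside the comment text.
sample_lines <- c(
  "@Zillaman u just aite all types of food at Zina crib and didnt even think about me!!!!,0",
  "I must have been only 11 when Mr Peepers started. It was a must see for the whole family, I believe on Sun...,1"
)

# count.fields() returns the number of fields per line for a given separator.
# Quoting and comment characters are disabled so only commas matter.
con <- textConnection(sample_lines)
n_fields <- count.fields(con, sep = ",", quote = "", comment.char = "")
close(con)

n_fields  # 2 3
```

Lines whose field count differs from the expected 2 are the ones that contain commas inside the comment.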
dataset_original = read.delim('TrainingData.csv',
                              quote = "",
                              row.names = NULL,
                              stringsAsFactors = FALSE,
                              header = F, as.is = F,
                              colClasses = "character",
                              blank.lines.skip = T,
                              sep = ",")

Posted on 2018-12-29 15:27:48
Try reading each line as a single field first, and only then separate the text column from the target column.
Try this:
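# Read each line as one field by using the newline as the separator
df = read.delim('TrainingData.csv',
                quote = "",
                row.names = NULL,
                stringsAsFactors = FALSE,
                header = F, as.is = F,
                colClasses = "character",
                blank.lines.skip = T,
                sep = "\n")
# Everything after the last comma is the target
df$target = regmatches(df$V1, regexpr(pattern = "[^,]*$", text = df$V1))
# Remove the last comma and the target from the text
df$V1 = sub(pattern = ",[^,]*$", replacement = "", x = df$V1)

Here df stands for dataset_original.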
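The same last-comma logic can also be tried on an in-memory character vector, which makes it easier to test. Below is a minimal sketch of one possible variant; the helper name `split_last_comma` and the column name `text` are hypothetical, and returning `NA` for lines without any comma is an assumption of this sketch rather than the behaviour of the code above.

```r
# Hypothetical helper: split each line at its LAST comma into text and target.
split_last_comma <- function(lines) {
  has_comma <- grepl(",", lines, fixed = TRUE)
  # Text: everything before the last comma (whole line if there is no comma)
  text <- ifelse(has_comma, sub(",[^,]*$", "", lines), lines)
  # Target: everything after the last comma (NA if there is no comma; an assumption)
  target <- ifelse(has_comma, sub("^.*,", "", lines), NA_character_)
  data.frame(text = text, target = target, stringsAsFactors = FALSE)
}

res <- split_last_comma(c("hello,0", "not,right,1", "no label here"))
res$text    # "hello" "not,right" "no label here"
res$target  # "0" "1" NA
```

Flagging unlabeled lines with `NA` makes malformed rows easy to find with `is.na(res$target)`.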
Example, applying the read.delim code above:
The file contains:
hello,0
world,1
not,right,1
this,one,is,even,worse,0

This method returns:
> df
                      V1 target
1                  hello      0
2                  world      1
3              not,right      1
4 this,one,is,even,worse      0

Posted on 2018-12-29 18:50:12
If we read the file with readLines(), we can then split each line on its last comma.
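Splitting on the last comma relies on a lookahead pattern, `,(?=[^,]+$)`, which this answer's code passes to `strsplit()` with `perl = TRUE`. As a small illustration on made-up strings (the names `last_comma`, `x`, and `parts` are hypothetical), the sketch below shows what that pattern does and does not split.

```r
# A comma followed only by one or more non-comma characters up to the end
# of the string, i.e. the last comma on the line.
last_comma <- ",(?=[^,]+$)"

x <- c("hello,0", "not,right,1", "trailing,comma,")
parts <- strsplit(x, last_comma, perl = TRUE)

parts[[1]]  # "hello" "0"
parts[[2]]  # "not,right" "1"
parts[[3]]  # "trailing,comma," (no split: nothing follows the last comma)
```

Because a line that ends in a comma, or has no comma at all, comes back as a single piece, it may be worth checking `lengths(parts)` before binding the pieces into a data frame.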
write(x="@Zillaman u just aite all types of food at Zina crib and didnt even think about me!!!!,0
I must have been only 11 when Mr Peepers started. It was a must see for the whole family, I believe on Sun...,1",
file="file.txt")
gg <- readLines("file.txt")
spl <- strsplit(gg, ",(?=[^,]+$)", perl=TRUE)
dtf <- as.data.frame(do.call(rbind, spl), stringsAsFactors=FALSE)
dtf
# V1 V2
# 1 @Zillaman u just (...) didnt even think about me!!!! 0
# 2 I must have been (...) family, I believe on Sun... 1

https://stackoverflow.com/questions/53970548