I have raw data in this txt format:
Name|Occupation|Comment
Robert|Doctor|To process, please provide:
a. Tax Returns
b. Identification
c. Statement of Approval
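Sally|Accountant|Approved
Here, | is the delimiter.
For Robert, I want "To process, please provide: a. Tax Returns b. Identification c. Statement of Approval" to appear as a single string under Comment.
However, importing with read.csv using the following arguments: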
read.csv(
"data/text_data",
fileEncoding = "UTF-8",
sep = "|",
na.strings = "",
quote = ""
)
produces extra rows:
Name Occupation Comment
Robert Doctor To process, please provide:
a. Tax Returns NA NA
b. Identification NA NA
c. Statement of Approval NA NA
Sally Accountant Approved
Is there an R import function or a wrangling trick that solves this? A tidyverse solution is strongly preferred, thanks.
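Posted on 2022-01-24 08:19:46
A simple option is to use by on a temporary "id" column to split the data, apply the transformation, and recombine the pieces. The id increases only on rows where Occupation is not NA, so each continuation line shares the id of the record above it. For convenience, you can wrap this in a function.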
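myDataReader <- \(link) {
  # Continuation lines are read as rows with NA in Occupation and Comment.
  r <- read.csv(link, fileEncoding="UTF-8", sep="|", na.strings="", quote="")
  # Running record number: increments only on rows that start a real record.
  r$id <- cumsum(!is.na(r$Occupation))
  # Per record: keep Name and Occupation from the first row, then append the
  # continuation lines (stored in column 1) to the first row's Comment.
  do.call(what=rbind, by(r, r$id, \(x) {
    cbind(x[1, 1:2], Comment=trimws(paste(x[1, 3], toString(x[-1, 1]))))
  }))
}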
myDataReader('data/text_data')
# Name Occupation Comment
# 1 Robert Doctor To process, please provide: a. Tax Returns, b. Identification, c. Statement of Approval
# 2 Sally Accountant Approved
# 3 Tom Lawyer To process, please provide: a. Tax Returns, b. Identification
# 4 Sally Accountant Approved
Note: R >= 4.1 is used (required for the \(x) lambda shorthand).
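A different route, not part of the original answer, is to repair the lines before parsing them. The following is a minimal sketch under one loud assumption: a line that contains no | is a continuation of the record above it. The helper name read_multiline is hypothetical, and the sample lines are copied from the question.

```r
# Hypothetical helper (not from the original answer): glue continuation
# lines onto the previous record, then parse the repaired text.
# Assumption: a line without the separator continues the previous record.
read_multiline <- function(lines, sep = "|") {
  # TRUE for lines that start a new record (they contain the separator).
  starts <- grepl(sep, lines, fixed = TRUE)
  # Running count of record starts gives every line its record number.
  id <- cumsum(starts)
  # Join the lines of each record with a single space.
  merged <- vapply(split(lines, id), paste, character(1), collapse = " ")
  read.csv(text = merged, sep = sep, quote = "", na.strings = "",
           stringsAsFactors = FALSE)
}

# Sample lines copied from the question.
lines <- c(
  "Name|Occupation|Comment",
  "Robert|Doctor|To process, please provide:",
  "a. Tax Returns",
  "b. Identification",
  "c. Statement of Approval",
  "Sally|Accountant|Approved"
)

df <- read_multiline(lines)
print(df$Comment)
```

For a file on disk, the lines could be obtained with readLines("data/text_data", encoding = "UTF-8") and passed to the same helper. Unlike the by-based function, this sketch joins the items with spaces rather than commas, and it would mis-split any comment line that itself contains a |.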
Contents of 'data/text_data':
Name|Occupation|Comment
Robert|Doctor|To process, please provide:
a. Tax Returns
b. Identification
c. Statement of Approval
Sally|Accountant|Approved
Tom|Lawyer|To process, please provide:
a. Tax Returns
b. Identification
Sally|Accountant|Approved
https://stackoverflow.com/questions/70830488