
Text cleaning in R

Stack Overflow user
Asked 2015-03-20 18:42:52
2 answers · 368 views · 0 followers · 0 votes

I have a column in R that looks like this:

Path Column
ag.1.4->ao.5.5->iv.9.12->ag.4.35
ao.11.234->iv.345.455.1.2->ag.9.531

I would like to convert it to:

Path Column
ag->ao->iv->ag
ao->iv->ag

How can I do this?

Thanks

Here is the dput output of my actual data:

structure(list(Rank = c(10394749L, 36749879L), Count = c(1L, 
1L), Percent = c(0.001011122, 0.001011122), Path = c("ao.legacy payment.not_completed->ao.legacy payment.not_completed->ao.legacy payment.completed", 
"ao.legacy payment.not_completed->agent.payment.completed")), .Names = c("Rank", 
"Count", "Percent", "Path"), class = "data.frame", row.names = c(NA, 
-2L))

2 Answers

Stack Overflow user

Accepted answer

Posted 2015-03-20 18:44:43

You can use gsub to match a dot and the digits that follow it (\\.[0-9]+) and replace each match with ''.

 df1$Path.Column <- gsub('\\.[0-9]+', '', df1$Path.Column)
 df1
 #           Path.Column
 #1 ag -> ao -> iv -> ag
 #2       ao -> iv -> ag
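
As a quick check, the same pattern can also be tried on the strings from the question, where each step carries several numeric segments. This is only a minimal sketch; the names `paths` and `cleaned` are assumptions introduced for illustration.

```r
# Strings copied from the question; `paths` is an illustrative name.
paths <- c("ag.1.4->ao.5.5->iv.9.12->ag.4.35",
           "ao.11.234->iv.345.455.1.2->ag.9.531")

# Every ".<digits>" run is removed as its own match,
# so a suffix such as ".1.4" disappears in two matches.
cleaned <- gsub("\\.[0-9]+", "", paths)
print(cleaned)

# The pattern only targets digits, so a non-numeric suffix is left untouched.
untouched <- gsub("\\.[0-9]+", "", "ao.legacy payment.completed")
print(untouched)
```

Because the pattern matches digits only, it is not enough once the suffixes contain letters or spaces, as in the second dataset from the question.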

Update

For the new dataset df2:

gsub('\\.[^->]+(?=(->|\\b))', '', df2$Path, perl=TRUE)
#[1] "ao->ao->ao" "ao->agent" 

For the strings shown in the OP's post:

str2 <- c('ag.1.4->ao.5.5->iv.9.12->ag.4.35',
    'ao.11.234->iv.345.455.1.2->ag.9.531')

gsub('\\.[^->]+(?=(->|\\b))', '', str2, perl=TRUE)
 #[1] "ag->ao->iv->ag" "ao->iv->ag"    
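
To make the lookahead pattern easier to read, it can be wrapped in a small function with each piece of the regular expression commented. This is a sketch rather than part of the original answer; `strip_suffix`, `numeric_paths` and `label_paths` are hypothetical names.

```r
# Hypothetical helper wrapping the pattern used above.
strip_suffix <- function(x) {
  # \\.          a literal dot
  # [^->]+       one or more characters that are neither "-" nor ">"
  # (?=(->|\\b)) lookahead: the match must end right before "->"
  #              or at a word boundary (for example the end of the string)
  gsub("\\.[^->]+(?=(->|\\b))", "", x, perl = TRUE)
}

numeric_paths <- c("ag.1.4->ao.5.5->iv.9.12->ag.4.35",
                   "ao.11.234->iv.345.455.1.2->ag.9.531")
label_paths <- "ao.legacy payment.not_completed->agent.payment.completed"

print(strip_suffix(numeric_paths))
print(strip_suffix(label_paths))
```

The lookahead requires perl = TRUE. Note that `[^->]` is a character class, so the match also stops at any single "-" or ">" inside a label, not only at the "->" separator.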

Data

df1 <- structure(list(Path.Column = c("ag.1 -> ao.5 -> iv.9 -> ag.4", 
"ao.11 -> iv.345 -> ag.9")), .Names = "Path.Column", 
class = "data.frame", row.names = c(NA, -2L))

df2  <- structure(list(Rank = c(10394749L, 36749879L), Count = c(1L, 
1L), Percent = c(0.001011122, 0.001011122), 
Path = c("ao.legacy payment.not_completed->ao.legacy payment.not_completed->ao.legacy payment.completed", 
"ao.legacy payment.not_completed->agent.payment.completed")), 
.Names = c("Rank", "Count", "Percent", "Path"), class = "data.frame", 
row.names = c(NA, -2L))
Votes: 2

Stack Overflow user

Posted 2015-03-20 19:21:56

It is easy to split the strings on '->' and process the substrings separately:

 # split the strings into parts (df is the data frame from the question)
 subStrings <- strsplit(df$Path, '->')
 # in each part, remove everything from the first dot onward
 subStrings <- lapply(subStrings,
                      function(x) gsub('\\..*', '', x))
 # paste the parts back together
 sapply(subStrings, paste0, collapse = "->")
 #> "ao->ao->ao" "ao->agent" 

 # split the strings into parts (df is the data frame from the question)
 subStrings <- strsplit(df$Path, '->')
 # in each part, remove every dot and the characters after it,
 # up to the next space or tab
 subStrings <- lapply(subStrings,
                      function(x) gsub('\\.[^ \t]*', '', x))
 # paste the parts back together
 sapply(subStrings, paste0, collapse = "->")
 #> "ao payment->ao payment->ao payment" "ao payment->agent"   
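
The split-and-rejoin steps can also be packaged as one function and written back to the column. The following is a minimal sketch under a few assumptions: `keep_prefix` is a hypothetical helper, and the data frame is rebuilt here as `df2` with only the `Path` column from the question.

```r
# Hypothetical helper: keep only the text before the first dot in each step.
keep_prefix <- function(paths, sep = "->") {
  # fixed = TRUE treats the separator as a literal string, not a regex
  parts <- strsplit(paths, sep, fixed = TRUE)
  # sub() removes everything from the first dot onward in each part;
  # vapply() guarantees one character string per input path
  vapply(parts,
         function(p) paste(sub("\\..*", "", p), collapse = sep),
         character(1),
         USE.NAMES = FALSE)
}

# Only the Path column from the question's data is reproduced here.
df2 <- data.frame(
  Path = c("ao.legacy payment.not_completed->ao.legacy payment.not_completed->ao.legacy payment.completed",
           "ao.legacy payment.not_completed->agent.payment.completed"),
  stringsAsFactors = FALSE
)

df2$Path <- keep_prefix(df2$Path)
print(df2$Path)
```

Splitting first keeps the regular expression simple, because each part can be cleaned without having to recognise the separator inside the pattern.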
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/29173702
