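I have a column containing multiple tweets:
ID | Tweet
1 | @ChipotleTweets @ChipotleTweets Becky is very nice
2 | Happy Halloween! I now look forward to $3 booritos at @ChipotleTweets
3 | Considering walking to @.ChipotleTweets in my llama onesie.
The goal is to remove each "@___" mention, that is, the @ and everything that follows it up to the next space, without removing the rest of the text.
I am currently using this code to detect the '@', but if it is not in the first position of the sentence, I don't catch anything: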
tweet_pattern <- " @\\w+"
Customer <- Customer %>%
  mutate(clean_Tweet = ifelse(str_detect(Tweet, tweet_pattern),
                              str_remove(Tweet, tweet_pattern),
                              NA_character_))
Desired output:
ID | Tweet | cleaned_tweet
1 | @ChipotleTweets @ChipotleTweets Becky is very nice | Becky is very nice
2 | Happy Halloween! I now look forward to $3 booritos at @ChipotleTweets | Happy Halloween! I now look forward to $3 booritos at
3 | Considering walking to @.ChipotleTweets in my llama onesie. | Considering walking to in my llama onesie.
Posted on 2021-09-14 01:57:37
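We can change the pattern to match zero or more whitespace characters (\\s*), followed by @ and one or more non-whitespace characters (\\S+), and pass it to str_remove_all to remove those substrings: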
library(stringr)
library(dplyr)
# Remove every "@mention" together with any whitespace directly before it
Customer %>%
  mutate(Cleaned_Tweet = str_remove_all(Tweet, "\\s*@\\S+"))
-output
ID Tweet Cleaned_Tweet
1 1 @ChipotleTweets @ChipotleTweets Becky is very nice Becky is very nice
2 2 Happy Halloween! I now look forward to $3 booritos at @ChipotleTweets Happy Halloween! I now look forward to $3 booritos at
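3 3 Considering walking to @.ChipotleTweets in my llama onesie. Considering walking to in my llama onesie.
Note: str_remove removes only the first match in each string. If a string contains several matches, the later ones are left in place. We need str_remove_all to remove every match of the pattern.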
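To make the difference concrete, here is a minimal sketch that applies both functions to the first example tweet. The variable names (x, pattern, first_only, all_removed, tidy) are illustrative only, and the str_squish() step is an optional addition that is not part of the answer above. It is there because removing a mention at the very start of a string leaves the following space in place.

```r
library(stringr)

# Row 1 of the example data: two mentions at the start of the tweet.
x <- "@ChipotleTweets @ChipotleTweets Becky is very nice"
pattern <- "\\s*@\\S+"

# str_remove() drops only the first match, so the second mention survives.
first_only <- str_remove(x, pattern)

# str_remove_all() drops every match; the space before "Becky" is left behind.
all_removed <- str_remove_all(x, pattern)

# Optional tidy-up (an assumption about what is wanted, not in the answer):
# str_squish() trims both ends and collapses repeated internal whitespace.
tidy <- str_squish(all_removed)

print(first_only)
print(all_removed)
print(tidy)
```

If the leading space matters for your data, wrapping the call as str_squish(str_remove_all(Tweet, "\\s*@\\S+")) inside mutate() is one way to handle it. Note that str_squish() would also collapse any intentional double spaces inside the tweet.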
Data
Customer <- structure(list(ID = 1:3, Tweet = c("@ChipotleTweets @ChipotleTweets Becky is very nice",
"Happy Halloween! I now look forward to $3 booritos at @ChipotleTweets",
"Considering walking to @.ChipotleTweets in my llama onesie."
)), class = "data.frame", row.names = c(NA, -3L))
https://stackoverflow.com/questions/69170914