
How to remove words containing any non-alphabetic character (other than hyphens and apostrophes) in R

Stack Overflow user
Asked 2019-01-27 11:08:23
2 answers, 250 views, 0 followers, 0 votes

Thanks.

For example:

str = "he@llo wor*ld i'm using state-of-the-art technologies it's i4u"

Expected output: " i'm using state-of-the-art technologies it's "

I have tried the following regular expressions.

lines <- c("i'm",
           'gas-lighting',
           "i'm gas-lighting",
           "i-love-you",
           "i@u",
           "b2b",
           "i'm gas-lighting u i@u b2b")
gsub("\\w+[^a-z'-]+\\w+", " ", lines) 
[1] "i'm"          "gas-lighting" "i' -lighting" "i-love-you"   " "            
" "            "i' -     "

Is the problem the whitespace between words? I tried excluding whitespace from the bracket expression.

gsub("\\w+[^a-z\\s'-]+\\w+", " ", lines)
[1] "i'm"          "gas-lighting" "i' -lighting" "i-love-you"   " "            
" "            "i' -     "

It doesn't skip the whitespace? The result should be the following vector.

[1] "i'm"          "gas-lighting" "i'm gas-lighting" "i-love-you"   " "            
" "            "i'm gas-lighting u    "
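
A likely explanation for the identical output: R's default regex engine (TRE) does not appear to treat `\\s` as a whitespace shorthand inside a bracket expression, so the space is still matched by the negated class. The sketch below assumes that switching to `perl = TRUE`, or using the POSIX class `[:space:]`, is acceptable; the names `with_perl` and `with_posix` are illustrative only.

```r
lines <- c("i'm", "gas-lighting", "i'm gas-lighting", "i-love-you",
           "i@u", "b2b", "i'm gas-lighting u i@u b2b")

# PCRE understands \s inside [...], so whitespace is excluded from the class
with_perl <- gsub("\\w+[^a-z\\s'-]+\\w+", " ", lines, perl = TRUE)

# Same idea with the default engine, using the POSIX class instead of \s
with_posix <- gsub("\\w+[^a-z[:space:]'-]+\\w+", " ", lines)

print(with_perl)
```

This pattern still requires word characters on both sides of the offending character, so a token such as "hello!" or "@home" would be left in place.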

Update 2: OK, this works so far.

> lines <- c("i'm",
+            'gas-lighting',
+            "i'm gas-lighting",
+            "i-love-you",
+            "i@u",
+            "b2b",
+            "i'm gas-lighting u and you and you i@u b2b",
+            " he@llo wor$ld how*are&you ")
>
> # split a string at spaces then remove the words 
> # that contain any non-alphabetic characters (except "-", "'")
> # then paste them together (separate them with spaces)
> unlist(lapply(lines, function(line){
+   words <- unlist(strsplit(line, "\\s+"))
+   words <- words[!grepl("[^a-z'-]", words, perl=TRUE)]
+   paste(words, collapse=" ")}))
[1] "i'm"                                "gas-lighting"                      
[3] "i'm gas-lighting"                   "i-love-you"                        
[5] ""                                   ""                                  
[7] "i'm gas-lighting u and you and you" "" 

Update 1: So far I have been using the following regular expressions.

> # replace word at the beginning of a string
> lines <- gsub("^\\s*\\w*[^a-z'-]+\\w*", " ", lines); lines
[1] "i'm"                     "gas-lighting"            "i'm gas-lighting"        "i-love-you"             
[5] " "                       " "                       "i'm gas-lighting u i@u "
> # replace word at the end of a string
> lines <- gsub("\\s[a-z]+[^a-z'-]+\\w*$", " ", lines); lines 
[1] "i'm"                     "gas-lighting"            "i'm gas-lighting"        "i-love-you"             
[5] " "                       " "                       "i'm gas-lighting u i@u "
> # replace words between spaces
> gsub("\\s\\w*[^a-z'-]+\\w*\\s", " ", lines)
[1] "i'm"                 "gas-lighting"        "i'm gas-lighting"    "i-love-you"          " "                  
[6] " "                   "i'm gas-lighting u "
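
The three positional substitutions above can probably be collapsed into one pass by matching whole whitespace-delimited tokens. The following is a sketch, not the asker's method: `drop_bad_words` is a hypothetical helper name, it assumes `perl = TRUE`, and it treats uppercase letters as allowed, which the original patterns do not.

```r
# Hypothetical helper: remove every whitespace-delimited token that contains
# a character other than a letter, apostrophe, or hyphen.
drop_bad_words <- function(x) {
  # The \S* on each side expands the match to the whole token
  # surrounding the disallowed character
  cleaned <- gsub("\\S*[^A-Za-z'\\s-]\\S*", "", x, perl = TRUE)
  # Collapse the leftover runs of whitespace and trim both ends
  trimws(gsub("\\s+", " ", cleaned))
}

str <- "he@llo wor*ld i'm using state-of-the-art technologies it's i4u"
drop_bad_words(str)
```

Unlike the expected output in the question, this version also trims leading and trailing spaces; drop the `trimws()` call if they should be kept.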

2 Answers

Stack Overflow user

Answered 2019-01-27 11:29:17

I came up with an indirect approach, but it works.

library(tidyverse)

str = "he@llo wor*ld i'm using state-of-the-art technologies it's i4u"

##Break the string based on spaces
break_1 <- (str_split(str, pattern = "\\s"))

##Find the good words and put them in a vector.
##Inside [...] the characters ( | ) are literal, so list only letters, ' and -
words <- unlist(break_1)
good_words <- words[!str_detect(words, pattern = "[^A-Za-z'\\-]")]

##Merge the vector
merged_vector <- paste0(good_words, collapse = " ")
merged_vector
Votes: 0

Stack Overflow user

Answered 2019-01-27 14:15:59

A variation on Harro Cyranka's answer, using grepl:

# break_1 is the list returned by str_split() in the previous answer
paste0(sapply(break_1, function(x) x[!grepl("[^A-Za-z'-]", x)]), collapse = " ")
Votes: 0
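
A note on the bracket expression used for this kind of filter: inside `[...]`, the characters `(`, `|`, and `)` are literals rather than grouping or alternation, and a range written `Aa-zZ` covers only `A`, `a` to `z`, and `Z`. The sketch below compares such a class with a plainer one; it uses base R with `perl = TRUE` so that the escape `\\-` has a well-defined meaning, and the sample words are made up for illustration.

```r
words <- c("Hello", "a|b", "state-of-the-art", "it's", "i4u")

# Class that lists (, |, ) as allowed and omits most uppercase letters
loose <- "[^(Aa-zZ|\\-|')]"
# Class that allows only letters, apostrophe, and hyphen
strict <- "[^A-Za-z'-]"

# TRUE means the word contains a disallowed character and would be dropped
data.frame(word = words,
           loose = grepl(loose, words, perl = TRUE),
           strict = grepl(strict, words, perl = TRUE))
```

With the first class, a capitalized word such as "Hello" is flagged while "a|b" passes; the second class reverses both results.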
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/54384744
