Q: Extracting phone numbers from strings in R

Stack Overflow user

Asked 2014-05-06 14:56:10

2 answers, 4.1K views, 0 followers, 1 vote

I am using the 'stringr' package in R to extract content from text, and I found the following example:

library(stringr)

strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569",
"387 287 6718", "apple", "233.398.9187 ", "482 952 3315",
"239 923 8115", "842 566 4692", "Work: 579-499-7527", "$1000",
"Home: 543.355.3679")
pattern <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
str_extract(strings, pattern)
str_extract_all(strings, pattern)

However, my strings are formatted as follows:

strings <- c("87225324","65-62983211","65-6298-3211","8722 5324","(65) 6296-2995","(65) 6660 8060","(65) 64368308","+65 9022 7744","+65 6296-2995","+65-6427 8436","+65 6357 3323/322")

But I am not sure what pattern would extract all of the formats above. Any help would be appreciated.

text-extraction

stringr

regex

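2 answers

Stack Overflow user

Accepted answer

Posted 2014-05-06 17:22:34

The code below covers the cases in your question. Hopefully you can generalize it if you find other character combinations in your data.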

# Phone numbers (I've added an additional number with the "/" character)
strings <- c("87225324","65-62983211","65-6298-3211","8722 5324",
           "(65) 6296-2995","(65) 6660 8060","(65) 64368308","+65 9022 7744",
           "+65 6296-2995","+65-6427 8436","+65 6357 3323/322", "+65 4382 6922/6921")

# Remove all non-numeric characters except "/" (your string doesn't include any
# text like "Work:" or "Home:", but I included a regex to deal with those cases
# as well)
strings.cleaned = gsub("[- .)(+]|[a-zA-Z]*:?","", strings)

# If you're sure there are no other non-numeric characters you need to deal with 
# separately, then you can also do the following instead of the code above: 
# gsub("[^0-9/]","", strings). This regex matches any character that's not 
# a digit or "/".

strings.cleaned
 [1] "87225324"       "6562983211"     "6562983211"     "87225324"       "6562962995"    
 [6] "6566608060"     "6564368308"     "6590227744"     "6562962995"     "6564278436"    
[11] "6563573323/322" "6543826922/6921"

# Separate string vector into the cleaned strings and the two "special cases" that we 
# need to deal with separately
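# (a logical index is used so that nothing is dropped when no string contains "/";
# negative indexing with an empty grep() result would return an empty vector)
has.slash = grepl("/", strings.cleaned)
special.cases = strings.cleaned[has.slash]
strings.cleaned = strings.cleaned[!has.slash]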

# Split each phone number with a "/" into two phone numbers
special.cases = unlist(lapply(strsplit(special.cases, "/"), 
                          function(x) {
                            c(x[1], 
                            paste0(substr(x[1], 1, nchar(x[1]) - nchar(x[2])), x[2]))
                          }))
special.cases
[1] "6563573323" "6563573322" "6543826922" "6543826921"

# Put the special.cases back with strings.cleaned
strings.cleaned = c(strings.cleaned, special.cases)

# Select last 8 digits from each phone number
phone.nums = as.numeric(substr(strings.cleaned, nchar(strings.cleaned) - 7, 
                                                nchar(strings.cleaned)))
phone.nums
 [1] 87225324 62983211 62983211 87225324 62962995 66608060 64368308 90227744 62962995 64278436
[11] 63573323 63573322 43826922 43826921
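The clean-then-trim approach above discards the original formatting. If the goal is instead to pull each number out of its surrounding text in one step, as the question's example does, a single pattern can be sketched. This is not part of the answer above: the pattern is an assumption based only on the formats in the question (an optional country code 65, optionally written as +65 or (65), followed by an 8-digit number that may be split 4-4, with an optional "/digits" suffix). It uses base R so it runs without extra packages; the two extra inputs "banana" and "$1000" are borrowed from the question's first example to show non-matches.

```r
strings <- c("87225324", "65-62983211", "65-6298-3211", "8722 5324",
             "(65) 6296-2995", "(65) 6660 8060", "(65) 64368308",
             "+65 9022 7744", "+65 6296-2995", "+65-6427 8436",
             "+65 6357 3323/322")

# Assumed pattern (illustrative, not from the original answer):
#   ((\(65\)|\+?65)[- ]?)?   optional country code plus one separator
#   [0-9]{4}[- ]?[0-9]{4}    8-digit number, optionally split 4-4
#   (/[0-9]+)?               optional "/322"-style suffix
pattern <- "((\\(65\\)|\\+?65)[- ]?)?[0-9]{4}[- ]?[0-9]{4}(/[0-9]+)?"

inputs <- c(strings, "banana", "$1000")

# regexpr() gives the position of the first match in each string (-1 if none)
m <- regexpr(pattern, inputs, perl = TRUE)

# regmatches() returns only the matched pieces, so fill them into an NA vector
matched <- rep(NA_character_, length(inputs))
matched[m > 0] <- regmatches(inputs, m)

matched
```

Every phone string in the question is matched in full, while "banana" and "$1000" come back as NA. Numbers in other national formats would need a different pattern.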

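Votes 4

Stack Overflow user

Posted 2014-05-06 15:18:32

The pattern argument accepts any regular expression. For example, calling str_extract_all(strings, pattern) with the regular expression "[0-9]" (which matches a single digit) as pattern returns a list holding, for each element of strings, a vector of its individual digits. Further examples of regular expressions can be found here (the Python re documentation): https://docs.python.org/2/library/re.html.

This is what is returned for the strings vector when "[0-9]" is used as the regular expression:

str_extract_all(strings, "[0-9]")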

[[1]]
[1] "8" "7" "2" "2" "5" "3" "2" "4"
[[2]]
[1] "6" "5" "6" "2" "9" "8" "3" "2" "1" "1"
[[3]]
[1] "6" "5" "6" "2" "9" "8" "3" "2" "1" "1"
[[4]]
[1] "8" "7" "2" "2" "5" "3" "2" "4"
[[5]]
[1] "6" "5" "6" "2" "9" "6" "2" "9" "9" "5"
[[6]]
[1] "6" "5" "6" "6" "6" "0" "8" "0" "6" "0"
[[7]]
[1] "6" "5" "6" "4" "3" "6" "8" "3" "0" "8"
[[8]]
[1] "6" "5" "9" "0" "2" "2" "7" "7" "4" "4"
[[9]]
[1] "6" "5" "6" "2" "9" "6" "2" "9" "9" "5"
[[10]]
[1] "6" "5" "6" "4" "2" "7" "8" "4" "3" "6"
[[11]]
[1] "6" "5" "6" "3" "5" "7" "3" "3" "2" "3" "3" "2" "2"
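
As a small follow-up sketch (not part of the original answer), adding the "+" quantifier makes the pattern match runs of digits rather than single digits, and the runs can then be pasted back together per string. The three-element vector here is a shortened sample of the question's data, and the variable names runs and joined are illustrative.

```r
library(stringr)

strings <- c("65-6298-3211", "(65) 6296-2995", "+65 6357 3323/322")

# "[0-9]+" matches each maximal run of consecutive digits
runs <- str_extract_all(strings, "[0-9]+")

# Collapse the runs of each string into one digit string
joined <- sapply(runs, paste, collapse = "")

joined
```

Note that the last element becomes "6563573323322": the "/322" suffix is merged into the number, which is why the accepted answer treats strings containing "/" as a separate case.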

Votes -1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/23498153
