
Q: Extract exactly-8-digit numbers from a string and transform the data

Stack Overflow user

Asked on 2015-03-22 06:17:03

Answers: 1 · Views: 896 · Followers: 0 · Votes: 0

While extracting and transforming data in R, I ran into two problems. Here is the dataset:

messageID | msg
1111111111 | hey id 18271801, fix it asap
2222222222 | please fix it soon id12901991 and 91222911. dissapointed
3333333333 | wow $300 expensive man, come on
4444444444 | number 2837169119 test

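The problems are:

1. I want only numbers that are exactly 8 digits long. In the dataset above, messages 3333... ($300, only 3 digits) and 4444... (2837169119, 10 digits) should not be included. So far, this is my best attempt: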

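# df2 is assumed to hold only the msg column
as.matrix(unlist(apply(df2, 1, function(x) {
  regmatches(x, gregexpr("[0-9]{8}", x))
})))

However, with this code message 4444... is still included, because it contains a number with more than 8 digits: the pattern [0-9]{8} matches the first 8 digits of 2837169119.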
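A minimal base-R sketch of one way to fix problem 1, assuming PCRE lookarounds via `perl = TRUE` (a swapped-in alternative, not the rebus/stringr approach used in the accepted answer). The pattern `(?<![0-9])[0-9]{8}(?![0-9])` only matches a run of 8 digits that has no digit immediately before or after it, so the 10-digit number no longer produces a partial match.

```r
# Messages from the question's example data
msg <- c("hey id 18271801, fix it asap",
         "please fix it soon id12901991 and 91222911. dissapointed",
         "wow $300 expensive man, come on",
         "number 2837169119 test")

# Naive pattern: also matches the first 8 digits of a longer number
naive <- regmatches(msg, gregexpr("[0-9]{8}", msg))

# Lookbehind/lookahead (PCRE, hence perl = TRUE) reject a digit on either side
exact <- regmatches(msg, gregexpr("(?<![0-9])[0-9]{8}(?![0-9])", msg, perl = TRUE))

unlist(naive)  # "18271801" "12901991" "91222911" "28371691"
unlist(exact)  # "18271801" "12901991" "91222911"
```

Note that "id12901991" is still matched, because the character before the digits is a letter, not a digit.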

2. Transform the data into another form, like this:

message_id | customer_ID
1111111111 | 18271801
2222222222 | 12901991
2222222222 | 91222911

I don't know how to do this transformation efficiently. Output of dput(df):

df <- structure(list(id = c(1111111111, 2222222222, 3333333333, 4444444444),
                     msg = c("hey id 18271801, fix it asap",
                             "please fix it soon id12901991 and 91222911. dissapointed",
                             "wow $300 expensive man, come on",
                             "number 2837169119 test")),
                .Names = c("id", "msg"), row.names = c(NA, 4L), class = "data.frame")
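For problem 2, a minimal base-R sketch, assuming R >= 3.2.0 (for `lengths()`) and the same lookaround pattern with `perl = TRUE`. Once the matches are in a list with one element per message, the reshape is just repeating each message ID once per match and flattening the matches; the column names follow the desired output above.

```r
# Rebuild the example data frame (same content as the dput() output)
df <- data.frame(
  id  = c(1111111111, 2222222222, 3333333333, 4444444444),
  msg = c("hey id 18271801, fix it asap",
          "please fix it soon id12901991 and 91222911. dissapointed",
          "wow $300 expensive man, come on",
          "number 2837169119 test"),
  stringsAsFactors = FALSE
)

# One character vector of exact 8-digit matches per message
ids <- regmatches(df$msg,
                  gregexpr("(?<![0-9])[0-9]{8}(?![0-9])", df$msg, perl = TRUE))

# Repeat each message ID once per match (0 times if nothing matched),
# then flatten the matches into a single column
long <- data.frame(
  message_id  = rep(df$id, lengths(ids)),
  customer_ID = unlist(ids, use.names = FALSE),
  stringsAsFactors = FALSE
)
long
```

Messages with no valid ID simply contribute zero rows, which is why 3333... and 4444... drop out.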

Thanks

regex

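1 answer

Stack Overflow user

Accepted answer

Posted on 2015-03-22 06:45:04

Use rebus to build the regular expression and stringr to extract the matches.

You may need to adjust the exact form of the regular expression. This code works for your example, but you might have to modify it for your dataset.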

library(rebus)
library(stringr)

# Create regex
rx <- negative_lookbehind(DGT) %R%
  dgt(8) %R%  
  negative_lookahead(DGT)
rx
## <regex> (?<!\d)[\d]{8}(?!\d)

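# Extract the IDs. stringr's ICU regex engine supports lookarounds, so rx
# can be passed directly (the perl() wrapper available in 2015 has since
# been deprecated in stringr)
extracted_ids <- str_extract_all(df$msg, rx)

# Put the IDs into a data frame, repeating each message ID once per match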
data.frame(
  messageID = rep(
    df$id, 
    vapply(extracted_ids, length, integer(1))
  ),
  extractedID = unlist(extracted_ids, use.names = FALSE)
)
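Why digit lookarounds rather than word boundaries? A quick base-R comparison (using `gregexpr(..., perl = TRUE)` in place of rebus/stringr, purely for illustration): `\b` misses "id12901991" because "d" and "1" are both word characters, so there is no boundary before the digits, whereas the lookaround version, equivalent to the regex rebus prints above, only treats digits as forbidden neighbours.

```r
msg <- "please fix it soon id12901991 and 91222911. dissapointed"

# Word boundaries: no \b between "d" and "1", so "12901991" is missed
with_boundary <- regmatches(msg, gregexpr("\\b[0-9]{8}\\b", msg, perl = TRUE))[[1]]

# Digit lookarounds: only an adjacent digit disqualifies a match
with_lookaround <- regmatches(msg, gregexpr("(?<!\\d)\\d{8}(?!\\d)", msg, perl = TRUE))[[1]]

with_boundary    # "91222911"
with_lookaround  # "12901991" "91222911"
```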

Votes: 1

Page content originally from Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/29191598
