How to extract the date that precedes a specific string in unstructured data?

Stack Overflow user
Asked 2017-03-25 17:01:25
Answers: 1 · Views: 63 · Followers: 0 · Votes: 1

I have unstructured text that contains many dates, and I want to extract the date that appears before the word "Message". My data looks like this:

```
21 March 2017 23:10:45 text1
21 March 2017 23:10:45  More text…..
21 March 2017 23:10:45 And more text …..
21 March 2017 23:10:45 some more text **Message:** more text 
22 March 2017 23:10:45 text1
22 March 2017 23:10:45  More text…..
22 March 2017 23:10:45 And more text …..
22 March 2017 23:10:45 some more text **Message:** more text 
23 March 2017 23:10:45 text1
23 March 2017 23:10:45  More text…..
23 March 2017 23:10:45 And more text …..
23 March 2017 23:10:45 some more text **Message:** more text 
24 March 2017 23:10:45 text1
24 March 2017 23:10:45  More text…..
24 March 2017 23:10:45 And more text …..
24 March 2017 23:10:45 some more text **Message:** more text 
```

The output should be a new data frame with a single column holding the dates:

```
21 March 2017
22 March 2017
23 March 2017
24 March 2017
```

1 Answer

Stack Overflow user

Accepted answer

Answered 2017-03-25 17:11:30

How about this:

```r
sub("(?<=\\d{4}).*", "", grep("Message", txt, value=TRUE), perl=TRUE)
# [1] "21 March 2017" "22 March 2017" "23 March 2017" "24 March 2017"
```

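We first use grep() to reduce txt to only the elements that contain "Message", and then use sub() to delete everything after the first occurrence of four consecutive digits (the year). The lookbehind (?<=\\d{4}) is Perl regex syntax, which is why perl=TRUE is needed.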
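The question asks for a data frame, and you may prefer to match the date itself rather than cut each line after the year. A possible sketch of that variant follows. It assumes (this is not from the original answer) that every relevant line starts with "<day> <month name> <4-digit year>"; the names msg_lines, date_pattern and out are illustrative, and txt is an abbreviated copy of the question's data.

```r
# Abbreviated sample in the same shape as the question's data.
txt <- c(
  "21 March 2017 23:10:45 text1",
  "21 March 2017 23:10:45 some more text **Message:** more text",
  "22 March 2017 23:10:45 And more text",
  "22 March 2017 23:10:45 some more text **Message:** more text"
)

# Keep only lines mentioning "Message" (fixed = TRUE: literal match, not a regex).
msg_lines <- grep("Message", txt, value = TRUE, fixed = TRUE)

# Assumed layout: each line begins with "<day> <MonthName> <4-digit year>".
date_pattern <- "^\\d{1,2} [A-Za-z]+ \\d{4}"

# regexpr() locates the first match in each line; regmatches() extracts the matched text.
dates <- regmatches(msg_lines, regexpr(date_pattern, msg_lines, perl = TRUE))

# One-column data frame, as requested in the question.
out <- data.frame(date = dates, stringsAsFactors = FALSE)
print(out)
```

Note that regmatches() combined with regexpr() silently drops lines where the pattern does not match, so the result can be shorter than msg_lines if a "Message" line does not start with a date.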

Data:

```r
txt <- readLines(textConnection("21 March 2017 23:10:45 text1
21 March 2017 23:10:45  More text…..
21 March 2017 23:10:45 And more text …..
21 March 2017 23:10:45 some more text **Message:** more text 
22 March 2017 23:10:45 text1
22 March 2017 23:10:45  More text…..
22 March 2017 23:10:45 And more text …..
22 March 2017 23:10:45 some more text **Message:** more text 
23 March 2017 23:10:45 text1
23 March 2017 23:10:45  More text…..
23 March 2017 23:10:45 And more text …..
23 March 2017 23:10:45 some more text **Message:** more text 
24 March 2017 23:10:45 text1
24 March 2017 23:10:45  More text…..
24 March 2017 23:10:45 And more text …..
24 March 2017 23:10:45 some more text **Message:** more text 
"))
```
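If you need real Date values rather than strings, the "21 March 2017" strings that the sub() call returns can be converted. Below is a minimal sketch. It assumes each string is exactly "<day> <English month name> <year>" separated by single spaces, and the names parts, month_num, iso and out are illustrative. It looks months up in R's built-in month.name constant instead of calling as.Date(..., format = "%d %B %Y"), because %B parsing depends on the session's locale.

```r
# Date strings as returned by the answer's sub() call.
dates <- c("21 March 2017", "22 March 2017", "23 March 2017", "24 March 2017")

# Split each string into day, month name and year (one row per date).
parts <- do.call(rbind, strsplit(dates, " ", fixed = TRUE))

# month.name is R's built-in English month-name vector, so this lookup
# does not depend on the LC_TIME locale.
month_num <- match(parts[, 2], month.name)

# Build ISO "YYYY-MM-DD" strings and parse them with an explicit numeric format.
iso <- sprintf("%s-%02d-%02d", parts[, 3], month_num, as.integer(parts[, 1]))
out <- data.frame(date = as.Date(iso, format = "%Y-%m-%d"))
str(out)
```

A month name that is not in month.name (for example an abbreviation such as "Mar") gives NA in month_num, and the corresponding row ends up as an NA date. Check for that if your input is less regular than the sample.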
Votes: 3
Page content originally from Stack Overflow.
Original link:

https://stackoverflow.com/questions/43019274
