Q: Split a comment at every timestamp

Stack Overflow user

Asked 2020-04-29 16:04:36

1 answer, 134 views, 0 followers, 0 votes

Hey, I have a comment in a single cell that contains several timestamps, like this:

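2019-07-26 20:36:19 - (Work notes) Informed the caller that the transaction has been deleted from Concur. Resolve the INC as pending action. Send a resolution email to the caller, copy-pasting the response and simplifying/summarizing the information received from the Eng team YesUpdate work notes YesUpdate state to Awaiting User Yes

2019-07-26 10:32:05 - oneflow (Work notes)[code] Hi Team.We have removed those gits.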

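What I want is to split this cell into rows, so that each timestamp ends up on its own row together with its respective text.

Please help. Any code in R or Python would be helpful.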

split

comments

datetime

text

1 answer

Stack Overflow user

Accepted answer

Posted 2020-04-29 16:40:50

A Python option using regex:

import re

s = """2019-07-26 20:36:19 - (Work notes) Informed the caller that the [...]
line without timestamp!
2019-07-26 10:32:05 - oneflow (Work notes)[code] Hi Team.We have removed those gits."""

# locate every timestamp; match objects carry their own positions, so a
# timestamp that occurs twice is still sliced correctly (s.index() would
# always return the position of the first occurrence)
matches = list(re.finditer(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', s))

# each comment ends where the next timestamp starts; None slices to the end
ends = [m.start() for m in matches[1:]] + [None]

# pair each timestamp with the text that follows it
# (an empty list results if no timestamp was found)
text_tuples = [(m.group(),                              # timestamp
                s[m.end():e].lstrip(' -').rstrip())     # text up to the next timestamp
               for m, e in zip(matches, ends)]

# text_tuples
# [('2019-07-26 20:36:19',
#   '(Work notes) Informed the caller that the [...]\nline without timestamp!'),
#  ('2019-07-26 10:32:05',
#   'oneflow (Work notes)[code] Hi Team.We have removed those gits.')]

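In this example you get a list of tuples, each holding a timestamp and the text that follows it up to the next timestamp. A line without a timestamp of its own stays attached to the preceding timestamp's text, and any text before the first timestamp is left out of the output.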
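As a more compact variation on the same idea, the pairing can also be sketched with `re.split` and a capturing group, which keeps the timestamps in the result list. The names `TS` and `split_on_timestamps` and the sample string are made up for illustration and are not part of the original answer.

```python
import re

# same timestamp pattern as in the answer above
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"

def split_on_timestamps(text):
    """Return (timestamp, comment) pairs; text before the first timestamp is dropped."""
    # With a capturing group, re.split keeps the delimiters in the result:
    # [before_first, ts1, body1, ts2, body2, ...]
    parts = re.split(f"({TS})", text)
    stamps, bodies = parts[1::2], parts[2::2]
    # drop the leading " - " separator and trailing whitespace of each body
    return [(ts, body.lstrip(" -").rstrip()) for ts, body in zip(stamps, bodies)]

# hypothetical sample input
sample = "intro\n2019-07-26 20:36:19 - first\nextra line\n2019-07-26 10:32:05 - second"
pairs = split_on_timestamps(sample)
```

Here `pairs` holds one tuple per timestamp, and a string without any timestamp yields an empty list.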

Edit: an extension for a pandas DataFrame, following the OP's comment:

import re
import pandas as pd

# create a custom function to split the comments:
def split_comment(s):
    # locate the timestamps; match positions stay correct even if one repeats
    matches = list(re.finditer(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', s))
    if not matches:
        # no timestamp found: return a single [timestamp, comment] pair so the
        # cell survives explode() as one row with the same shape as the others
        return [['NaT', s]]
    # each comment ends where the next timestamp starts; None slices to the end
    ends = [m.start() for m in matches[1:]] + [None]
    return [[m.group(),                            # timestamp
             s[m.end():e].lstrip(' -').rstrip()]   # text up to the next timestamp
            for m, e in zip(matches, ends)]

s0 = """2019-07-26 20:36:19 - (Work notes) Informed the caller that the [...]
line without timestamp!
2019-07-26 10:32:05 - oneflow (Work notes)[code] Hi Team.We have removed those gits."""
s1 = "2019-07-26 20:36:23  another comment"

# create example df
df = pd.DataFrame({'s': [s0, s1], 'id': [0, 1]})

# create a dummy column that holds the resulting series we get if we apply the function:
df['tmp'] = df['s'].apply(split_comment)

# explode the df so we have one row for each timestamp / comment pair:
df = df.explode('tmp').reset_index(drop=True)

# create two columns from the dummy column, 'timestamp' and 'comment':
df[['timestamp', 'comment']] = pd.DataFrame(df['tmp'].to_list(), index=df.index)

# drop the columns we don't need anymore:
df = df.drop(['s', 'tmp'], axis=1)

# so now we have:
# df
#    id            timestamp                                            comment
# 0   0  2019-07-26 20:36:19  (Work notes) Informed the caller that the [......
# 1   0  2019-07-26 10:32:05  oneflow (Work notes)[code] Hi Team.We have rem...
# 2   1  2019-07-26 20:36:23                                    another comment
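
The `timestamp` column above still holds plain strings, including the `'NaT'` placeholder. If real datetime values are wanted, for example to sort comments chronologically, a standard-library sketch could look like the following. The `rows` data and the helper name `parse_timestamp` are hypothetical; in pandas itself, `pd.to_datetime(df['timestamp'], errors='coerce')` performs the equivalent conversion on the column.

```python
from datetime import datetime

def parse_timestamp(value):
    """Parse 'YYYY-MM-DD HH:MM:SS'; return None for anything else (e.g. 'NaT')."""
    try:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

# hypothetical rows shaped like the (id, timestamp, comment) output above
rows = [
    (0, "2019-07-26 20:36:19", "first comment"),
    (0, "2019-07-26 10:32:05", "second comment"),
    (1, "NaT", "comment without timestamp"),
]

parsed = [(rid, parse_timestamp(ts), text) for rid, ts, text in rows]

# sort chronologically within each id; rows without a timestamp go last
ordered = sorted(parsed, key=lambda r: (r[0], r[1] is None, r[1] or datetime.min))
```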

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/61506269
