
How can I use segment() from wordsegment with re.sub to extract words from hashtags in Python?

Stack Overflow user
Asked on 2020-09-10 12:45:07
Answers: 1 | Views: 152 | Followers: 0 | Votes: 1

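I am doing sentiment analysis on tweets in Python. While cleaning the tweets, I want to extract the words from hashtags. I found that the wordsegment library does this very well. My problem is that applying it to the whole tweet column with df['tweet'].apply(lambda x: segment(x)) takes a very long time. I think I can cut that time by applying segment() only to the hashtags. To do that, I first wrote this function: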

def extract_words(hashtags):
    words = " ".join(segment(hashtags))
    return words

Then I tried to apply it with re.sub:

df['tweet'] = df['tweet'].apply(lambda x: re.sub(r'#(\w)+', extract_words, x))

This code does not work and raises an error. How can I apply segment only to the hashtags?


1 Answer

Stack Overflow user

Answered on 2020-09-13 05:23:54

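As an alternative, you can use re.findall inside the extract_words function to collect all the hashtags in each tweet as a list. The regular expression should be changed to (#\w+), which puts the hash sign and the one-or-more quantifier inside the capture group (with #(\w)+ the group holds only the last character matched) and simplifies the replacement step that follows. From there, you can replace each hashtag found with the result of the segment function.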

Input sample.csv

tweets
"RT @NatGeo: Watch: When humans live on Mars, what exactly will they call home? https://xxxxxxx/QlJBuB6FX2 #CountdownToMars"
"RT @HodderBooks: With the delectable #ActsOfLove publishing tomorrow, here's @TalulahRiley herself to tell you about her debut novel: https?"
"RT @solarimpulse: BREAKING: we flew 40'000km without fuel. It's a first for energy, take it further! #futureisclean https://xxxxxxx/JCvKTDBVZx"
"RT @TeslaRoadTrip: #TeslaRoadTrip All - thanks so much for following our twitter feed.  The trip was a success and everyone has diverted ..."
"RT @TeslaMotors: Agreed. @FTC affirms States to allow consumers to choose how they buy their cars. #Tesla #Michigan http://xxxxxxx/fT1JHjMpzg"

Code
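import re

import pandas as pd
import wordsegment as ws

# Load the wordsegment corpus data once, before the first call to segment()
ws.load()

def extract_words(tweet):
    # Collect every hashtag in the tweet, including the leading '#'
    hashtags = re.findall(r"(#\w+)", tweet)
    # Replace longer hashtags first so that a tag such as '#Tesla' cannot
    # clobber the start of a longer one such as '#TeslaRoadTrip'
    for hs in sorted(set(hashtags), key=len, reverse=True):
        # segment() lowercases its input and drops non-alphanumerics such as '#'
        words = " ".join(ws.segment(hs))
        # Swap the original hashtag for its space-separated words
        tweet = tweet.replace(hs, words)
    return tweet

df = pd.read_csv("sample.csv")
print(df)

# Pass the function directly; wrapping it in a lambda is unnecessary
df['NewTweet'] = df['tweets'].apply(extract_words)
print(df)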

Output of NewTweet

RT @NatGeo: Watch: When humans live on Mars, what exactly will they call home? https://xxxxxxx/QlJBuB6FX2 countdown to mars
RT @HodderBooks: With the delectable acts of love publishing tomorrow, here's @TalulahRiley herself to tell you about her debut novel: https?
RT @solarimpulse: BREAKING: we flew 40'000km without fuel. It's a first for energy, take it further! future is clean https://xxxxxxx/JCvKTDBVZx
RT @TeslaRoadTrip: tesla road trip All - thanks so much for following our twitter feed.  The trip was a success and everyone has diverted ...
RT @TeslaMotors: Agreed. @FTC affirms States to allow consumers to choose how they buy their cars. tesla michigan http://xxxxxxx/fT1JHjMpzg
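
For readers who would rather keep the re.sub approach from the question, here is a minimal sketch of one way it could work. When the replacement argument of re.sub is a function, Python calls it with a Match object rather than a string, so the hashtag text has to be read from the match before it is segmented. The names HASHTAG, camel_split and segment_hashtags are hypothetical and not part of the original answer. camel_split is only a stand-in that splits on capital letters so the sketch runs without wordsegment installed; it does not reproduce what wordsegment does.

```python
import re

# '#' followed by one or more word characters; group(1) is the tag without '#'
HASHTAG = re.compile(r"#(\w+)")

def camel_split(tag):
    # Hypothetical stand-in for wordsegment.segment: it only splits on
    # capital letters and digit runs, then lowercases each piece.
    pieces = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", tag)
    return [piece.lower() for piece in pieces]

def segment_hashtags(tweet, segmenter=camel_split):
    def repl(match):
        # re.sub hands the callback a Match object, not the matched string
        words = segmenter(match.group(1))
        # Leave the hashtag untouched if the segmenter returns nothing
        return " ".join(words) if words else match.group(0)
    # Each hashtag is replaced where it was matched, so overlapping
    # prefixes such as '#Tesla' and '#TeslaRoadTrip' do not interfere
    return HASHTAG.sub(repl, tweet)

print(segment_hashtags("Watch this #CountdownToMars"))
```

With wordsegment installed and ws.load() already called, the same function could presumably be applied to the column as df['tweets'].apply(lambda t: segment_hashtags(t, ws.segment)), which would also handle all-lowercase tags such as #futureisclean that the stand-in cannot split.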
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/63822966
