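I am doing sentiment analysis on tweets in Python. While cleaning the tweets, I want to extract the words from hashtags. I found that the wordsegment library does this job very well. My problem is that when I apply it to the whole tweet column of my dataset with df['tweet'].apply(lambda x: segment(x)), it takes a lot of time. I think I can reduce that time by applying segment() only to the hashtags. To do that, I first wrote this function: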
def extract_words(hashtags):
    words = " ".join(segment(hashtags))
    return words
Then I tried to apply it with re.sub:
df['tweet'] = df['tweet'].apply(lambda x: re.sub(r'#(\w)+', extract_words, x))
This code does not work and gives me an error. What should I do to apply segment only to the hashtags?
Posted on 2020-09-13 05:23:54
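As an alternative, you can use re.findall inside the extract_words function to collect all the hashtags of each tweet in a list. The regex should be changed to (#\w+), which puts the hash sign and the one or more word characters that follow it inside the capture group; this simplifies the replacement step. From there, you can replace each hashtag that was found with the result of the segment function.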
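To illustrate why the pattern needs to change, here is a small sketch using only the standard re module. The sample string and the variable names are made up for the illustration. With #(\w)+ the group is repeated, so it only keeps the last character of each hashtag, whereas (#\w+) captures the whole tag.

```python
import re

# Hypothetical sample text, shortened from one of the tweets
tweet = "Agreed. #Tesla #Michigan"

# Repeated group: each match keeps only the last character it captured
last_chars = re.findall(r"#(\w)+", tweet)

# Whole hashtag (including '#') inside the group
hashtags = re.findall(r"(#\w+)", tweet)

print(last_chars)  # ['a', 'n']
print(hashtags)    # ['#Tesla', '#Michigan']
```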
Input sample.csv
tweets
"RT @NatGeo: Watch: When humans live on Mars, what exactly will they call home? https://xxxxxxx/QlJBuB6FX2 #CountdownToMars"
"RT @HodderBooks: With the delectable #ActsOfLove publishing tomorrow, here's @TalulahRiley�herself to tell you about her debut novel: https?"
"RT @solarimpulse: BREAKING: we flew 40'000km without fuel. It's a first for energy, take it further! #futureisclean https://xxxxxxx/JCvKTDBVZx"
"RT @TeslaRoadTrip: #TeslaRoadTrip All - thanks so much for following our twitter feed. The trip was a success and everyone has diverted ..."
"RT @TeslaMotors: Agreed. @FTC affirms States to allow consumers to choose how they buy their cars. #Tesla #Michigan http://xxxxxxx/fT1JHjMpzg"
import pandas as pd
import re
import wordsegment as ws

ws.load()  # load the word data once before calling segment

def extract_words(tweet):
    # collect every hashtag (including the '#') found in the tweet
    hashtags = re.findall(r"(#\w+)", tweet)
    for hs in hashtags:
        # segment returns the hashtag as a list of words
        words = " ".join(ws.segment(hs))
        tweet = tweet.replace(hs, words)
    return tweet

df = pd.read_csv("sample.csv")
print(df)
df['NewTweet'] = df['tweets'].apply(extract_words)
print(df)
Output of NewTweet
RT @NatGeo: Watch: When humans live on Mars, what exactly will they call home? https://xxxxxxx/QlJBuB6FX2 countdown to mars
RT @HodderBooks: With the delectable acts of love publishing tomorrow, here's @TalulahRiley�herself to tell you about her debut novel: https?
RT @solarimpulse: BREAKING: we flew 40'000km without fuel. It's a first for energy, take it further! future is clean https://xxxxxxx/JCvKTDBVZx
RT @TeslaRoadTrip: tesla road trip All - thanks so much for following our twitter feed. The trip was a success and everyone has diverted ...
RT @TeslaMotors: Agreed. @FTC affirms States to allow consumers to choose how they buy their cars. tesla michigan http://xxxxxxx/fT1JHjMpzg
https://stackoverflow.com/questions/63822966
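The re.sub approach from the question can probably be made to work as well. A likely cause of the error is that re.sub passes a Match object, not a string, to the replacement function, so the function has to pull the text out with group(). Below is a minimal sketch under that assumption. The names replace_hashtags, segment_fn and camel_split are hypothetical; camel_split is only a stand-in so the example runs without wordsegment installed, and it is not how wordsegment works.

```python
import re

HASHTAG = re.compile(r"#(\w+)")  # group 1 is the tag text without '#'

def replace_hashtags(tweet, segment_fn):
    """Replace each #hashtag with the space-joined words returned by segment_fn."""
    def _expand(match):
        # re.sub hands the callback a Match object, so extract the string first
        return " ".join(segment_fn(match.group(1)))
    return HASHTAG.sub(_expand, tweet)

def camel_split(tag):
    # Hypothetical stand-in for wordsegment.segment: splits on capital letters only
    return [w.lower() for w in re.findall(r"[A-Z][a-z]*|[a-z]+|\d+", tag)]

print(replace_hashtags("Cars. #Tesla #TeslaRoadTrip", camel_split))
```

With wordsegment available, the same function would be called as replace_hashtags(tweet, ws.segment) after ws.load(), and applied to the column with df['tweets'].apply(lambda t: replace_hashtags(t, ws.segment)). Because each match is replaced in place, a hashtag that is a prefix of another one (such as #Tesla and #TeslaRoadTrip in the same tweet) is not affected by an earlier str.replace.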