Q: Pandas count matches and group

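Stack Overflow user

Asked 2022-04-05 18:54:34

Answers: 1 · Views: 98 · Followers: 0 · Votes: 0

I'm looking for a way to extract the most frequent hashtags (per user and across all users) from a DataFrame that contains usernames and tweet text. The tweet text may contain hashtags.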

           username tweet_id    created_at  text    in_reply_to_tweet_id    in_reply_to_user    retweet_count   favorite_count
    0   mmitchell_ai    1506357982061158401 2022-03-22 19:51:44+00:00   What does it mean to "sanction" Google? What a...   NaN NaN 1   12
    1   mmitchell_ai    1506357632793149441 2022-03-22 19:50:21+00:00   RT @SanhEstPasMoi: @ClementDelangue @aurelieng...   NaN NaN 1   0

See the sample data above.

I am able to get the frequency of each hashtag as follows:

import re

df.text.str.extractall(r'(\#\w+)')[0].value_counts()

But I can't work out how to group the results by username.

python

pandas

1 Answer

Stack Overflow user

Answered 2022-04-05 21:16:53

You're close. The step you missed is joining the extracted hashtags back onto the original dataframe:

hashtags = df['text'].str.extractall(r'(?P<hashtag>\#\w+)').droplevel(1)
out = df[['username']].join(hashtags).value_counts()
print(out)

# Output
username      hashtag  
mmitchell_ai  #hashtag2    2
              #hashtag1    1
              #hashtag3    1
dtype: int64

Input data:

>>> df
       username                                     text
0  mmitchell_ai  blah blah #hashtag1 blah #hashtag2 blah
1  mmitchell_ai       blah #hashtag3 blah blah #hashtag2
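
For completeness, here is a self-contained sketch of the same idea, extended to answer both parts of the question (most frequent hashtag per user and across all users). The second user `other_user` and its tweets are made up for illustration and are not from the original data. `extractall` returns one row per match, indexed by (original row, match number), so dropping the match level lets the result align with the original rows on join.

```python
import pandas as pd

# Sample data; "other_user" and its rows are hypothetical additions.
df = pd.DataFrame({
    "username": ["mmitchell_ai", "mmitchell_ai", "other_user", "other_user"],
    "text": [
        "blah blah #hashtag1 blah #hashtag2 blah",
        "blah #hashtag3 blah blah #hashtag2",
        "#hashtag1 and #hashtag1 again",
        "no tags here",
    ],
})

# One row per match; droplevel(1) removes the per-row match counter so the
# index lines up with df's index again.
hashtags = df["text"].str.extractall(r"(?P<hashtag>#\w+)").droplevel(1)

# Left join: rows without any hashtag get NaN in the "hashtag" column.
pairs = df[["username"]].join(hashtags)

# Count (username, hashtag) pairs, ignoring tweets without hashtags.
counts = pairs.dropna(subset=["hashtag"]).value_counts()

# Most frequent hashtag per user: sort by count, keep the first row per user.
table = counts.rename("n").reset_index()
top_per_user = (
    table.sort_values("n", ascending=False, kind="stable")
    .drop_duplicates("username")
    .set_index("username")["hashtag"]
)

# Frequency across all users.
overall = pairs["hashtag"].value_counts()

print(counts)
print(top_per_user)
print(overall)
```

If several hashtags tie for first place for a user, this keeps only one of them; a rank-based filter would be needed to keep all tied entries.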

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/71757038
