
How to split text data and count occurrences in a pandas DataFrame?

Stack Overflow user
Asked 2018-06-06 13:43:46
Answers: 1 · Views: 208 · Followers: 0 · Votes: 1

I have a DataFrame in the following format:

import pandas as pd

df=pd.DataFrame([
    [42,{"tags":["illustration","logo","design","ui"]}],
    [81,{"tags":["typography","icon","vector","ux"]}],
    [98,{"tags":["branding","app"]}],
    [52,{"tags":["animation","web","flat"]}],
    [17,{"tags":["type","lettering"]}],
    [37,{"tags":["illustration","typography","branding","typography","branding"]}],
    [63,{"tags":["logo","icon","app","web","lettering"]}],
    [47,{"tags":["ui","ux"]}],
    [6,{"tags":["design","vector","icon","flat","lettering","branding","app"]}],
    [53,{"tags":["ui","ux","lettering","branding","app","animation","web","flat"]}],
    [64,{"tags":["branding","app","typography","branding"]}],
    [89,{"tags":["typography","branding","ux","lettering","branding"]}]
],columns=["_id","tags"])

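I want to count how many _id values have a given number of tags (that is, the distribution of tag counts), so for the data above the result would be: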

Number of posts    Number of tags 
     3                 2
     1                 3
     3                 4 
     3                 5
     1                 7

How can I process text tags stored in this format to do that?

Thanks


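1 Answer

Stack Overflow user

Accepted answer

Posted 2018-06-06 13:53:48

Use a list comprehension to get the length of each tags list, count those lengths with Counter, and pass the keys and values to the DataFrame constructor: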

from collections import Counter

c = Counter([len(x['tags']) for x in df['tags']])

df = pd.DataFrame({'Number of posts':list(c.values()), ' Number of tags ': list(c.keys())})
print (df)
   Number of posts   Number of tags 
0                3                 4
1                3                 2
2                1                 3
3                3                 5
4                1                 7
5                1                 8
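
The Counter output above follows insertion order rather than the ascending order shown in the question. A minimal sketch of one way to get sorted rows, assuming a small sample taken from the question's data and the hypothetical names posts, length_counts and result:

```python
from collections import Counter

import pandas as pd

# A few rows from the question's data; "posts" is a hypothetical name
# chosen so the original df is left untouched.
posts = pd.DataFrame(
    [
        [42, {"tags": ["illustration", "logo", "design", "ui"]}],
        [98, {"tags": ["branding", "app"]}],
        [17, {"tags": ["type", "lettering"]}],
        [52, {"tags": ["animation", "web", "flat"]}],
    ],
    columns=["_id", "tags"],
)

# Count how many posts have each tag-list length.
length_counts = Counter(len(cell["tags"]) for cell in posts["tags"])

# sorted() on the (length, count) pairs orders the rows by number of tags.
result = pd.DataFrame(
    sorted(length_counts.items()),
    columns=["Number of tags", "Number of posts"],
)[["Number of posts", "Number of tags"]]
print(result)
```

Building the frame from sorted pairs keeps each count next to its length, so the two columns cannot drift out of step.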

Alternatively, combine apply with value_counts on the original df:

df = (df['tags'].apply(lambda x: len(x['tags']))
                .value_counts()
                .rename_axis('Number of tags')
                .reset_index(name='Number of posts')
                [['Number of posts','Number of tags']])
print (df)
   Number of posts  Number of tags
0                3               5
1                3               4
2                3               2
3                1               8
4                1               7
5                1               3
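
value_counts orders rows by frequency, so the result above is not sorted by number of tags. If the ascending order from the question is wanted, a sort_index call can be added to the chain. This is a sketch under the same assumptions as the answer's code, using a small sample from the question's data and the hypothetical names posts and distribution:

```python
import pandas as pd

# A few rows from the question's data; "posts" is a hypothetical name.
posts = pd.DataFrame(
    [
        [81, {"tags": ["typography", "icon", "vector", "ux"]}],
        [47, {"tags": ["ui", "ux"]}],
        [17, {"tags": ["type", "lettering"]}],
        [6, {"tags": ["design", "vector", "icon", "flat", "lettering", "branding", "app"]}],
    ],
    columns=["_id", "tags"],
)

distribution = (
    posts["tags"]
    .map(lambda cell: len(cell["tags"]))   # number of tags per post
    .value_counts()                        # posts per tag count
    .sort_index()                          # order rows by number of tags
    .rename_axis("Number of tags")
    .reset_index(name="Number of posts")
    [["Number of posts", "Number of tags"]]
)
print(distribution)
```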
Votes: 2
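The original content of this page is provided by Stack Overflow. Translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: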

https://stackoverflow.com/questions/50722056
