My df looks like this:
session_id page_type
10001_0 a
10001_0 b
10001_0 b
10001_0 b
10001_0 c
10001_0 c
10002_0 a
10002_0 a
10002_0 b
10002_0 b
10002_0 c
10002_0 c

I want to group by 'session_id' and count the values ('a', 'b', 'c') like this:
session_id count_page_type
10001_0 {a:1,b:3,c:2}
10002_0 {a:2,b:2,c:2}

I don't care about the type in the 'count_page_type' column; it could also be a list. The aggregation runs over multiple columns:
agg_dict = ({'uid':'first',
'request_id':'unique',
'sso_id':'first',
'article_id' :['first','last','nunique'],
'event_time':['min','max'],
'session_duration':'sum',
'anonymous_id':['first','nunique'],
'platform':['first','nunique'],
'brand':['first','last','nunique'],
'user_type':['first','last'],
'page_type':'value_counts'})
df.groupby('session_id').agg(agg_dict)

Now I get this error:
ValueError: cannot insert page_type, already exists

Any suggestions? Thanks.
Posted on 2019-08-11 17:29:46
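value_counts does not return a single value per group; it returns a pd.Series with one row per distinct value, so it cannot be used directly as an aggregation here. Convert the result to a dict first: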
df.groupby('session_id').agg({'page_type': lambda x: x.value_counts().to_dict()})

https://stackoverflow.com/questions/57448313
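For anyone who wants to try this, here is a minimal sketch that rebuilds the sample data from the question and applies the dict-based aggregation alongside one other column. The `session_duration` values and the helper name `count_values` are made up for illustration; only `session_id` and `page_type` come from the question. A `pd.crosstab` variant is included as a possible alternative if a wide table of counts is acceptable instead of a dict.

```python
import pandas as pd

# Sample data from the question; session_duration is a hypothetical
# filler column, added only to show mixing with other aggregations.
df = pd.DataFrame({
    "session_id": ["10001_0"] * 6 + ["10002_0"] * 6,
    "page_type": list("abbbcc") + list("aabbcc"),
    "session_duration": [1] * 12,
})


def count_values(s):
    """Collapse a group's values into a single dict of value -> count."""
    return s.value_counts().to_dict()


# Each aggregation function returns exactly one object per group,
# so the dict ends up in a single cell of the result.
result = df.groupby("session_id").agg({
    "session_duration": "sum",
    "page_type": count_values,
})
print(result)

# Alternative (not from the original answer): one column per page_type.
wide = pd.crosstab(df["session_id"], df["page_type"])
print(wide)
```

Using a named function instead of a lambda is only a readability choice; the lambda in the answer above should behave the same way.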