Here is my data:
   customer  product     product1
0         A     hats        shoes
1         A    socks        shoes
2         B    socks        shoes
3         C     hats        shoes
4         C     None  accessories
5         B    socks        shoes
6         A     hats        shoes
7         C     None  accessories
I want to output something like this:
customer  shoes  hats  socks  accessories
A             #     #      #            #
B             #     #      #            #
C             #     #      #            #
I tried a groupby like this:
dfB.set_index('customer').groupby(['product', 'product1']).agg({'product':['count'], 'product1':['count']})
I got this output:
                 product product1
                   count    count
product product1
hats    shoes          3        3
socks   shoes          3        3
Please help.
Posted on 2020-02-06 17:02:49
You can melt the two product columns into a single column and then count with pivot_table.
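To see what the melt step produces before pivoting, here is a minimal sketch with the question's data rebuilt by hand. It assumes the None cells are real missing values (np.nan) rather than the string 'None'. The value column name item is an arbitrary choice; pandas warns about, and in recent versions rejects, a value_name that matches an existing column such as product.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand (assumption: the None
# cells are real missing values, not the string 'None').
df = pd.DataFrame({
    "customer": ["A", "A", "B", "C", "C", "B", "A", "C"],
    "product": ["hats", "socks", "socks", "hats", np.nan, "socks", "hats", np.nan],
    "product1": ["shoes", "shoes", "shoes", "shoes", "accessories",
                 "shoes", "shoes", "accessories"],
})

# melt moves product and product1 into one column: one row per
# (customer, item) pair, 16 rows in total. 'item' is an arbitrary name
# chosen so it does not clash with an existing column.
long_df = df.melt("customer", value_name="item")
print(long_df.head())

# Counting rows per (customer, item) gives the numbers that pivot_table
# spreads into columns; groupby drops the NaN items by default.
counts = long_df.groupby(["customer", "item"]).size()
print(counts)
```

Each of the 16 long-format rows pairs a customer with one item; pivot_table(..., aggfunc='size') computes the same per-(customer, item) counts and spreads the items into columns.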
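import numpy as np
# df = df.replace('None', np.nan)  # if the cells hold the string 'None' rather than a real None
# (np.nan, not None: older pandas treated replace('None', None) as a forward fill)
# value_name must not match an existing column such as 'product'
(df.melt('customer', value_name='item')
   .pivot_table(index='customer', columns='item', aggfunc='size'))
item      accessories  hats  shoes  socks
customer
A                 NaN   2.0    3.0    1.0
B                 NaN   NaN    2.0    2.0
C                 2.0   1.0    1.0    NaN
Posted on 2020-02-06 16:39:29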
IIUC
We can set customer as the index and stack the remaining columns into a single Series, which lets you count each item per customer with value_counts.
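To see the intermediate steps of this approach, here is a minimal sketch with the question's data rebuilt by hand. It keeps None as the literal string 'None', an assumption that matches the first printed result below, where None appears as its own column.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand; 'None' is kept as a
# string, so it is counted like any other item.
df = pd.DataFrame({
    "customer": ["A", "A", "B", "C", "C", "B", "A", "C"],
    "product": ["hats", "socks", "socks", "hats", "None", "socks", "hats", "None"],
    "product1": ["shoes", "shoes", "shoes", "shoes", "accessories",
                 "shoes", "shoes", "accessories"],
})

# stack() turns the two product columns into one Series indexed by
# (customer, original column name): one entry per cell, 16 in total.
stacked = df.set_index("customer").stack()
print(stacked.head())

# Grouping on index level 0 (customer) and calling value_counts counts
# how often each item occurs per customer.
counts = stacked.groupby(level=0).value_counts()
print(counts)
```

counts is the long form that unstack() pivots into the wide table. On pandas 2.1 and later, calling stack() with no arguments may emit a FutureWarning about its new implementation; with no missing values in this data, the counts are the same either way.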
df2 = df.set_index('customer').stack().groupby(level=0).value_counts().unstack()
print(df2)
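          None  accessories  hats  shoes  socks
customer
A          NaN          NaN   2.0    3.0    1.0
B          NaN          NaN   NaN    2.0    2.0
C          2.0          2.0   1.0    1.0    NaN
If you don't care about None, you can convert it to a real missing value; missing values are dropped rather than counted, so None no longer shows up as a column.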
import numpy as np
print(df.replace('None', np.nan).set_index('customer').stack().groupby(level=0).value_counts().unstack())
          accessories  hats  shoes  socks
customer
A                 NaN   2.0    3.0    1.0
B                 NaN   NaN    2.0    2.0
C                 2.0   1.0    1.0    NaN
https://stackoverflow.com/questions/60099669