I have data like this:
import pandas as pd

d = {'col1': ['Category1', 'Category1', 'Category1', 'Category1', 'Category1',
              'Category2', 'Category2', 'Category2', 'Category2', 'Category2'],
     'col2': ['Type1', 'Type2', 'Type3', 'Type4', 'Type5',
              'Type1', 'Type2', 'Type3', 'Type4', 'Type5'],
     'col3': [32, 44, 87, 10, 12, 10, 14, 800, 3200, 35]}
df_test = pd.DataFrame(data=d)
df_test
col1 col2 col3
Category1 Type1 32
Category1 Type2 44
Category1 Type3 87
Category1 Type4 10
Category1 Type5 12
Category2 Type1 10
Category2 Type2 14
Category2 Type3 800
Category2 Type4 3200
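Category2 Type5 35
What I want is two rows, one for Category1 and one for Category2, each listing the three types with the largest col3 values, sorted in descending order. So the output would look like this: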
col1 order1 order2 order3
Category1 Type3 Type2 Type1
Category2 Type4 Type3 Type5
I tried:
df_test = df_test.groupby(['col1']).apply(lambda x: (x.groupby('col2')
                                                      .sum()
                                                      .sort_values('col3', ascending=False))
                                          .head(3)).T
and several other approaches, none of which worked. Any ideas? Thanks.
Posted on 2022-04-25 19:04:28
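# Rank col3 within each category (rank 1 = largest); this overwrites col3 with the ranks.
df_test['col3'] = df_test.groupby('col1')['col3'].rank(ascending=False)
df1 = df_test.query('col3 <= 3').reset_index(drop=True)
# Turn the ranks into the column labels order1, order2, order3.
df1['col4'] = 'order' + df1['col3'].astype(int).astype(str)
# Newer pandas versions require keyword arguments for pivot.
df1.pivot(index='col1', columns='col4', values='col2').reset_index()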
col4       col1 order1 order2 order3
0     Category1  Type3  Type2  Type1
1     Category2  Type4  Type3  Type5
Posted on 2022-04-25 19:51:28
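# df_test is the original frame from the question, with col3 still holding the raw values.
df_t = df_test.pivot(columns='col2', index='col1')
# The pivoted columns are (col3, TypeN) pairs, so x[1] is the type name.
df_t = df_t.apply(lambda s: [x[1] for x in s.nlargest(3).index], axis=1)
print(df_t)
Output: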
col1
Category1 [Type3, Type2, Type1]
Category2 [Type4, Type3, Type5]
dtype: object
https://stackoverflow.com/questions/72004314
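One caveat about the rank-based answer above: with the default ranking method, tied col3 values receive fractional ranks (for example 2.5), which astype(int) truncates, so two rows can end up with the same label and pivot may then fail. The following is a minimal sketch of an alternative, not either answerer's method, that leaves df_test unmodified and numbers the rows with cumcount so the labels stay unique within each category. The names top3, order and result are assumptions introduced here for illustration.

```python
import pandas as pd

d = {
    'col1': ['Category1'] * 5 + ['Category2'] * 5,
    'col2': ['Type1', 'Type2', 'Type3', 'Type4', 'Type5'] * 2,
    'col3': [32, 44, 87, 10, 12, 10, 14, 800, 3200, 35],
}
df_test = pd.DataFrame(data=d)

# Sort so the largest col3 values come first, then keep the first
# three rows of each category. df_test itself is not modified.
top3 = (
    df_test.sort_values('col3', ascending=False, kind='stable')
    .groupby('col1')
    .head(3)
    .copy()
)

# Number the kept rows 1..3 within each category. cumcount always
# yields distinct integers per group, so the labels cannot collide.
top3['order'] = 'order' + (top3.groupby('col1').cumcount() + 1).astype(str)

# Reshape to one row per category with columns order1..order3.
result = (
    top3.pivot(index='col1', columns='order', values='col2')
    .rename_axis(columns=None)  # drop the leftover 'order' axis label
    .reset_index()
)
print(result)
```

If a category has fewer than three rows, the missing order columns for that category come out as NaN rather than raising an error.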