
Get the row values with the maximum count after applying group by in pandas

Stack Overflow user
Asked 2018-09-09 17:19:29
Answers: 3 | Views: 1.7K | Followers: 0 | Votes: 1

I have the following df:

In [260]: df
Out[260]:
    size market vegetable  confirm availability
0  Large    ABC    Tomato                   NaN
1  Large    XYZ    Tomato                   NaN
2  Small    ABC    Tomato                   NaN
3  Large    ABC     Onion                   NaN
4  Small    ABC     Onion                   NaN
5  Small    XYZ     Onion                   NaN
6  Small    XYZ     Onion                   NaN
7  Small    XYZ   Cabbage                   NaN
8  Large    XYZ   Cabbage                   NaN
9  Small    ABC   Cabbage                   NaN
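To make the snippets on this page reproducible, here is a minimal sketch that rebuilds the same df with the pandas constructor. Filling `confirm availability` with `np.nan` is an assumption about how the original frame was created; the column name keeps its space, as in the question.

```python
import numpy as np
import pandas as pd

# (size, market, vegetable) for each of the 10 rows shown above
rows = [
    ("Large", "ABC", "Tomato"), ("Large", "XYZ", "Tomato"), ("Small", "ABC", "Tomato"),
    ("Large", "ABC", "Onion"), ("Small", "ABC", "Onion"), ("Small", "XYZ", "Onion"),
    ("Small", "XYZ", "Onion"), ("Small", "XYZ", "Cabbage"), ("Large", "XYZ", "Cabbage"),
    ("Small", "ABC", "Cabbage"),
]
df = pd.DataFrame(rows, columns=["size", "market", "vegetable"])

# The empty column from the question, all NaN
df["confirm availability"] = np.nan
print(df)
```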

1) For each vegetable, how do I get the size that occurs most often?

I used groupby on vegetable and size to get the df below, but I need, for each vegetable, the row whose size has the largest count:

In [262]: df.groupby(['vegetable','size']).count()
Out[262]:                 market  confirm availability
vegetable size
Cabbage   Large       1                     0
          Small       2                     0
Onion     Large       1                     0
          Small       3                     0
Tomato    Large       2                     0
          Small       1                     0

df2['vegetable','size'] = df.groupby(['vegetable','size']).count().apply( some logic )
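Note that `count()` counts the non-null values in each remaining column, which is why `confirm availability` (all NaN) shows 0 in the output above. When only the number of rows per group is needed, `size()` does not depend on any particular column. A minimal sketch, rebuilding the question's df with `np.nan` in the empty column (an assumption):

```python
import numpy as np
import pandas as pd

rows = [
    ("Large", "ABC", "Tomato"), ("Large", "XYZ", "Tomato"), ("Small", "ABC", "Tomato"),
    ("Large", "ABC", "Onion"), ("Small", "ABC", "Onion"), ("Small", "XYZ", "Onion"),
    ("Small", "XYZ", "Onion"), ("Small", "XYZ", "Cabbage"), ("Large", "XYZ", "Cabbage"),
    ("Small", "ABC", "Cabbage"),
]
df = pd.DataFrame(rows, columns=["size", "market", "vegetable"])
df["confirm availability"] = np.nan

# count(): non-null values per column, so the all-NaN column counts 0
counts = df.groupby(["vegetable", "size"]).count()
print(counts)

# size(): number of rows per (vegetable, size) group, returned as a Series
sizes = df.groupby(["vegetable", "size"]).size()
print(sizes)
```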

Desired df:

  vegetable   size   max_count
0   Cabbage   Small     2
1     Onion   Small     3
2    Tomato   Large     2
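One way to produce this table, offered as a sketch rather than the approach the answers below use, is to count rows with `size()` and take `idxmax()` per vegetable. `idxmax` returns the first label holding the maximum, so on a tie the size that sorts first wins. The `max_count` name simply follows the desired output.

```python
import pandas as pd

rows = [
    ("Large", "ABC", "Tomato"), ("Large", "XYZ", "Tomato"), ("Small", "ABC", "Tomato"),
    ("Large", "ABC", "Onion"), ("Small", "ABC", "Onion"), ("Small", "XYZ", "Onion"),
    ("Small", "XYZ", "Onion"), ("Small", "XYZ", "Cabbage"), ("Large", "XYZ", "Cabbage"),
    ("Small", "ABC", "Cabbage"),
]
df = pd.DataFrame(rows, columns=["size", "market", "vegetable"])

# Rows per (vegetable, size) pair
counts = df.groupby(["vegetable", "size"]).size()

# For each vegetable, the full (vegetable, size) label with the largest count
best = counts.groupby(level="vegetable").idxmax()

# Select those labels and turn the result into a flat DataFrame
result = counts.loc[best.tolist()].rename("max_count").reset_index()
print(result)
```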

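2) Now I can say that "Small Cabbage" has the largest supply in the df. So I need to fill the confirm availability column of every Cabbage row with Small (and likewise every other vegetable's rows with its own most frequent size). How can I do that?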

    size market vegetable  confirm availability
0  Large    ABC    Tomato                   Large
1  Large    XYZ    Tomato                   Large
2  Small    ABC    Tomato                   Large
3  Large    ABC     Onion                   Small
4  Small    ABC     Onion                   Small
5  Small    XYZ     Onion                   Small
6  Small    XYZ     Onion                   Small
7  Small    XYZ   Cabbage                   Small    
8  Large    XYZ   Cabbage                   Small    
9  Small    ABC   Cabbage                   Small
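For part 2, a minimal sketch that skips the intermediate table: `groupby(...).transform` broadcasts a per-group value back onto every row of that group. Here the per-group value is the most frequent size from `Series.mode()`. `mode()` returns all tied values in sorted order, so `iloc[0]` breaks ties alphabetically; whether that tie-break is acceptable is an assumption.

```python
import numpy as np
import pandas as pd

rows = [
    ("Large", "ABC", "Tomato"), ("Large", "XYZ", "Tomato"), ("Small", "ABC", "Tomato"),
    ("Large", "ABC", "Onion"), ("Small", "ABC", "Onion"), ("Small", "XYZ", "Onion"),
    ("Small", "XYZ", "Onion"), ("Small", "XYZ", "Cabbage"), ("Large", "XYZ", "Cabbage"),
    ("Small", "ABC", "Cabbage"),
]
df = pd.DataFrame(rows, columns=["size", "market", "vegetable"])
df["confirm availability"] = np.nan

# Most frequent size per vegetable, written back onto every row of that vegetable
df["confirm availability"] = (
    df.groupby("vegetable")["size"].transform(lambda s: s.mode().iloc[0])
)
print(df)
```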

3 Answers

Stack Overflow user

Accepted answer

Posted 2018-09-09 18:48:32

1)

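# veg_df is the question's df
# count rows per (vegetable, size), sort so the largest count is last, keep that row
required_df = (veg_df.groupby(['vegetable', 'size'], as_index=False)['market'].count()
               .sort_values(by=['vegetable', 'market'])
               .drop_duplicates(subset='vegetable', keep='last'))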

2)

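# attach each vegetable's most frequent size; overlapping columns get _x/_y suffixes
merged_df = veg_df.merge(required_df, on='vegetable')
cols = ['size_x', 'market_x', 'vegetable', 'size_y']
dict_renaming_cols = {'size_x': 'size',
                      'market_x': 'market',
                      'size_y': 'confirm_availability'}
# keep the original columns plus the looked-up size, then restore readable names
merged_df = merged_df.loc[:, cols].rename(columns=dict_renaming_cols)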
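If you would rather keep the original `confirm availability` column name and the original row order than rename `_x`/`_y` columns after a merge, `Series.map` with a vegetable-to-size lookup is a possible alternative. This is a sketch; `lookup` is a hypothetical name, and `required_df` is built the same way as in part 1 of this answer.

```python
import numpy as np
import pandas as pd

rows = [
    ("Large", "ABC", "Tomato"), ("Large", "XYZ", "Tomato"), ("Small", "ABC", "Tomato"),
    ("Large", "ABC", "Onion"), ("Small", "ABC", "Onion"), ("Small", "XYZ", "Onion"),
    ("Small", "XYZ", "Onion"), ("Small", "XYZ", "Cabbage"), ("Large", "XYZ", "Cabbage"),
    ("Small", "ABC", "Cabbage"),
]
veg_df = pd.DataFrame(rows, columns=["size", "market", "vegetable"])
veg_df["confirm availability"] = np.nan

# Part 1: one row per vegetable holding its most frequent size
required_df = (
    veg_df.groupby(["vegetable", "size"], as_index=False)["market"].count()
    .sort_values(by=["vegetable", "market"])
    .drop_duplicates(subset="vegetable", keep="last")
)

# Hypothetical lookup Series: index = vegetable, values = size with the top count
lookup = required_df.set_index("vegetable")["size"]

# Fill the existing column in place; rows keep their original order
veg_df["confirm availability"] = veg_df["vegetable"].map(lookup)
print(veg_df)
```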
Votes: 2

Stack Overflow user

Posted 2018-09-09 19:02:29

You can use GroupBy with count, then sort and drop the duplicates:

res = df.groupby(['size', 'vegetable'], as_index=False)['market'].count()\
        .sort_values('market', ascending=False)\
        .drop_duplicates('vegetable')

print(res)

    size vegetable  market
4  Small     Onion       3
2  Large    Tomato       2
3  Small   Cabbage       2
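With a single sort key, `sort_values` defaults to `kind='quicksort'`, which is not stable. So if two sizes of the same vegetable tie on count, which one `drop_duplicates` keeps is not guaranteed. A sketch that makes the tie-break explicit by also sorting on `size` (alphabetical here, which is an assumption about the desired rule) and then restoring vegetable order:

```python
import pandas as pd

rows = [
    ("Large", "ABC", "Tomato"), ("Large", "XYZ", "Tomato"), ("Small", "ABC", "Tomato"),
    ("Large", "ABC", "Onion"), ("Small", "ABC", "Onion"), ("Small", "XYZ", "Onion"),
    ("Small", "XYZ", "Onion"), ("Small", "XYZ", "Cabbage"), ("Large", "XYZ", "Cabbage"),
    ("Small", "ABC", "Cabbage"),
]
df = pd.DataFrame(rows, columns=["size", "market", "vegetable"])

res = (
    df.groupby(["size", "vegetable"], as_index=False)["market"].count()
    # highest count first; among equal counts, alphabetically smaller size first
    .sort_values(["market", "size"], ascending=[False, True])
    .drop_duplicates("vegetable")
    # back to one row per vegetable in alphabetical order
    .sort_values("vegetable")
    .reset_index(drop=True)
)
print(res)
```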
Votes: 2

Stack Overflow user

Posted 2018-09-09 18:29:29

You can assign the grouped DataFrame to another object, then group it again on the first index level to pick out the maximum:

d = df.groupby(['vegetable','size']).count()
# compare on the 'market' counts; 'confirm availability' is all NaN, so its count is 0
d.groupby(d.index.get_level_values(0).tolist()).apply(lambda x: x[x['market'] == x['market'].max()])

Output:

                     market confirm availability
vegetable   size            
Cabbage Cabbage Small   2   2   0
Onion   Onion   Small   3   3   0
Tomato  Tomato  Large   2   2   0
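The repeated vegetable level in that output comes from `apply` prepending the group key to an index that already contains it. A sketch that groups by the index level name with `group_keys=False`, so the original two-level index is kept; it compares on the `market` count, since `confirm availability` is all NaN in the question's df. Like the answer, it keeps every row equal to the maximum, so ties would return more than one size per vegetable.

```python
import numpy as np
import pandas as pd

rows = [
    ("Large", "ABC", "Tomato"), ("Large", "XYZ", "Tomato"), ("Small", "ABC", "Tomato"),
    ("Large", "ABC", "Onion"), ("Small", "ABC", "Onion"), ("Small", "XYZ", "Onion"),
    ("Small", "XYZ", "Onion"), ("Small", "XYZ", "Cabbage"), ("Large", "XYZ", "Cabbage"),
    ("Small", "ABC", "Cabbage"),
]
df = pd.DataFrame(rows, columns=["size", "market", "vegetable"])
df["confirm availability"] = np.nan

d = df.groupby(["vegetable", "size"]).count()

# group_keys=False stops apply from adding the vegetable key as an extra index level
best = d.groupby(level="vegetable", group_keys=False).apply(
    lambda x: x[x["market"] == x["market"].max()]
)
print(best)
```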
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/52243060
