How do I apply a percentile rank on a pivot table?
Dummy dataset:
import pandas as pd

df = pd.DataFrame({
    "Business": ["Hotel", "Hotel", "Transport", "Agri", "Tele", "Hotel", "Transport", "Agri", "Tele"],
    "Location": ["101", "101", "101", "101", "103", "105", "102", "103", "106"],
    "Area": ["A", "A", "A", "A", "B", "C", "D", "B", "F"],
})

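# Count rows per (Area, Location, Business) combination.
activity_cat_countby_subarea = df.groupby(['Area', 'Location','Business']).size().reset_index(name='counts')
# The second reset_index() adds an 'index' column; rows are then sorted by count, largest first.
activity_cat_countby_subarea = activity_cat_countby_subarea.reset_index().sort_values(['counts'], ascending=False)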

After converting the data into this pivot-style table, I apply a percentile rank across all counts:
activity_cat_countby_subarea['overll_pct_rank'] = activity_cat_countby_subarea['counts'].rank(pct=True)
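
To make the behaviour of `rank(pct=True)` concrete, here is a minimal sketch on a made-up Series (the values are illustrative and not taken from the dataset above). With the default `method='average'`, tied values share the mean of their ranks, and `pct=True` divides each rank by the number of values.

```python
import pandas as pd

# Illustrative counts only (an assumption for this sketch, not the question's data).
counts = pd.Series([1, 2, 1, 1])

# Default method='average': the three 1s tie for ranks 1-3, so each gets 2.0;
# the single 2 gets rank 4.0.
ranks = counts.rank()

# pct=True divides each rank by the number of values (4 here).
pct_ranks = counts.rank(pct=True)

print(ranks.tolist())      # [2.0, 4.0, 2.0, 2.0]
print(pct_ranks.tolist())  # [0.5, 1.0, 0.5, 0.5]
```

Because this rank is computed over the whole column, a count is compared against every row regardless of its business.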

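However, my requirement is to rank the counts within each business. That is, I need the rank for each business separately, for example "Hotel" and its counts.
Please help, and let me know if you need more information.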
Posted on 2021-04-04 15:21:35
Instead of this:
activity_cat_countby_subarea['overll_pct_rank'] = activity_cat_countby_subarea['counts'].rank(pct=True)
do the following:
activity_cat_countby_subarea['overll_pct_rank'] = activity_cat_countby_subarea.groupby(['Business','counts']).rank(pct=True)
activity_cat_countby_subarea.sort_index(inplace=True)
#Output
   index Area Location   Business  counts  overll_pct_rank
0      0    A      101       Agri       1              0.5
1      1    A      101      Hotel       2              1.0
2      2    A      101  Transport       1              0.5
3      3    B      103       Agri       1              1.0
4      4    B      103       Tele       1              0.5
5      5    C      105      Hotel       1              1.0
6      6    D      102  Transport       1              1.0
7      7    F      106       Tele       1              1.0
https://stackoverflow.com/questions/66938844
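
Note that grouping by both `Business` and `counts` leaves no `counts` column to rank inside each group, and the values in the output above are consistent with a rank of the helper `index` column within those groups rather than a rank of the counts. If the intent is the percentile of each count among rows of the same business, one possible sketch is to group by `Business` alone and rank only the `counts` column. The names `counts_df` and `business_pct_rank` are hypothetical, chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Business": ["Hotel", "Hotel", "Transport", "Agri", "Tele", "Hotel", "Transport", "Agri", "Tele"],
    "Location": ["101", "101", "101", "101", "103", "105", "102", "103", "106"],
    "Area": ["A", "A", "A", "A", "B", "C", "D", "B", "F"],
})

# Count rows per (Area, Location, Business), as in the question.
counts_df = df.groupby(["Area", "Location", "Business"]).size().reset_index(name="counts")

# Rank each count only against rows of the same Business.
# 'business_pct_rank' is a hypothetical column name for this sketch.
counts_df["business_pct_rank"] = counts_df.groupby("Business")["counts"].rank(pct=True)

print(counts_df)
```

With this data, Hotel has counts of 2 and 1, which rank at 1.0 and 0.5. Every other business has two tied counts of 1, so with the default `method='average'` both of its rows get 0.75.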