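I have a pandas DataFrame like this:

>>> df
        Benny  Daniel   Doris   Eric   Jack    Zoe
Age        75      30      95     25     28     23
Salary   2000    9000  100000  10000  12000  20000

I want to find the average age and salary for several different groups, where each group is a subset of the columns and the groups may overlap, for example this dictionary: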
{'Parrot lovers': ['Doris', 'Benny'], 'Tea Drinkers': ['Doris', 'Zoe'],
 'Maintainance': ['Benny', 'Jack'], 'Coffee Drinkers': ['Benny', 'Eric'],
 'Senior Management': ['Doris', 'Zoe', 'Jack']}

How can I design a groupby function to accomplish this?
Posted on 2014-08-25 17:05:49
This is how I would approach the problem.
from io import StringIO
import pandas as pd

# Rebuild the example frame from whitespace-separated text
df = """index Benny Daniel Doris Eric Jack Zoe
Age 75 30 95 25 28 23
Salary 2000 9000 100000 10000 12000 20000"""
df = pd.read_csv(StringIO(df), sep=r"\s+").set_index('index')
d = {'Parrot lovers': ['Doris', 'Benny'], 'Tea Drinkers': ['Doris', 'Zoe'],
     'Maintainance': ['Benny', 'Jack'], 'Coffee Drinkers': ['Benny', 'Eric'],
     'Senior Management': ['Doris', 'Zoe', 'Jack']}

For the solution, just use .loc and iterate over the dictionary:

# For each group, select its columns and average across them (row-wise)
averages = {k: df.loc[:, v].mean(axis=1) for k, v in d.items()}
print(pd.DataFrame(averages).T)  # gives the nice printout...
index                    Age  Salary
Coffee Drinkers    50.000000    6000
Maintainance       51.500000    7000
Parrot lovers      85.000000   51000
Senior Management  48.666667   44000
Tea Drinkers       59.000000   60000

Posted on 2014-08-25 17:01:36
There are probably several ways to do this; here is one.
Transpose the data and add a True/False column for each category:
In [20]: group_map = {'Parrot lovers': ['Doris', 'Benny'],
                      'Tea Drinkers': ['Doris', 'Zoe'],
                      'Maintainance': ['Benny', 'Jack'],
                      'Coffee Drinkers': ['Benny', 'Eric'],
                      'Senior Management': ['Doris', 'Zoe', 'Jack']}

In [22]: df = df.T

In [23]: for k in group_map:
    ...:     df[k] = df.index.isin(group_map[k])

Now you can group by any category to get its means:
In [24]: df.groupby('Parrot lovers')['Salary'].mean()
Out[24]:
Parrot lovers
False    12750
True     51000
Name: Salary, dtype: int64

Or iterate over the category columns to get the mean for each one:
In [24]: means = {}
    ...: for k in group_map:
    ...:     means[k] = df.groupby(k)['Salary'].mean()[True]
    ...: means
    ...:
Out[24]:
{'Coffee Drinkers': 6000,
'Maintainance': 7000,
'Parrot lovers': 51000,
'Senior Management': 44000,
 'Tea Drinkers': 60000}

https://stackoverflow.com/questions/25490413
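
Both approaches above loop over the dictionary. Since the question asks specifically about groupby, here is a minimal sketch of one way to get every group's means from a single groupby call: turn the dictionary into a long table with one row per (group, person) pair, so that a person who belongs to several groups simply appears several times. The names membership, people and group_means are made up for this illustration, and the frame is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Example data from the question: rows are attributes, columns are people.
df = pd.DataFrame(
    {"Benny": [75, 2000], "Daniel": [30, 9000], "Doris": [95, 100000],
     "Eric": [25, 10000], "Jack": [28, 12000], "Zoe": [23, 20000]},
    index=["Age", "Salary"],
)

group_map = {'Parrot lovers': ['Doris', 'Benny'],
             'Tea Drinkers': ['Doris', 'Zoe'],
             'Maintainance': ['Benny', 'Jack'],
             'Coffee Drinkers': ['Benny', 'Eric'],
             'Senior Management': ['Doris', 'Zoe', 'Jack']}

# One row per (group, person) pair; overlapping membership just
# produces repeated names (hypothetical helper table).
membership = pd.DataFrame(
    [(group, name) for group, names in group_map.items() for name in names],
    columns=["group", "name"],
)

# Transpose so each person is a row with Age and Salary columns.
people = df.T

# Attach each person's attributes to their memberships, then
# average within each group in a single groupby.
group_means = (
    membership.merge(people, left_on="name", right_index=True)
    .groupby("group")[["Age", "Salary"]]
    .mean()
)
print(group_means)
```

The result is a frame indexed by group name with Age and Salary columns. Daniel is in no group, so he drops out of the merge and does not affect any mean.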