我有一个多索引的熊猫数据,我使用了groupby方法和describe方法给我以下内容:
grouped= self.HK_data.groupby(level=[0,1])
summary= grouped.describe()这意味着:
Antibody Time
Customer_Col1A2 0 count 3.000000
mean 0.757589
std 0.188750
min 0.639933
25% 0.648732
50% 0.657532
75% 0.816417
max 0.975302
10 count 3.000000
mean 0.716279
std 0.061939
min 0.665601
25% 0.681757
50% 0.697913
75% 0.741618
max 0.785324
... .........我使用以下方法计算了SEM:
SEM=grouped.mean()/(numpy.sqrt(grouped.count()))给予:
Antibody Time
Customer_Col1A2 0 0.437394
10 0.413544
120 0.553361
180 0.502792
20 0.512797
240 0.514609
30 0.505618
300 0.481021
45 0.534658
5 0.425800
60 0.430633
90 0.525115
... .........我如何concat这两个框架,以使扫描电镜成为另一个条目的总结统计?
所以,就像:
Antibody Time
Customer_Col1A2 0 count 3.000000
mean 0.757589
std 0.188750
min 0.639933
25% 0.648732
50% 0.657532
75% 0.816417
max 0.975302
SEM 0.437394
10 count 3.000000
mean 0.716279
std 0.061939
min 0.665601
25% 0.681757
50% 0.697913
75% 0.741618
max 0.785324
SEM 0.413544我试过pandas.concat,但这并没有给我想要的东西。
谢谢!
发布于 2016-11-20 16:30:50
我认为您首先添加了第三级MultiIndex,按MultiIndex.from_tuples分配了新索引,最后使用了concat和sort_index
HK_data = pd.DataFrame({'Antibody':['Customer_Col1A2','Customer_Col1A2','Customer_Col1A2'],
'Time':[0,10,10],
'Col':[7,8,9]})
HK_data = HK_data.set_index(['Antibody','Time'])
print (HK_data)
Col
Antibody Time
Customer_Col1A2 0 7
10 8
10 9grouped= HK_data.groupby(level=[0,1])
summary= grouped.describe()
print (summary)
Col
Antibody Time
Customer_Col1A2 0 count 1.000000
mean 7.000000
std NaN
min 7.000000
25% 7.000000
50% 7.000000
75% 7.000000
max 7.000000
10 count 2.000000
mean 8.500000
std 0.707107
min 8.000000
25% 8.250000
50% 8.500000
75% 8.750000
max 9.000000
SEM=grouped.mean()/(np.sqrt(grouped.count()))
#change multiindex
new_index = list(zip(SEM.index.get_level_values('Antibody'),
SEM.index.get_level_values('Time'),
['SEM'] * len(SEM.index)))
SEM.index = pd.MultiIndex.from_tuples(new_index, names=('Antibody','Time', None))
print (SEM)
Col
Antibody Time
Customer_Col1A2 0 SEM 7.000000
10 SEM 6.010408df = pd.concat([summary, SEM]).sort_index()
print (df)
Col
Antibody Time
Customer_Col1A2 0 25% 7.000000
50% 7.000000
75% 7.000000
SEM 7.000000
count 1.000000
max 7.000000
mean 7.000000
min 7.000000
std NaN
10 25% 8.250000
50% 8.500000
75% 8.750000
SEM 6.010408
count 2.000000
max 9.000000
mean 8.500000
min 8.000000
std 0.707107https://stackoverflow.com/questions/40706338
复制相似问题