
Q: Get the sum of metrics for combinations of multiple columns

Stack Overflow user

Asked 2018-06-11 03:53:39

Answers: 2, Views: 171, Followers: 0, Votes: 0

I have a pandas DataFrame that looks like this.

set language    group   version metric_1    metric_2    metric_3
X   English     1       A       100         20          5
X   French      2       A       90          10          10
X   English     1       B       80          30          15
X   French      2       B       70          20          20
Y   English     1       A       200         20          30
Y   French      2       A       180         30          20
Y   English     1       B       160         10          10
Y   French      2       B       140         20          5
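
For anyone who wants to reproduce the problem, the table above could be built roughly as follows. This is only a sketch: the values are copied from the table, and the variable name `df` is an assumption that matches what the answers below use.

```python
import pandas as pd

# Sample data copied from the table above; `df` is the name the answers assume.
df = pd.DataFrame({
    "set": ["X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "language": ["English", "French"] * 4,
    "group": [1, 2] * 4,
    "version": ["A", "A", "B", "B"] * 2,
    "metric_1": [100, 90, 80, 70, 200, 180, 160, 140],
    "metric_2": [20, 10, 30, 20, 20, 30, 10, 20],
    "metric_3": [5, 10, 15, 20, 30, 20, 10, 5],
})
print(df)
```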

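I want to sum the metrics over every combination of the experiment attributes: set, language, group, and version. The summary DataFrame would look like this.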

set language    group   version metric_1    metric_2    metric_3
X                               800         140         80
Y                               1000        140         80
    English                     1200        200         80
    French                      600         80          80
                1               1050        120         60
                2               750         160         100
                        A       850         140         80
                        B       950         140         80
X   English                     500         100         40
X   French                      300         40          40
Y   English                     700         100         40
Y   French                      300         40          40
X               1               350         60          30
X               2               450         80          50
Y               1               700         60          30
Y               2               300         80          50
X                       A       350         70          40
X                       B       450         70          40
Y                       A       500         70          40
Y                       B       500         70          40
    English     1               ...
    English     2               ...
    French      1               ...
    French      2               ...
    English             A       ...
    English             B       ...
    French              A       ...
    French              B       ...
                1       A       ...
                1       B       ...
                2       A       ...
                2       B       ...
X   English     1               ...
X   English     2               ...
X   French      1               ...
X   French      2               ...
Y   English     1               ...
Y   English     2               ...
Y   French      1               ...
Y   French      2               ...
X   English             A       ...
X   English             B       ...
X   French              A       ...
X   French              B       ...
Y   English             A       ...
Y   English             B       ...
Y   French              A       ...
Y   French              B       ...
X               1       A       ...
X               1       B       ...
X               2       A       ...
X               2       B       ...
Y               1       A       ...
Y               1       B       ...
Y               2       A       ...
Y               2       B       ...
    English     1       A       ...
    English     1       B       ...
    English     2       A       ...
    English     2       B       ...
    French      1       A       ...
    French      1       B       ...
    French      2       A       ...
    French      2       B       ...

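I know I could brute-force this by running groupby on each combination and concatenating the results into one DataFrame. The number of attributes may grow, though, so I am looking for a more scalable solution. I have been reading about the functions available in itertools, but I am not sure how they would apply here.

Any ideas or guidance would be appreciated. Thanks!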

python

pandas

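2 Answers

Stack Overflow user

Accepted answer

Posted 2018-06-11 05:12:30

Indeed, the combinations function from itertools will help you create all possible combinations. Assume your data is in a DataFrame named df.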

from itertools import combinations
import pandas as pd

# build two lists: the columns to sum, and the remaining (attribute) columns
list_metric = [col for col in df.columns if 'metric' in col]
list_non_metric = [col for col in df.columns if 'metric' not in col]
# pre-aggregate on all attribute columns
df_grouped = df.groupby(list_non_metric, as_index=False)[list_metric].sum()
# concat one groupby result per combination of attribute columns
df_output = (pd.concat([df_grouped.groupby(list(combi), as_index=False)[list_metric].sum()
                        for j in range(1, len(list_non_metric)+1)
                        for combi in combinations(list_non_metric, j)])
                 .fillna(''))
# reorder the columns to match the input data (if necessary)
df_output = df_output[df.columns]

To see how combinations works, try printing the result of the following line:

[combi for combi in combinations(list_non_metric,2)]

The for j in range(1, len(list_non_metric)+1) loop then produces the combinations of 1, 2, 3, ... elements of list_non_metric.
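
To make the two nested loops concrete, here is a small standalone sketch that only enumerates the column combinations, without touching any data. The column names are taken from the question; `all_combis` is a name introduced here for illustration.

```python
from itertools import combinations

# Attribute columns from the question.
list_non_metric = ["set", "language", "group", "version"]

# j is the subset size; combinations() yields every subset of that size.
all_combis = [
    combi
    for j in range(1, len(list_non_metric) + 1)
    for combi in combinations(list_non_metric, j)
]

for combi in all_combis:
    print(combi)

# Every non-empty subset appears exactly once, so n columns give 2**n - 1 groupings.
print(len(all_combis))  # 15
```

Since the number of groupings grows as 2**n - 1, this approach is convenient for a handful of attributes but becomes expensive as more are added.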

Votes: 0

Stack Overflow user

Posted 2018-06-11 05:51:14

Here is one way. I assume you only gave a subset of the data, because the totals don't add up:

In []:
import itertools as it
import pandas as pd

cols = df.columns.tolist()
index = ['set', 'language', 'group', 'version']
df = df.set_index(index)
# group on every combination of index levels (passed as a list of positions)
pd.concat([df.groupby(level=list(x)).sum().reset_index()
           for n in range(1, len(index)+1)
           for x in it.combinations(range(len(index)), n)],
          sort=True)[cols].fillna('')

Out[]:
   set language group version  metric_1  metric_2  metric_3
0    X                              340        80        50
1    Y                              680        80        65
0       English                     540        80        60
1        French                     480        80        55
0                   1               540        80        60
1                   2               480        80        55
0                           A       570        80        65
1                           B       450        80        50
0    X  English                     180        50        20
1    X   French                     160        30        30
2    Y  English                     360        30        40
3    Y   French                     320        50        25
...
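
As a sanity check on the output above, the following sketch rebuilds the eight sample rows from the question and recomputes the summary. The helper name `summarize` is hypothetical and not part of either answer; it groups on column names rather than index levels, and it leaves missing attributes as NaN instead of filling them with empty strings.

```python
from itertools import combinations

import pandas as pd

# The eight sample rows from the question.
df = pd.DataFrame({
    "set": ["X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "language": ["English", "French"] * 4,
    "group": [1, 2] * 4,
    "version": ["A", "A", "B", "B"] * 2,
    "metric_1": [100, 90, 80, 70, 200, 180, 160, 140],
    "metric_2": [20, 10, 30, 20, 20, 30, 10, 20],
    "metric_3": [5, 10, 15, 20, 30, 20, 10, 5],
})


def summarize(frame, keys, metrics):
    """Sum `metrics` for every non-empty combination of `keys` (hypothetical helper)."""
    parts = [
        frame.groupby(list(combi), as_index=False)[metrics].sum()
        for n in range(1, len(keys) + 1)
        for combi in combinations(keys, n)
    ]
    # Keys absent from a given grouping come out as NaN after concatenation.
    out = pd.concat(parts, ignore_index=True)
    return out[keys + metrics]


keys = ["set", "language", "group", "version"]
metrics = ["metric_1", "metric_2", "metric_3"]
summary = summarize(df, keys, metrics)
print(summary.head(8))
```

The first row (set X, all other attributes empty) sums to 340, 80 and 50, which agrees with the first line of the output shown above.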

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/50787516
