Q: How do I apply aggregations (sum, mean, max, min, etc.) across columns in pydatatable?

Stack Overflow user

Asked 2020-07-01 23:51:34

Answers: 2, Views: 105, Followers: 0, Votes: 2

I have a datatable Frame:

import datatable as dt
from datatable import f, by

DT_X = dt.Frame({
    'issue': ['cs-1', 'cs-2', 'cs-3', 'cs-1', 'cs-3', 'cs-2'],
    'speech': [1, 1, 1, 0, 1, 1],
    'narrative': [1, 0, 1, 1, 1, 0],
    'thought': [0, 1, 1, 0, 1, 1]
})

It looks like this:

Out[5]: 
   | issue  speech  narrative  thought
-- + -----  ------  ---------  -------
 0 | cs-1        1          1        0
 1 | cs-2        1          0        1
 2 | cs-3        1          1        1
 3 | cs-1        0          1        0
 4 | cs-3        1          1        1
 5 | cs-2        1          0        1

[6 rows x 4 columns]

I now compute a grouped sum over the values in the three numeric columns:

DT_X[:,{'speech': dt.sum(f.speech),
        'narrative': dt.sum(f.narrative),
        'thought': dt.sum(f.thought)},
        by(f.issue)]

This produces the following output:

Out[6]: 
   | issue  speech  narrative  thought
-- + -----  ------  ---------  -------
 0 | cs-1        1          2        0
 1 | cs-2        2          0        2
 2 | cs-3        2          2        2

[3 rows x 4 columns]

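Here I spelled out each field name and the aggregation function (dt.sum) by hand. With only 3 columns that is easy enough, but what if I have to handle 10, 20, or more fields?

Is there another way to do this?

For reference: R data.table has the same kind of functionality, for example: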

DT[,lapply(.SD,sum),by=.(issue),.SDcols=c('speech','narrative','thought')]

python

py-datatable

2 Answers

Stack Overflow user

Accepted answer

Answered 2020-07-02 01:50:56

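Most functions in datatable, including sum(), automatically apply across all columns when given a multi-column set as an argument. So R's lapply(.SD, sum) becomes simply sum(.SD), except that Python has no .SD; instead we use the f symbol and its combinations. In this case, f[:] selects all columns other than the groupby column, so it is essentially equivalent to .SD.

Second, all unary functions (functions that act on a single column, as opposed to binary ones such as + or corr) pass through the names of their columns. So sum(f[:]) produces a set of columns with the same names as in f[:].

Putting this all together: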

>>> from datatable import by, sum, f, dt

>>> DT_X[:, sum(f[:]), by(f.issue)]
   | issue  speech  narrative  thought
-- + -----  ------  ---------  -------
 0 | cs-1        1          2        0
 1 | cs-2        2          0        2
 2 | cs-3        2          2        2

[3 rows x 4 columns]

Votes: 2

Stack Overflow user

Answered 2020-07-02 00:07:05

Here is one of the solutions, as suggested by @Erez:

DT_X[:, {name: dt.sum(getattr(f, name)) for name in ['speech', 'narrative', 'thought']},
     by(f.issue)]

And the output:

Out[7]: 
   | issue  speech  narrative  thought
-- + -----  ------  ---------  -------
 0 | cs-1        1          2        0
 1 | cs-2        2          0        2
 2 | cs-3        2          2        2

[3 rows x 4 columns]

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/62680636
