Q: Python pandas equivalent of Excel SUMIF

Stack Overflow user

Asked 2022-05-14 12:50:19

1 answer, 135 views, 0 followers, 1 vote

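I can't figure out how to accomplish a particular task in a Python script.

I have a DataFrame of media coverage on a specific topic. One of its columns holds the author(s) of each article.

I'm trying to build a pivot table that shows the total number of articles for each journalist in that column, like this: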

datajournalist = company1_topline.pivot_table(index='AuthorUsername', values='ContentID', aggfunc=np.count_nonzero)

This gives me:

AuthorUsername                                                 count_nonzero
Aaron Mehta                                                      1              
Aamer Madhani                                                    1               
Aamer Madhani ; Ben Fox                                          1

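What I'm looking for is a way to have the pivot table also count names that appear in multi-value cells, so that each author gets a true count. For example, the row containing "Aamer Madhani ; Ben Fox" should also count toward "Aamer Madhani", so the "Aamer Madhani" row would show 2 instead of 1, and so on. Is there a way to do this? In Excel this can be done with SUMIF, but I don't know how to do it in Python/pandas.

Desired output: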

AuthorUsername                                                 count_nonzero
Aaron Mehta                                                      1              
Aamer Madhani                                                    2               
Aamer Madhani ; Ben Fox                                          1

I'd be grateful if someone could point me in the right direction.

python-3.x

excel

pandas

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-05-14 13:39:14

If your DataFrame has an AuthorUsername column like this:

            AuthorUsername
0              Aaron Mehta
1            Aamer Madhani
2  Aamer Madhani ; Ben Fox

you can do this:

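import collections

# Remove leading and trailing spaces (if any).
df['AuthorUsername'] = df['AuthorUsername'].str.strip()

# Split multi-author cells on ';' and count every individual author.
# The raw string keeps the regex escape \s intact.
individual_authors = df['AuthorUsername'].str.split(r'\s*;\s*').explode()
authors_counts = collections.Counter(individual_authors)

# Add to new column. Cells that are not a single author name
# (i.e. multi-author rows) fall back to a count of 1.
real_counts = collections.defaultdict(lambda: 1, authors_counts)
df['count_nonzero'] = [real_counts[a] for a in df['AuthorUsername']]

print(df)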

Result:

            AuthorUsername  count_nonzero
0              Aaron Mehta              1
1            Aamer Madhani              2
2  Aamer Madhani ; Ben Fox              1
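
If only the per-author totals are needed, rather than a new column on the original rows, the same split-and-count idea can be sketched with pandas alone. This is a minimal illustration, not part of the original answer; the names `per_author` and `author_counts` are made up for the example, and the sample DataFrame simply mirrors the one above.

```python
import pandas as pd

# Sample data mirroring the AuthorUsername column shown above.
df = pd.DataFrame({
    "AuthorUsername": ["Aaron Mehta", "Aamer Madhani", "Aamer Madhani ; Ben Fox"],
})

# Split each cell on ';' (ignoring surrounding whitespace), then give
# every individual author a row of their own.
per_author = (
    df["AuthorUsername"]
    .str.strip()
    .str.split(r"\s*;\s*")
    .explode()
)

# Count how many original rows mention each individual author.
author_counts = per_author.value_counts()
print(author_counts)
```

Here every co-author is credited once per article they appear on, so "Ben Fox" gets his own entry instead of only appearing inside the combined cell.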

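Edit after the comments, with more metrics:

Following the comments, here is a more general version that also sums the Metrics column, and works for other metric columns too.

Input data: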

            AuthorUsername  Metrics
0              Aaron Mehta      1.3
1            Aamer Madhani      2.0
2  Aamer Madhani ; Ben Fox      0.5

Code:

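import pandas as pd

df['AuthorUsername'] = df['AuthorUsername'].str.strip()
df['single_authors'] = df['AuthorUsername'].str.split(r'\s*;\s*')

# Every row counts as one article; list the columns to be summed explicitly
# so that string columns are not aggregated by accident.
df['count_nonzero'] = 1
metric_cols = ['Metrics', 'count_nonzero']

# Totals per individual author (multi-author rows count toward each author).
single_metrics = df.explode('single_authors').groupby('single_authors')[metric_cols].sum()
# Totals for the multi-author cells themselves.
multiple_metrics = df[df['single_authors'].map(len) > 1].groupby('AuthorUsername')[metric_cols].sum()

all_metrics = pd.concat([single_metrics, multiple_metrics]).rename_axis('AuthorUsername').reset_index()

# Replace the per-row values with the aggregated ones.
df = df.drop(columns=metric_cols + ['single_authors']).merge(all_metrics, how='left', on='AuthorUsername')

print(df)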

Result:

            AuthorUsername  Metrics  count_nonzero
0              Aaron Mehta      1.3              1
1            Aamer Madhani      2.5              2
2  Aamer Madhani ; Ben Fox      0.5              1
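
Since the question asked for a pivot table, the same idea can also be expressed by exploding the authors first and pivoting afterwards. The following is a rough sketch under a few assumptions: the helper column `Author` and the variables `exploded` and `summary` are illustrative names, and the sample data repeats the input shown above. Note that `"count"` counts non-missing Metrics values, which is not exactly what `np.count_nonzero` in the question does.

```python
import pandas as pd

# Sample data mirroring the input shown above.
df = pd.DataFrame({
    "AuthorUsername": ["Aaron Mehta", "Aamer Madhani", "Aamer Madhani ; Ben Fox"],
    "Metrics": [1.3, 2.0, 0.5],
})

# One row per (article, individual author) pair.
exploded = (
    df.assign(Author=df["AuthorUsername"].str.strip().str.split(r"\s*;\s*"))
      .explode("Author")
)

# Pivot on the individual author: number of articles and summed Metrics.
summary = exploded.pivot_table(
    index="Author",
    values="Metrics",
    aggfunc=["count", "sum"],
)
print(summary)
```

The resulting columns form a two-level index, for example `summary[("sum", "Metrics")]` holds the per-author total, which plays roughly the role of a SUMIF over the author column.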

Votes: 4

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/72240369
