首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Python数据透视表margins=True求和不正确

Python数据透视表margins=True求和不正确
EN

Stack Overflow用户
提问于 2018-07-31 05:40:30
回答 1查看 2.9K关注 0票数 3

我有以下代码:

代码语言:javascript
复制
import pandas as pd
df=pd.read_csv("https://www.dropbox.com/s/90y07129zn351z9/test_data.csv?dl=1", encoding="latin-1")
pvt_received=df.pivot_table(index=['site'], values = ['received','sent'], aggfunc = {  'received' : 'count' ,'sent': 'count'}, fill_value=0, margins=True) 
pvt_received['to_send']=pvt_received['received']-pvt_received['sent']
column_order = ['received', 'sent','to_send']
pvt_received_ordered = pvt_received.reindex_axis(column_order, axis=1)
pvt_received_ordered.to_csv("test_pivot.csv")
table_to_send = pd.read_csv('test_pivot.csv', encoding='latin-1')
table_to_send.rename(columns={'site':'Site','received':'Date Received','sent':'Date Sent','to_send':'Date To Send'}, inplace=True)
table_to_send.set_index('Site', inplace=True)
table_to_send

它们会生成这个表:

代码语言:javascript
复制
      Date Received       Date Sent       Date To Send
Site            
2         32.0             27.0           5.0
3         20.0             17.0           3.0
4         33.0             31.0           2.0
5         40.0             31.0           9.0
All       106.0            106.0          0.0

但此参数margins=True没有给出每列的总和的正确结果。例如,接收日期应该是125而不是106,发送日期应该是106 (正确),发送日期应该是19而不是0.0 (零)。问:我应该更改什么才能获得正确的数字?此外,还缺少对每一行进行求和的所有内容。在此之前非常感谢。

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2018-12-04 11:31:19

从您的代码中可以看出,您是在构建数据透视表之后创建Date To Send的,所以它只会给出以下结果:106.0 - 106.0。此外,在分组后,它们的边距值用默认的dropna=True进行calculated,这意味着任何具有NaNNaT的行都将被删除。设置dropna=False应该可以解决这个问题。

在创建数据透视表和date_time列之前,我重构了代码,将receivedsent列转换为date_time格式。

代码语言:javascript
复制
df2 = pd.read_csv(
         "https://www.dropbox.com/s/90y07129zn351z9/test_data.csv?dl=1"
         ,encoding="latin-1")
df2['received'] = pd.to_datetime(df2['received'])
df2['sent'] = pd.to_datetime(df2['sent'])

然后创建数据透视表,这是最初的目的。

代码语言:javascript
复制
pvt_received = df2.pivot_table(index=['site'], values=['received','sent'],\
    aggfunc='count', margins=True, dropna=False)

pvt_received['to_send'] = pvt_received['received'] - pvt_received['sent']
pvt_received.rename(columns={'site':'Site'
                             ,'received':'Date Received'
                             ,'sent':'Date Sent'
                             ,'to_send':'Date To Send'}
                             ,inplace=True)
pvt_received

        Date Received   Date Sent   Date To Send
Site            
2       32              27          5
3       20              17          3
4       33              31          2
5       40              31          9
All     125             106         25
票数 7
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/51602661

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档