
Q: Filling in missing dates in a pandas DataFrame with multiple columns

Stack Overflow user

Asked on 2019-03-07 04:12:14

Answers: 2 | Views: 785 | Followers: 0 | Score: 2

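I want to add the missing dates for a specific date range while keeping all the columns. I found many posts that use asfreq(), resample(), or reindex(), but they seem to be meant for a Series, and I couldn't get them to work for my DataFrame.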
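For context, asfreq() and reindex() do work on a DataFrame, not only on a Series, once the dates are in the index: the missing dates are inserted as whole rows, with NaN in every column. The catch is that they treat the frame as a single time series, so with several color/id/product groups they have to be applied per group or against a MultiIndex. A minimal single-group sketch; the frame `one` and its quantities are hypothetical values made up for illustration:

```python
import pandas as pd

# Hypothetical single group: weeks 2019-03-07 and 2019-03-21, with 2019-03-14 missing
one = pd.DataFrame(
    {"color": ["red", "red"], "qty": [10, 5]},
    index=pd.to_datetime(["2019-03-07", "2019-03-21"]),
)

# asfreq() on a DataFrame inserts the missing week as a full row of NaN
weekly = one.asfreq("7D")

# qty becomes 0 on the inserted row; the descriptive column is carried forward
weekly["qty"] = weekly["qty"].fillna(0).astype(int)
weekly["color"] = weekly["color"].ffill()
print(weekly)
```

Because asfreq() only spans the first to the last date of the frame it is called on, it cannot add weeks before or after a group's own data, which is why the answers below work on all groups at once.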

Given this sample data:

import pandas as pd

data = [{'id' : '123', 'product' : 'apple', 'color' : 'red', 'qty' : 10, 'week' : '2019-3-7'}, {'id' : '123', 'product' : 'apple', 'color' : 'blue', 'qty' : 20, 'week' : '2019-3-21'}, {'id' : '123', 'product' : 'orange', 'color' : 'orange', 'qty' : 8, 'week' : '2019-3-21'}]

df = pd.DataFrame(data)


    color   id product  qty       week
0     red  123   apple   10   2019-3-7
1    blue  123   apple   20  2019-3-21
2  orange  123  orange    8  2019-3-21

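My goal is to get the result below: fill qty with 0 but populate the other columns. Of course, I have many other ids as well. I'd like to be able to specify the start/end dates to fill; this example uses 3/7 through 3/21.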

    color   id product  qty       week
0     red  123   apple   10   2019-3-7
1    blue  123   apple   20  2019-3-21
2  orange  123  orange    8  2019-3-21
3     red  123   apple    0  2019-3-14
4     red  123   apple    0  2019-3-21 
5    blue  123   apple    0   2019-3-7
6    blue  123   apple    0  2019-3-14
7  orange  123  orange    0   2019-3-7
8  orange  123  orange    0  2019-3-14

How can I do this while keeping the rest of my DataFrame intact?

python

pandas

2 Answers

Stack Overflow user

Accepted answer

Answered on 2019-03-07 04:20:52

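In this case, you can simply combine unstack and stack with reindex: unstack turns the weeks into columns, reindex adds the missing weeks, and stack turns them back into rows.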

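df['week'] = pd.to_datetime(df['week'])
# name the range 'week' so the column keeps its label after reset_index
s = pd.date_range(df['week'].min(), df['week'].max(), freq='7D', name='week')

# unstack(fill_value=0) zero-fills weeks a group lacks, so stack() drops no rows
df = (df.set_index(['color', 'id', 'product', 'week'])['qty']
        .unstack(fill_value=0)
        .reindex(columns=s, fill_value=0)
        .stack()
        .reset_index(name='qty'))
df

    color   id product       week  qty
0    blue  123   apple 2019-03-07    0
1    blue  123   apple 2019-03-14    0
2    blue  123   apple 2019-03-21   20
3  orange  123  orange 2019-03-07    0
4  orange  123  orange 2019-03-14    0
5  orange  123  orange 2019-03-21    8
6     red  123   apple 2019-03-07   10
7     red  123   apple 2019-03-14    0
8     red  123   apple 2019-03-21    0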
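The question also asks to choose the start/end dates instead of taking them from the data. A minimal sketch of the same idea with an explicit range, written as a hypothetical helper `fill_missing_weeks` (not part of pandas): it builds every existing (color, id, product) combination paired with every week in the range as a MultiIndex and reindexes the DataFrame against it, so no unstack/stack round trip is needed. It assumes each group has at most one row per week, since reindex requires a unique index.

```python
import pandas as pd


def fill_missing_weeks(df, keys, start, end, freq="7D"):
    """Hypothetical helper: add a row with 0 for every week in [start, end]
    that is missing for an existing combination of `keys`."""
    weeks = pd.date_range(start, end, freq=freq, name="week")
    # Only combinations that already occur in the data are expanded
    groups = df[keys].drop_duplicates()
    full_index = pd.MultiIndex.from_tuples(
        [(*g, w) for g in groups.itertuples(index=False, name=None) for w in weeks],
        names=[*keys, "week"],
    )
    # fill_value=0 applies to every non-key column (here only qty)
    return df.set_index([*keys, "week"]).reindex(full_index, fill_value=0).reset_index()


data = [
    {"id": "123", "product": "apple", "color": "red", "qty": 10, "week": "2019-3-7"},
    {"id": "123", "product": "apple", "color": "blue", "qty": 20, "week": "2019-3-21"},
    {"id": "123", "product": "orange", "color": "orange", "qty": 8, "week": "2019-3-21"},
]
df = pd.DataFrame(data)
df["week"] = pd.to_datetime(df["week"])

result = fill_missing_weeks(df, ["color", "id", "product"], "2019-03-07", "2019-03-21")
print(result)
```

Note that rows whose week falls outside [start, end] are dropped by the reindex, and any extra non-key columns besides qty would also be filled with 0, so they would need their own handling.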

Score: 2

Stack Overflow user

Answered on 2022-02-26 07:14:31

One option is to use the complete function from pyjanitor to expose the implicitly missing rows; afterwards, you can fill them in with fillna.

# pip install pyjanitor
import pandas as pd
import janitor  # registers the DataFrame.complete method

df['week'] = pd.to_datetime(df['week'])

# create new dates, which will be used to expand the dataframe
new_dates = {"week": pd.date_range(df['week'].min(), df['week'].max(), freq="7D")}

# use the complete function
# note how color, id and product are wrapped together in a tuple:
# this ensures only combinations that already exist in the dataframe are expanded
# if you want all combinations instead, remove the tuple
(df
 .complete(("color", "id", "product"), new_dates, sort=False)
 .fillna({'qty': 0})
 .astype({'qty': int})  # the new rows made qty float; restore integers
)

    id product   color  qty       week
0  123   apple     red   10 2019-03-07
1  123   apple    blue   20 2019-03-21
2  123  orange  orange    8 2019-03-21
3  123   apple     red    0 2019-03-14
4  123   apple     red    0 2019-03-21
5  123   apple    blue    0 2019-03-07
6  123   apple    blue    0 2019-03-14
7  123  orange  orange    0 2019-03-07
8  123  orange  orange    0 2019-03-14
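If adding pyjanitor as a dependency is not an option, the same "complete the existing combinations" step can be sketched in plain pandas with a cross join (`merge(how="cross")`, available since pandas 1.2) followed by a left merge back onto the original rows. This is a swapped-in technique, not a description of how pyjanitor implements complete internally; the `keys` and `weeks` names are illustrative.

```python
import pandas as pd

data = [
    {"id": "123", "product": "apple", "color": "red", "qty": 10, "week": "2019-3-7"},
    {"id": "123", "product": "apple", "color": "blue", "qty": 20, "week": "2019-3-21"},
    {"id": "123", "product": "orange", "color": "orange", "qty": 8, "week": "2019-3-21"},
]
df = pd.DataFrame(data)
df["week"] = pd.to_datetime(df["week"])

keys = ["color", "id", "product"]
# Weekly grid; replace min()/max() with explicit start/end dates if needed
weeks = pd.DataFrame(
    {"week": pd.date_range(df["week"].min(), df["week"].max(), freq="7D")}
)

# Every existing key combination paired with every week
grid = df[keys].drop_duplicates().merge(weeks, how="cross")

# Bring the original quantities back; rows with no match get NaN, then 0
completed = grid.merge(df, on=[*keys, "week"], how="left")
completed["qty"] = completed["qty"].fillna(0).astype(int)
print(completed)
```

Unlike the reindex-based approaches, a left merge tolerates duplicate (key, week) rows in the original data, repeating them in the output rather than raising an error.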

Score: 0

Page content provided by Stack Overflow. Translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/55035939
