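I'm trying to insert missing weekdays into a pandas time series. The inserted weekdays must have NaN in every data column. When I tried the answer in "Insert missing weekdays in pandas dataframe and fill them with NaN", the new rows were filled with 0 instead of NaN. To illustrate: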
import pandas as pd
df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'],  # Thursday and Tuesday.
    'num': [123, 456]
})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df = df.resample('B').sum()  # Insert Friday and Monday.
However, df is now:
num
date
2022-10-06 123
2022-10-07 0
2022-10-10 0
2022-10-11 456
I get 0 instead of NaN. How can I get NaN? This is what I want:
num
date
2022-10-06 123
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456
(pandas version 1.3.2, Python version 3.8.10)
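The zeros come from how `sum` treats empty bins: its `min_count` parameter defaults to 0, so a business day with no rows sums to 0 rather than NaN. A minimal sketch on the question's data comparing the default with `min_count=1` (the names `default_sum` and `nan_sum` are only illustrative):

```python
import pandas as pd

# Same data as in the question: a Thursday and the following Tuesday.
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-10-06", "2022-10-11"]),
    "num": [123, 456],
}).set_index("date")

# Default min_count=0: the empty Friday and Monday bins sum to 0.
default_sum = df.resample("B").sum()

# min_count=1: a bin needs at least one non-NA value, otherwise it becomes NaN.
nan_sum = df.resample("B").sum(min_count=1)

print(default_sum)
print(nan_sum)
```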
Posted on 2022-10-10 12:25:22
Use .asfreq() instead of .sum():
df.resample('B').asfreq()
Output:
num
date
2022-10-06 123.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456.0
Posted on 2022-10-10 12:36:59
df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'],  # Thursday and Tuesday.
    'num': [123, 456]
})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
If the datetimes are unique, use DataFrame.asfreq:
df1 = df.asfreq('B')
print (df1)
num
date
2022-10-06 123.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456.0
If the datetimes may contain duplicates and you need to aggregate them with sum, add the parameter min_count=1:
df2 = df.resample('B').sum(min_count=1)
print (df2)
num
date
2022-10-06 123.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456.0
For example, with duplicated dates:
df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'] * 2,  # Thursday and Tuesday, each twice.
    'num': [123, 456, 10, 20]
})
df['date'] = pd.to_datetime(df['date'])
print (df)
date num
0 2022-10-06 123
1 2022-10-11 456
2 2022-10-06 10
3 2022-10-11 20
df = df.set_index('date')
df2 = df.resample('B').sum(min_count=1)
print (df2)
num
date
2022-10-06 133.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 476.0
asfreq, by contrast, fails on the duplicated index:
df1 = df.asfreq('B')
print (df1)
ValueError: cannot reindex from a duplicate axis
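One way to keep using `asfreq` when the index may contain duplicates is to collapse them first. A minimal sketch, assuming summing duplicates is the desired aggregation (as in the `min_count=1` example) and using the illustrative names `deduped`, `filled` and `via_resample`:

```python
import pandas as pd

# Duplicated dates, as in the example above.
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-10-06", "2022-10-11"] * 2),
    "num": [123, 456, 10, 20],
}).set_index("date")

# Step 1: collapse duplicate timestamps so the index becomes unique.
deduped = df.groupby(level=0).sum()

# Step 2: asfreq now works; missing business days are inserted as NaN.
filled = deduped.asfreq("B")

# For comparison: the resample route with min_count=1.
via_resample = df.resample("B").sum(min_count=1)

print(filled)
```

For this data both routes give the same frame. They can differ when a date is present but all of its values are NaN: `groupby(...).sum()` then yields 0 unless `min_count=1` is passed to it as well.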
https://stackoverflow.com/questions/74014804