Given the following data sample:
date value1 value2 value3
0 2021-10-12 1.015 1.115668 1.015000
1 2021-10-13 NaN 1.104622 1.030225
2 2021-10-14 NaN 1.093685 NaN
3 2021-10-15 1.015 1.082857 NaN
4 2021-10-16 1.015 1.072135 1.077284
5 2021-10-29 1.015 1.061520 1.093443
6 2021-10-30 1.015 1.051010 1.109845
7 2021-10-31 1.015 NaN 1.126493
8 2021-11-1 1.015 NaN NaN
9 2021-11-2 1.015 1.020100 NaN
10 2021-11-3 NaN 1.010000 NaN
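11 2021-11-30 1.015 1.000000 NaN
Suppose I want to drop the columns whose values are all NaN in November 2021, that is, in the range 2021-11-01 to 2021-11-30 (start and end dates inclusive).
Under this requirement, value3 will be dropped because all of its values in 2021-11 are NaN. The other columns have some NaNs in 2021-11, but not all, so they will be kept.
How can I achieve this in pandas? Thanks.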
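For reproducibility, the sample above can be built roughly as follows. This is only a sketch: the values are copied from the table, the single-digit days are written zero-padded, and the variable name df is an assumption that matches the snippets in this thread.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
# Single-digit days (e.g. 2021-11-1) are written zero-padded here.
df = pd.DataFrame({
    "date": ["2021-10-12", "2021-10-13", "2021-10-14", "2021-10-15",
             "2021-10-16", "2021-10-29", "2021-10-30", "2021-10-31",
             "2021-11-01", "2021-11-02", "2021-11-03", "2021-11-30"],
    "value1": [1.015, np.nan, np.nan, 1.015, 1.015, 1.015,
               1.015, 1.015, 1.015, 1.015, np.nan, 1.015],
    "value2": [1.115668, 1.104622, 1.093685, 1.082857, 1.072135, 1.061520,
               1.051010, np.nan, np.nan, 1.020100, 1.010000, 1.000000],
    "value3": [1.015000, 1.030225, np.nan, np.nan, 1.077284, 1.093443,
               1.109845, 1.126493, np.nan, np.nan, np.nan, np.nan],
})

# Parse the strings into datetime64 values so date comparisons work.
df["date"] = pd.to_datetime(df["date"])
```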
EDIT:
df['date'] = pd.to_datetime(df['date'])
mask = (df['date'] >= '2021-11-01') & (df['date'] <= '2021-11-30')
df.loc[mask]
Out:
date value1 value2 value3
8 2021-11-01 1.015 NaN NaN
9 2021-11-02 1.015 1.0201 NaN
10 2021-11-03 NaN 1.0100 NaN
11 2021-11-30 1.015 1.0000 NaN
Answered on 2021-11-29 10:45:30
You can filter the rows for November 2021 and test, column by column, whether all of those values are NaN:
df['date'] = pd.to_datetime(df['date'])
df = df.loc[:, ~df[df['date'].dt.to_period('M') == pd.Period('2021-11')].isna().all()]
Or:
df['date'] = pd.to_datetime(df['date'])
df = df.loc[:, df[df['date'].dt.to_period('M') == pd.Period('2021-11')].notna().any()]
Or, using the date-range mask from the question:
mask = (df['date'] >= '2021-11-01') & (df['date'] <= '2021-11-30')
df = df.loc[:, df.loc[mask].notna().any()]
Out:
date value1 value2
0 2021-10-12 1.015 1.115668
1 2021-10-13 NaN 1.104622
2 2021-10-14 NaN 1.093685
3 2021-10-15 1.015 1.082857
4 2021-10-16 1.015 1.072135
5 2021-10-29 1.015 1.061520
6 2021-10-30 1.015 1.051010
7 2021-10-31 1.015 NaN
8 2021-11-01 1.015 NaN
9 2021-11-02 1.015 1.020100
10 2021-11-03 NaN 1.010000
11 2021-11-30 1.015 1.000000
EDIT: If you need to manually exempt some columns from this check, use:
import numpy as np
# Add an all-NaN column that should be kept despite the check
df = df.assign(value4=np.nan)
print(df)
date value1 value2 value3 value4
0 2021-10-12 1.015 1.115668 1.015000 NaN
1 2021-10-13 NaN 1.104622 1.030225 NaN
2 2021-10-14 NaN 1.093685 NaN NaN
3 2021-10-15 1.015 1.082857 NaN NaN
4 2021-10-16 1.015 1.072135 1.077284 NaN
5 2021-10-29 1.015 1.061520 1.093443 NaN
6 2021-10-30 1.015 1.051010 1.109845 NaN
7 2021-10-31 1.015 NaN 1.126493 NaN
8 2021-11-1 1.015 NaN NaN NaN
9 2021-11-2 1.015 1.020100 NaN NaN
10 2021-11-3 NaN 1.010000 NaN NaN
11 2021-11-30 1.015 1.000000 NaN NaN
df['date'] = pd.to_datetime(df['date'])
m = df[df['date'].dt.to_period('M') == pd.Period('2021-11')].isna().all()
# value4 is also all NaN in November, but force it to be kept
m.loc['value4'] = False
print(m)
date False
value1 False
value2 False
value3 True
value4 False
dtype: bool
# Keep only the columns that are not flagged for removal
df = df.loc[:, ~m]
print(df)
date value1 value2 value4
0 2021-10-12 1.015 1.115668 NaN
1 2021-10-13 NaN 1.104622 NaN
2 2021-10-14 NaN 1.093685 NaN
3 2021-10-15 1.015 1.082857 NaN
4 2021-10-16 1.015 1.072135 NaN
5 2021-10-29 1.015 1.061520 NaN
6 2021-10-30 1.015 1.051010 NaN
7 2021-10-31 1.015 NaN NaN
8 2021-11-01 1.015 NaN NaN
9 2021-11-02 1.015 1.020100 NaN
10 2021-11-03 NaN 1.010000 NaN
11 2021-11-30 1.015 1.000000 NaN
https://stackoverflow.com/questions/70153278
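The approach in the answer can be wrapped in a small helper. The sketch below is not part of the original answer: the function name drop_all_nan_columns and its keep parameter are hypothetical, it assumes the date column is named date, and the demo frame uses only the last six rows of the sample plus the all-NaN value4 column.

```python
import numpy as np
import pandas as pd


def drop_all_nan_columns(df, start, end, keep=()):
    """Drop columns that are entirely NaN for dates in [start, end].

    `keep` (hypothetical parameter) names columns that are never dropped.
    """
    # Rows whose date falls in the range, inclusive on both ends.
    in_range = (df["date"] >= pd.Timestamp(start)) & (df["date"] <= pd.Timestamp(end))
    # True for every column that holds only NaN within those rows.
    all_nan = df.loc[in_range].isna().all()
    # Never drop the columns listed in `keep`.
    to_drop = all_nan & ~all_nan.index.isin(list(keep))
    return df.loc[:, ~to_drop]


# Last six rows of the sample, plus the all-NaN value4 column.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-10-30", "2021-10-31", "2021-11-01",
                            "2021-11-02", "2021-11-03", "2021-11-30"]),
    "value1": [1.015, 1.015, 1.015, 1.015, np.nan, 1.015],
    "value2": [1.051010, np.nan, np.nan, 1.020100, 1.010000, 1.000000],
    "value3": [1.109845, 1.126493, np.nan, np.nan, np.nan, np.nan],
    "value4": [np.nan] * 6,
})

result = drop_all_nan_columns(df, "2021-11-01", "2021-11-30", keep=["value4"])
print(result.columns.tolist())
```

With keep=["value4"] only value3 is removed. Without keep, value4 is removed as well, because it is all NaN in the range too.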