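My CSV input files sometimes contain Excel serial numbers in the date field. I use the following code, since my input files should not contain any dates before 01/2000. However, this solution is slow, and I am hoping to find a better one. Thanks.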
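import pandas as pd
from datetime import datetime

def DateCorrection(x):
    # Anything that parses to a date before 2000 is treated as an Excel serial number.
    if pd.to_datetime(x) < pd.to_datetime('2000-01-01'):
        return pd.to_datetime(datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(x) - 2))
    else:
        return pd.to_datetime(x)

Posted on 2021-01-01 06:15:40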
Assuming your input looks like

import pandas as pd
df = pd.DataFrame({'date': ["2020-01-01", 43862, "2020-03-01"]})

you can process it as follows:

# convert everything first, ignore invalid results for now:
df['datetime'] = pd.to_datetime(df['date'])
# where you have numeric values, i.e. "excel datetime format":
nums = pd.to_numeric(df['date'], errors='coerce') # timestamp strings will give NaN here
# now replace the invalid dates:
df.loc[nums.notna(), 'datetime'] = pd.to_datetime(nums[nums.notna()], unit='d', origin='1899-12-30')

...giving you

df
         date   datetime
0  2020-01-01 2020-01-01
1       43862 2020-02-01
2  2020-03-01 2020-03-01

Related:
https://stackoverflow.com/questions/65514678
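
The steps in the answer above could be wrapped into a small reusable helper. This is only a sketch: the function name `parse_mixed_dates` and the constant `EXCEL_ORIGIN` are made up for illustration and are not part of pandas. It assumes the Excel 1900 date system and treats every entry that parses as a number as a serial, so a compact date string such as "20200101" would be misread. It parses the date strings and the serial numbers separately instead of passing the mixed column to `pd.to_datetime` in one call.

```python
import pandas as pd

# Assumption: Excel 1900 date system; serial 1 corresponds to 1899-12-31 here.
EXCEL_ORIGIN = "1899-12-30"


def parse_mixed_dates(col: pd.Series) -> pd.Series:
    """Parse a column that mixes date strings and Excel serial numbers.

    Hypothetical helper, not a pandas API.
    """
    # Numbers and numeric strings (e.g. "43862" as read from a CSV) become floats;
    # date strings become NaN.
    nums = pd.to_numeric(col, errors="coerce")
    is_serial = nums.notna()

    # Parse only the non-numeric entries as date strings; serial rows stay NaT for now.
    result = pd.to_datetime(col.where(~is_serial), errors="coerce")

    # Fill the serial rows: days counted from the Excel origin.
    result[is_serial] = pd.to_datetime(nums[is_serial], unit="D", origin=EXCEL_ORIGIN)
    return result


df = pd.DataFrame({"date": ["2020-01-01", 43862, "2020-03-01", "43862"]})
df["datetime"] = parse_mixed_dates(df["date"])
print(df)
```

Because the whole column is handled with vectorized calls, this avoids the per-row Python function from the question. Entries that are neither numeric nor parseable as dates end up as NaT rather than raising an error.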