I want to extract the data that falls within a specific date range, so I am using pandas.
Here is a sample of the DataFrame:
1/18/2021 3000000 ...
1/18/2021 5000000 ...
1/18/2021 900 ...
1/18/2021 2000000 ...
1/18/2021 2000000 ...
12/13/2020 2910000 ...

Here is the code:
import pandas as pd

def date(start_time, end_time):
    col_names = ['time', 'amount', 'category', 'subcategory', 'resunit', 'relateunit', 'divtype', 'des']
    df = pd.read_csv('DATAss_notdivided.csv', skiprows=1, names=col_names)
    df = df.set_index(['time'])
    df = df.sort_index()
    df = df.loc[start_time:end_time]
    print(df)

date('2018-10-10', '2200-10-10')

But I get this output:
Empty DataFrame
Columns: [amount, category, subcategory, resunit, relateunit, divtype, des]
Index: []

What am I doing wrong here? Note: I have tried several different date formats for the input, but none of them worked.
Answered 2021-01-19 14:19:25
You need a DatetimeIndex, so instead of:

df = pd.read_csv('DATAss_notdivided.csv', skiprows=1, names=col_names)
df = df.set_index(['time'])

use:
df = pd.read_csv('DATAss_notdivided.csv',
                 skiprows=1,
                 names=col_names,
                 index_col=['time'],
                 parse_dates=['time'])

Another idea, in case some of the datetimes are invalid (errors='coerce' turns values that cannot be parsed into NaT):
df = pd.read_csv('DATAss_notdivided.csv', skiprows=1, names=col_names)
df['time'] = pd.to_datetime(df['time'], errors='coerce')
df = df.set_index(['time'])

If you need large timestamps, the year 2200 is valid, because the timestamp limitations are:
In [93]: pd.Timestamp.max
Out[93]: Timestamp('2262-04-11 23:47:16.854775807')

Putting it all together:
def date(start_time, end_time):
    col_names = ['time', 'amount', 'category', 'subcategory', 'resunit',
                 'relateunit', 'divtype', 'des']
    # Parse the 'time' column while reading so the index is a DatetimeIndex
    df = pd.read_csv('DATAss_notdivided.csv',
                     skiprows=1,
                     names=col_names,
                     index_col=['time'],
                     parse_dates=['time'])
    # Label slicing by date expects a sorted index
    df = df.sort_index()
    df = df.loc[start_time:end_time]
    print(df)

date('2018-10-10', '2200-10-10')

Answered 2021-01-19 14:20:33
This is probably because you are not using a DatetimeIndex. Also, the end date you passed is 2200, not 2020.
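To make the DatetimeIndex point concrete, here is a minimal sketch that compares slicing a string index with slicing a parsed one. It is an illustration under stated assumptions, not the asker's actual setup: `SAMPLE_CSV` is a hypothetical in-memory stand-in for `DATAss_notdivided.csv` with only two columns, the `load` helper is made up for this example, and the `%m/%d/%Y` format is inferred from the sample rows in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for DATAss_notdivided.csv (illustrative values only).
SAMPLE_CSV = """time,amount
1/18/2021,3000000
1/18/2021,900
12/13/2020,2910000
not a date,1
"""


def load(parse):
    """Read the sample; optionally convert 'time' to real datetimes first."""
    df = pd.read_csv(io.StringIO(SAMPLE_CSV), skiprows=1, names=["time", "amount"])
    if parse:
        # Assumed format, based on the rows shown in the question.
        # errors="coerce" turns values that cannot be parsed into NaT.
        df["time"] = pd.to_datetime(df["time"], format="%m/%d/%Y", errors="coerce")
        df = df.dropna(subset=["time"])
    return df.set_index("time").sort_index()


raw = load(parse=False)     # index holds plain strings
parsed = load(parse=True)   # index is a DatetimeIndex

# With strings, the bounds are compared as text, so nothing falls in range.
raw_slice = raw.loc["2018-10-10":"2200-10-10"]
# With datetimes, the same bounds are interpreted as dates.
parsed_slice = parsed.loc["2018-10-10":"2200-10-10"]

print(len(raw_slice), len(parsed_slice))
```

The string-indexed frame comes back empty because '2018-10-10' sorts after '1/18/2021' and '12/13/2020' as text, which matches the Empty DataFrame in the question. Dropping the NaT rows before setting the index is a choice made for this sketch; whether to drop or inspect them depends on the data.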
https://stackoverflow.com/questions/65786361