Let's imagine the following DataFrame:
import pandas as pd
df = pd.DataFrame({
"id": [1, 1, 1],
"date": ["2020-1-1", "2020-1-4", "2020-2-3"],
"value": [1, 2, 3]
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
# df now looks like this:
#    id       date  value
# 0   1 2020-01-01      1
# 1   1 2020-01-04      2
# 2   1 2020-02-03      3
Now, with the date column as the index, I can select a whole month like this:
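# Use .loc here: selecting rows with df["2020-01"] has been deprecated since pandas 1.2
df.set_index("date").loc["2020-01"]
# Output:
#             id  value
# date
# 2020-01-01   1      1
# 2020-01-04   1      2
But with a MultiIndex I can no longer use this feature. I tried slicing, for example:
df.set_index(["id", "date"])[:, "2020-01"]
This raises:
TypeError: '(slice(None, None, None), '2020-01')' is an invalid key
Is there a clean way to do this?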
Posted on 2020-04-14 09:42:27
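This is possible, but because the DatetimeIndex is the second level you need pd.IndexSlice:
idx = pd.IndexSlice
print(df.set_index(["id", "date"]).loc[idx[:, '2020-01'], :])
               value
id date
1  2020-01-01      1
   2020-01-04      2
If the DatetimeIndex is the first level, .loc with the partial string is enough: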
print(df.set_index(["date", "id"]).loc['2020-01'])
               value
date       id
2020-01-01 1       1
2020-01-04 1       2
DatetimeIndex partial string indexing also works on a DataFrame with a MultiIndex (the session below assumes import numpy as np):
In [111]: dft2 = pd.DataFrame(np.random.randn(20, 1),
   .....:                     columns=['A'],
   .....:                     index=pd.MultiIndex.from_product(
   .....:                         [pd.date_range('20130101', periods=10, freq='12H'),
   .....:                          ['a', 'b']]))
   .....:

In [112]: dft2
Out[112]:
                              A
2013-01-01 00:00:00 a -0.298694
                    b  0.823553
2013-01-01 12:00:00 a  0.943285
                    b -1.479399
2013-01-02 00:00:00 a -1.643342
...                         ...
2013-01-04 12:00:00 b  0.069036
2013-01-05 00:00:00 a  0.122297
                    b  1.422060
2013-01-05 12:00:00 a  0.370079
                    b  1.016331

[20 rows x 1 columns]
In [113]: dft2.loc['2013-01-05']
Out[113]:
                              A
2013-01-05 00:00:00 a  0.122297
                    b  1.422060
2013-01-05 12:00:00 a  0.370079
                    b  1.016331

In [114]: idx = pd.IndexSlice
In [115]: dft2 = dft2.swaplevel(0, 1).sort_index()
In [116]: dft2.loc[idx[:, '2013-01-05'], :]
Out[116]:
                              A
a 2013-01-05 00:00:00  0.122297
  2013-01-05 12:00:00  0.370079
b 2013-01-05 00:00:00  1.422060
  2013-01-05 12:00:00  1.016331

Posted on 2020-04-14 09:49:39
One way to do what you want is to pass the axis argument to .loc, so that the slicers you pass are interpreted on a single axis:
df.set_index(["id", "date"]).loc(axis=0)[:, "2020-01"]
https://stackoverflow.com/questions/61204695
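For comparison, here is a self-contained sketch (not taken from either answer) that rebuilds the question's DataFrame and checks that the approaches above select the same January rows. It assumes a reasonably recent pandas. The variable names mi, via_indexslice, via_axis and via_first_level are illustrative only, and the sort_index() calls are an added precaution rather than part of the original answers.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "id": [1, 1, 1],
    "date": ["2020-01-01", "2020-01-04", "2020-02-03"],
    "value": [1, 2, 3],
})
df["date"] = pd.to_datetime(df["date"])

# MultiIndex with the dates in the second level; sorted as a precaution for slicing.
mi = df.set_index(["id", "date"]).sort_index()

# Approach 1: pd.IndexSlice, ":" for the id level and a partial string for the date level.
idx = pd.IndexSlice
via_indexslice = mi.loc[idx[:, "2020-01"], :]

# Approach 2: .loc(axis=0) applies the tuple of slicers to the row axis only.
via_axis = mi.loc(axis=0)[:, "2020-01"]

# Approach 3: move the dates to the first level; a plain partial string then works.
via_first_level = df.set_index(["date", "id"]).sort_index().loc["2020-01"]

print(via_indexslice)
print(via_first_level)
```

Sorting the MultiIndex first matters because slicing a level that is not lexsorted can raise an UnsortedIndexError or emit a PerformanceWarning; on the question's already-ordered data it changes nothing.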