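I have a DataFrame with a MultiIndex. I need to process various subsets of the data based on schema and/or script (the index levels are schema and script). The DataFrame looks like this: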
tx_id step step_id start_time
schema_10 cmc_v2_file 19-3 10 279 2015-09-04 00:46:30
cmc_v2_file 2-7 10 423 2015-09-04 00:46:22
cmc_v2_file 29-1 10 20 2015-09-04 00:46:34
cmc_v2_file 35-1 4 63 2015-09-04 00:46:51
cmc_v2_file 31-2 10 79 2015-09-04 00:46:54
cmc_v2_file 5-8 10 536 2015-09-04 00:46:57
cmc_v2_file 5-9 10 610 2015-09-04 00:47:13
cmc_v2_file 39-1 10 178 2015-09-04 00:47:12
cmc_v2_file 41-1 10 211 2015-09-04 00:47:22
cmc_v2_file 21-4 10 678 2015-09-04 00:47:28
cmc_v2_file 23-4 10 698 2015-09-04 00:47:31
cmc_v2_file 31-5 10 399 2015-09-04 00:47:45
cmc_v2_file 35-4 3 453 2015-09-04 00:47:54
cmc_v2_file 29-5 4 461 2015-09-04 00:47:54
cmc_v2_file 29-5 8 465 2015-09-04 00:47:55
cmc_v2_file 42-3 1 467 2015-09-04 00:47:57
cmc_v2_file 22-5 8 866 2015-09-04 00:47:53
cmc_v2_file 16-6 8 893 2015-09-04 00:47:51
cmc_v2_file 17-6 4 938 2015-09-04 00:47:54
cmc_v2_file 17-6 8 942 2015-09-04 00:47:55
cmc_v2_file 6-2 10 707 2015-09-04 00:47:50
cmc_v2_file 4-11 10 730 2015-09-04 00:47:54
cmc_v2_file 6-3 2 745 2015-09-04 00:47:53
cmc_v2_file 5-11 1 762 2015-09-04 00:47:55
cmc_v2_file 4-12 1 763 2015-09-04 00:47:56
cmc_v2_file 5-12 10 782 2015-09-04 00:48:16
cmc_v2_file 31-6 4 471 2015-09-04 00:47:55
cmc_v2_file 38-3 4 520 2015-09-04 00:47:51
cmc_v2_file 39-3 4 551 2015-09-04 00:47:55
cmc_v2_file 31-7 10 570 2015-09-04 00:48:20
... ... ... ... ...
schema_9 hcs-vbu 1332-132 14 197542 2015-09-04 00:29:46
hcs-vbu 515-143 5 196309 2015-09-04 00:29:01
hcs-vbu 552-126 13 196333 2015-09-04 00:29:19
hcs-vbu 559-116 12 197068 2015-09-04 00:29:33
hcs-vbu 566-115 13 197201 2015-09-04 00:29:47
hcs-vbu 523-152 3 197443 2015-09-04 00:29:33
hcs-vbu 790-136 2 200774 2015-09-04 00:28:46
hcs-vbu 790-136 4 200776 2015-09-04 00:28:56
hcs-vbu 790-136 12 200784 2015-09-04 00:29:13
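hcs-vbu 206-148 5 198213 2015-09-04 00:29:04
To get the data for a specific script, I do this:
df.loc(axis=0)[:,[script]]
When I print the whole DataFrame, it looks correct. The problem is that I have also written a unit test for all of this, and as part of the test I want to verify that the data contains only one script:
scripts = df.index.levels[df.index.names.index('script')]
However, instead of getting back a list with one entry as I expected, I get a list of 6 scripts, which is the number of scripts in the original, unfiltered data. Is there another way to retrieve the script index after filtering the DataFrame with .loc?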
Posted on 2015-10-21 03:45:17
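The second statement, df.index.levels, returns all the levels in the index. You then subset that by asking for all the values of the second level of the MultiIndex (the one named 'script').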
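The behaviour described in the question can be reproduced on a small frame. The sketch below is only an illustration: the three-row frame and the script name `other_script` are made up for the example and are not the asker's data. It shows that `.levels` on a filtered frame still lists every script from the unfiltered frame, because pandas does not prune the stored levels when rows are selected.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: two index levels, three scripts.
index = pd.MultiIndex.from_tuples(
    [
        ("schema_10", "cmc_v2_file"),
        ("schema_10", "other_script"),  # made-up script name for illustration
        ("schema_9", "hcs-vbu"),
    ],
    names=["schema", "script"],
)
df = pd.DataFrame({"step": [10, 4, 14]}, index=index).sort_index()

# Filter to a single script, the same way the question does.
filtered = df.loc(axis=0)[:, ["cmc_v2_file"]]

# Only one row survives the filter ...
row_count = len(filtered)

# ... but .levels still reports all scripts of the unfiltered frame.
level_scripts = list(filtered.index.levels[filtered.index.names.index("script")])
```

Here `row_count` is 1 while `level_scripts` still holds three entries, which matches the "6 instead of 1" result reported in the question.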
I think what you want is something like this, where you say: for the index level named 'script', give me one specific value.
import pandas as pd

## here we set the specific value you want to filter on
specific_script_value = 'cmc_v2_file'
## and then we filter on the second level of the index.
## pd.IndexSlice helps slice across several levels at once
idx = pd.IndexSlice
df.loc[idx[:, specific_script_value], :]
https://stackoverflow.com/questions/33244276
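For the unit-test part of the question, one possible approach is to look at the labels actually carried by the rows rather than at the stored levels. This is a sketch under assumptions, not the answerer's method: the helper name `scripts_present` and the small frame are hypothetical, and `MultiIndex.remove_unused_levels()` assumes a reasonably recent pandas (0.20 or later).

```python
import pandas as pd


def scripts_present(frame: pd.DataFrame) -> list:
    """Return the 'script' labels that actually occur in the frame's rows.

    Hypothetical helper for illustration; assumes an index level named 'script'.
    """
    return list(frame.index.get_level_values("script").unique())


# Made-up stand-in for the question's data.
index = pd.MultiIndex.from_tuples(
    [
        ("schema_10", "cmc_v2_file"),
        ("schema_9", "cmc_v2_file"),
        ("schema_9", "hcs-vbu"),
    ],
    names=["schema", "script"],
)
df = pd.DataFrame({"step": [10, 8, 14]}, index=index).sort_index()

# Same filter as in the answer above.
idx = pd.IndexSlice
filtered = df.loc[idx[:, "cmc_v2_file"], :]

# Option 1: read the labels from the rows themselves.
present = scripts_present(filtered)

# Option 2: prune the stored levels first, then read .levels as in the question.
pruned = filtered.index.remove_unused_levels()
pruned_scripts = list(pruned.levels[pruned.names.index("script")])
```

Both `present` and `pruned_scripts` contain only `'cmc_v2_file'`, so a test can assert on either one. The first option avoids building a new index; the second keeps the original `.levels` expression from the question unchanged.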