I have a DataFrame formatted like this:
... Hom ... March Plans March Ships April Plans April Ships ...
0 CAD ... 12 5 4 13
1 USA ... 7 6 2 11
2 CAD ... 4 9 6 14
3 CAD ... 13 3 9 7
... ... ... ... ... ... ...
and so on for every month of the year. I would like it to look like this:
... Hom ... Month Plans Ships ...
0 CAD ... March 12 5
1 USA ... March 7 6
2 CAD ... March 4 9
3 CAD ... March 13 3
4 CAD ... April 4 13
5 USA ... April 2 11
6 CAD ... April 6 14
7 CAD ... April 9 7
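... ... ... ... ... ...
Is there an easy way to do this without splitting the string entries? I have tried totaldf.unstack(), but since there are multiple columns, I'm not sure how to reindex the data correctly.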
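Answered 2019-02-21 19:35:54
You can use pd.wide_to_long, with a little extra work to get the right stubnames. As the documentation says about stubnames:
The stub name(s). The wide format variables are assumed to start with the stub names.
So the column names need to be modified slightly so that the stubname comes at the start of each one: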
m = df.columns.str.contains('Plans|Ships')
# Swap "March Plans" -> "PlansMarch" so that each column name starts with its stubname
df = df.rename(columns={c: ''.join(c.split(' ')[::-1]) for c in df.columns[m]})
print(df)
Hom PlansMarch ShipsMarch PlansApril ShipsApril
0 CAD 12 5 4 13
1 USA 7 6 2 11
2 CAD 4 9 6 14
3 CAD 13 3 9 7
Now you can use pd.wide_to_long with ['Ships', 'Plans'] as the stubnames to get the desired output:
((pd.wide_to_long(df.reset_index(), stubnames=['Ships', 'Plans'], i='index',
                  j='Month', suffix=r'\w+')).reset_index(drop=True, level=0)
 .reset_index())
Month Hom Ships Plans
0 March CAD 5 12
1 March USA 6 7
2 March CAD 9 4
3 March CAD 3 13
4 April CAD 13 4
5 April USA 11 2
6 April CAD 14 6
7 April CAD 7 9
Answered 2019-02-21 19:33:41
If you convert the columns to a MultiIndex, you can use stack:
In [11]: df1 = df.set_index("Hom")
In [12]: df1.columns = pd.MultiIndex.from_tuples(df1.columns.map(lambda x: tuple(x.split())))
In [13]: df1
Out[13]:
March April
Plans Ships Plans Ships
Hom
CAD 12 5 4 13
USA 7 6 2 11
CAD 4 9 6 14
CAD 13 3 9 7
In [14]: df1.stack(level=0)
Out[14]:
Plans Ships
Hom
CAD April 4 13
March 12 5
USA April 2 11
March 7 6
CAD April 6 14
March 4 9
April 9 7
March 13 3
In [21]: res = df1.stack(level=0)
In [22]: res.index.names = ["Hom", "Month"]
In [23]: res.reset_index()
Out[23]:
Hom Month Plans Ships
0 CAD April 4 13
1 CAD March 12 5
2 USA April 2 11
3 USA March 7 6
4 CAD April 6 14
5 CAD March 4 9
6 CAD April 9 7
7 CAD March 13 3
https://stackoverflow.com/questions/54814462
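In the stack output above, April appears before March, whereas the question asks for March first. Below is a minimal sketch of one way to keep the months in their original column order. It assumes only the columns shown in the question (the elided ones are omitted), and the names `row`, `Field`, `months` and `tidy` are illustrative, not taken from the answers:

```python
import pandas as pd

# Sample data copied from the question; the elided ("...") columns are omitted.
df = pd.DataFrame({
    "Hom": ["CAD", "USA", "CAD", "CAD"],
    "March Plans": [12, 7, 4, 13],
    "March Ships": [5, 6, 9, 3],
    "April Plans": [4, 2, 6, 9],
    "April Ships": [13, 11, 14, 7],
})

# Month order as it appears in the column names (March, then April).
months = list(dict.fromkeys(c.split()[0] for c in df.columns if c != "Hom"))

# Keep the original row number in the index so the duplicate "CAD" rows stay distinct.
wide = df.rename_axis("row").set_index("Hom", append=True)
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split()) for c in wide.columns], names=["Month", "Field"]
)

# Move the Month level from the columns into the rows.
tidy = wide.stack(level="Month").reset_index()

# Sort by the original month order rather than alphabetically.
tidy["Month"] = pd.Categorical(tidy["Month"], categories=months, ordered=True)
tidy = (
    tidy.sort_values(["Month", "row"])
    .drop(columns="row")
    .reset_index(drop=True)
    .rename_axis(columns=None)
)
print(tidy)
```

The ordered categorical is only there to control the sort; if plain strings are preferred afterwards, `tidy["Month"].astype(str)` converts the column back.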
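For the pd.wide_to_long answer, the renaming step can also be done with a small function passed to rename. The following is a self-contained sketch rather than the answerer's exact code. It assumes only the columns shown in the question, keeps a space in the renamed columns (`Plans March`), and therefore passes `sep=" "`. The helper `stub_first` is a hypothetical name:

```python
import pandas as pd

# Sample data copied from the question; the elided ("...") columns are omitted.
df = pd.DataFrame({
    "Hom": ["CAD", "USA", "CAD", "CAD"],
    "March Plans": [12, 7, 4, 13],
    "March Ships": [5, 6, 9, 3],
    "April Plans": [4, 2, 6, 9],
    "April Ships": [13, 11, 14, 7],
})


def stub_first(name):
    """Turn 'March Plans' into 'Plans March'; leave one-word names unchanged."""
    parts = name.split()
    return " ".join(reversed(parts)) if len(parts) == 2 else name


# wide_to_long needs a unique id column, so expose the row number as "index".
renamed = df.rename(columns=stub_first).reset_index()

tidy = (
    pd.wide_to_long(
        renamed,
        stubnames=["Plans", "Ships"],
        i="index",
        j="Month",
        sep=" ",
        suffix=r"\w+",  # month names are not numeric, so the default suffix would not match
    )
    .reset_index(level="index", drop=True)
    .reset_index()
)
print(tidy)
```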