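I have a DataFrame (about 50k rows and 150 columns) with energy and weather variables from different cities.
I want to split it into 5 DataFrames (one per city).
The whole DataFrame is basically structured like this: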
import pandas as pd

df = pd.DataFrame({'Weather': [4, 5, 4, 5, 5, 4],
                   'Energy': [7, 8, 9, 4, 2, 3],
                   'Weather_city1': [1, 3, 5, 7, 1, 0],
                   'Energy_city1': [7, 4, 7, 2, 1, 0],
                   'Weather_city2': [1, 0, 6, 2, 6, 9],
                   'Energy_city2': [6, 1, 5, 3, 2, 7]})
print(df)
   Weather  Energy  Weather_city1  Energy_city1  ...
0        4       7              1             7
1        5       8              3             4
2        4       9              5             7
3        5       4              7             2
4        5       2              1             1
5        4       3              0             0

How can I split this into several DataFrames (one per city, so one holding only the values for city1, one for city2, and so on)?
Answered on 2022-04-13 12:00:47
IIUC, you can:
# columns without a city id
cols = ['Weather', 'Energy']
# extract everything after the underscore (the city id) from the remaining column names
groups = df.drop(columns=cols).columns.str.extract('(?<=_)(.*)$', expand=False)
[g.reset_index() for _, g in df.set_index(cols).groupby(groups, axis=1)]

Output:
[   Weather  Energy  Weather_city1  Energy_city1
 0        4       7              1             7
 1        5       8              3             4
 2        4       9              5             7
 3        5       4              7             2
 4        5       2              1             1
 5        4       3              0             0,
    Weather  Energy  Weather_city2  Energy_city2
 0        4       7              1             6
 1        5       8              0             1
 2        4       9              6             5
 3        5       4              2             3
 4        5       2              6             2
 5        4       3              9             7]

As a dictionary:
{name: g.reset_index()
 for name, g in df.set_index(['Weather', 'Energy']).groupby(groups, axis=1)}

Output:
{'city1':    Weather  Energy  Weather_city1  Energy_city1
 0        4       7              1             7
 1        5       8              3             4
 2        4       9              5             7
 3        5       4              7             2
 4        5       2              1             1
 5        4       3              0             0,
 'city2':    Weather  Energy  Weather_city2  Energy_city2
 0        4       7              1             6
 1        5       8              0             1
 2        4       9              6             5
 3        5       4              2             3
 4        5       2              6             2
 5        4       3              9             7}

Answered on 2022-04-13 12:25:34
I would transform the raw data directly:
import pandas as pd

data = {'Weather_city1': [1, 3, 5, 7, 1, 0],
        'Energy_city1': [7, 4, 7, 2, 1, 0],
        'Weather_city2': [1, 0, 6, 2, 6, 9],
        'Energy_city2': [6, 1, 5, 3, 2, 7]}

# get the unique city names (sorted so the order is stable)
cities = sorted({key.split("_")[1] for key in data})

city_data = {}
for city in cities:
    city_data[city] = {"Weather": data[f"Weather_{city}"], "Energy": data[f"Energy_{city}"]}

city_data
# {'city1': {'Weather': [1, 3, 5, 7, 1, 0], 'Energy': [7, 4, 7, 2, 1, 0]},
#  'city2': {'Weather': [1, 0, 6, 2, 6, 9], 'Energy': [6, 1, 5, 3, 2, 7]}}

Then you can work with pandas:
cities_dataframes = {city: pd.DataFrame(city_data[city]) for city in cities}
cities_dataframes['city1']
#    Weather  Energy
# 0        1       7
# 1        3       4
# 2        5       7
# 3        7       2
# 4        1       1
# 5        0       0

https://stackoverflow.com/questions/71856912
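
The `groupby(groups, axis=1)` call in the first answer is, as far as I know, deprecated in recent pandas releases (2.1 and later). A minimal sketch of the same per-city split using plain column selection is shown below. It assumes that every city-specific column ends in `_<city>` and that shared columns contain no underscore. That holds for the sample data but may not hold for the real 150-column dataset. The names `shared`, `city_cols`, and `per_city` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Weather": [4, 5, 4, 5, 5, 4],
    "Energy": [7, 8, 9, 4, 2, 3],
    "Weather_city1": [1, 3, 5, 7, 1, 0],
    "Energy_city1": [7, 4, 7, 2, 1, 0],
    "Weather_city2": [1, 0, 6, 2, 6, 9],
    "Energy_city2": [6, 1, 5, 3, 2, 7],
})

# Assumption: columns without an underscore are shared by every city.
shared = [col for col in df.columns if "_" not in col]

# Map each city suffix to the columns that carry it, preserving column order.
city_cols = {}
for col in df.columns:
    if col in shared:
        continue
    city = col.rsplit("_", 1)[1]  # text after the last underscore
    city_cols.setdefault(city, []).append(col)

# One DataFrame per city: shared columns first, then that city's own columns.
per_city = {city: df[shared + cols] for city, cols in city_cols.items()}

print(per_city["city1"])
```

`per_city["city2"]` then holds `Weather`, `Energy`, `Weather_city2`, and `Energy_city2`. If the shared columns in the real data do contain underscores, listing them explicitly (as `cols` does in the first answer) is the safer choice.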