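I have a list of dictionaries that I want to group by date, but before that I need to merge the metric and horizon values into a new key (e.g. rmse-1, rmse-2 as the keys for the values 1, 2, and so on). See the desired output.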
import pandas as pd
import datetime
dataset = [{'date': datetime.datetime(2022, 9, 10, tzinfo=datetime.timezone.utc),
'extra': 0,
'horizon': 1,
'metric': 'rmse',
'value': 1
},
{'date': datetime.datetime(2022, 9, 11, tzinfo=datetime.timezone.utc),
'extra': 0,
'horizon': 1,
'metric': 'rmse',
'value': 2
},
{'date': datetime.datetime(2022, 9, 10, tzinfo=datetime.timezone.utc),
'extra': 0,
'horizon': 2,
'metric': 'rmse',
'value': 3
},
{'date': datetime.datetime(2021, 9, 11, tzinfo=datetime.timezone.utc),
'extra': 0,
'horizon': 2,
'metric': 'rmse',
'value': 4
}
]
This is the desired output:
desired_output= [{'date': datetime.datetime(2021, 9, 10, tzinfo=datetime.timezone.utc),
'rmse-1': 1,
'rmse-2': 3,
},
{'date': datetime.datetime(2021, 9, 11, tzinfo=datetime.timezone.utc),
'rmse-1': 2,
'rmse-2': 4,
}
]
Here is a partial solution. It still needs to group by date; as it stands, each row ends up with only one of the merged keys.
def group_and_merge_dataset(dataset, group_by_key, merge_value_keys):
    new_dataset = []
    for item in dataset:
        # builds keys such as 'rmse-1' (note: overwrites item['metric'] in place)
        item.update({'metric': "{}-{}".format(item['metric'], item['horizon'])})
        d = {'date': item[group_by_key], item['metric']: item['value']}
        new_dataset.append(d)
    for item in new_dataset:
        print(item)
    # nothing is returned, which is why the final print below shows None
print(group_and_merge_dataset(dataset, 'date', ['metric', 'horizon']))
Output:
{'date': datetime.datetime(2022, 9, 10, 0, 0, tzinfo=datetime.timezone.utc), 'rmse-1': 1}
{'date': datetime.datetime(2022, 9, 11, 0, 0, tzinfo=datetime.timezone.utc), 'rmse-1': 2}
{'date': datetime.datetime(2022, 9, 10, 0, 0, tzinfo=datetime.timezone.utc), 'rmse-2': 3}
{'date': datetime.datetime(2021, 9, 11, 0, 0, tzinfo=datetime.timezone.utc), 'rmse-2': 4}
None
Posted on 2022-10-07 14:32:43
The following may not be the final solution, but it should point you toward your goal.
import pandas as pd
import datetime
dataset = [{'date': datetime.datetime(2022, 9, 10, tzinfo=datetime.timezone.utc),
'extra': 0,
'horizon': 1,
'metric': 'rmse',
'value': 1
},
{'date': datetime.datetime(2022, 9, 11, tzinfo=datetime.timezone.utc),
'extra': 0,
'horizon': 1,
'metric': 'rmse',
'value': 2
},
{'date': datetime.datetime(2022, 9, 10, tzinfo=datetime.timezone.utc),
'extra': 0,
'horizon': 2,
'metric': 'rmse',
'value': 3
},
{'date': datetime.datetime(2022, 9, 11, tzinfo=datetime.timezone.utc),
'extra': 0,
'horizon': 2,
'metric': 'rmse',
'value': 4
}
]
df = pd.DataFrame(data=dataset)
>>                        date  extra  horizon  metric  value
>> 0 2022-09-10 00:00:00+00:00      0        1    rmse      1
>> 1 2022-09-11 00:00:00+00:00      0        1    rmse      2
>> 2 2022-09-10 00:00:00+00:00      0        2    rmse      3
>> 3 2022-09-11 00:00:00+00:00      0        2    rmse      4
Join the metric and horizon values into one column:
df["metric_horizon"] = df["metric"] + "-" + df["horizon"].astype(str)
>>                        date  extra  horizon  metric  value  metric_horizon
>> 0 2022-09-10 00:00:00+00:00      0        1    rmse      1          rmse-1
>> 1 2022-09-11 00:00:00+00:00      0        1    rmse      2          rmse-1
>> 2 2022-09-10 00:00:00+00:00      0        2    rmse      3          rmse-2
>> 3 2022-09-11 00:00:00+00:00      0        2    rmse      4          rmse-2
Pivot on metric_horizon so that its values become columns:
df2 = df.pivot_table(index=["date"], columns="metric_horizon", values="value", aggfunc="first").reset_index()
>> metric_horizon                      date  rmse-1  rmse-2
>> 0              2022-09-10 00:00:00+00:00       1       3
>> 1              2022-09-11 00:00:00+00:00       2       4
This is almost the format you want. You can now convert the dataframe df2 into a list of dictionaries as needed.
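A minimal sketch of that last conversion, assuming a reasonably recent pandas (1.x or 2.x): DataFrame.to_dict(orient="records") turns each row of df2 into one dict. Because pandas stores the dates as Timestamp objects, the sketch converts them back with Timestamp.to_pydatetime() so the result compares equal to datetime.datetime values like those in desired_output. The names records and result are illustrative only, and the sample data uses the 2022 dates from the answer's code.

```python
import datetime

import pandas as pd

# Same sample data as in the answer (2022 dates), written compactly
utc = datetime.timezone.utc
dataset = [
    {"date": datetime.datetime(2022, 9, 10, tzinfo=utc),
     "extra": 0, "horizon": 1, "metric": "rmse", "value": 1},
    {"date": datetime.datetime(2022, 9, 11, tzinfo=utc),
     "extra": 0, "horizon": 1, "metric": "rmse", "value": 2},
    {"date": datetime.datetime(2022, 9, 10, tzinfo=utc),
     "extra": 0, "horizon": 2, "metric": "rmse", "value": 3},
    {"date": datetime.datetime(2022, 9, 11, tzinfo=utc),
     "extra": 0, "horizon": 2, "metric": "rmse", "value": 4},
]

df = pd.DataFrame(data=dataset)
# Combine metric and horizon into keys such as "rmse-1"
df["metric_horizon"] = df["metric"] + "-" + df["horizon"].astype(str)
# One row per date, one column per metric_horizon key
df2 = df.pivot_table(index=["date"], columns="metric_horizon",
                     values="value", aggfunc="first").reset_index()
df2.columns.name = None  # cosmetic: drop the leftover "metric_horizon" label

# One dict per row; convert pandas Timestamp back to datetime.datetime
records = df2.to_dict(orient="records")
result = [{**row, "date": row["date"].to_pydatetime()} for row in records]

for row in result:
    print(row)
```

If some date has no value for a given metric_horizon key, pivot_table leaves NaN in that cell (and the column may become float), so you may want to drop NaN entries per row if missing keys should simply be absent.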
https://stackoverflow.com/questions/73987990