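I have a very odd data structure: a list of tuples. Each tuple has five elements. The first is an identifier string and the other four are floats stored as strings (oddly, they are not plain floats). Sorry, I got the data from someone else.
I want to average elements 2 to 5 across all tuples that share the same first element. Example: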
[('ch', ' 0.8307', '0.8583', '0.8047', ' 0.969'),
('de', ' 0.721', '0.7529', '0.6917', ' 0.968'),
('en', ' 0.8441', '0.8732', '0.8168', ' 0.9569'),
('fn', ' 0.8207', '0.8574', '0.7870', ' 0.9609'),
('ch', ' 0.466', '0.572', '0.7733', ' 0.969'),
('de', ' 0.322', '0.385', '0.5431', ' 0.968'),
('sp', ' 0.7609', '0.7893', '0.7344', ' 0.9663'),
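('ti', ' 0.8135', '0.8430', '0.7860', ' 0.9662')]
The output should collapse all tuples that share the same first element and average their values, so something like this (I have not actually averaged the values in this example output):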
[('ch', ' 0.8307', '0.8583', '0.8047', ' 0.969'),
('de', ' 0.721', '0.7529', '0.6917', ' 0.968'),
('en', ' 0.8441', '0.8732', '0.8168', ' 0.9569'),
('fn', ' 0.8207', '0.8574', '0.7870', ' 0.9609'),
('sp', ' 0.7609', '0.7893', '0.7344', ' 0.9663'),
('ti', ' 0.8135', '0.8430', '0.7860', ' 0.9662')]
Is there something clever I can do, rather than writing one giant loop that extracts everything?
Posted on 2020-11-10 13:28:01
You can first build a dict that collects all the values associated with each id, then compute the means as follows:
from collections import defaultdict
data = [('ch', ' 0.8307', '0.8583', '0.8047', ' 0.969'),
('de', ' 0.721', '0.7529', '0.6917', ' 0.968'),
('en', ' 0.8441', '0.8732', '0.8168', ' 0.9569'),
('fn', ' 0.8207', '0.8574', '0.7870', ' 0.9609'),
('ch', ' 0.466', '0.572', '0.7733', ' 0.969'),
('de', ' 0.322', '0.385', '0.5431', ' 0.968'),
('sp', ' 0.7609', '0.7893', '0.7344', ' 0.9663'),
('ti', ' 0.8135', '0.8430', '0.7860', ' 0.9662')]
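def mean(lst):
    return sum(lst) / len(lst)

# Collect every row of floats under its id
d = defaultdict(list)
for id, *values in data:
    d[id].append(list(map(float, values)))

# zip(*values) transposes the rows so that each column is averaged
out = {id: [mean(column) for column in zip(*values)] for id, values in d.items()}
print(out)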
# {'ch': [0.64835, 0.71515, 0.7889999999999999, 0.969],
# 'de': [0.5215, 0.5689500000000001, 0.6174, 0.968],
# 'en': [0.8441, 0.8732, 0.8168, 0.9569],
# 'fn': [0.8207, 0.8574, 0.787, 0.9609],
# 'sp': [0.7609, 0.7893, 0.7344, 0.9663],
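# 'ti': [0.8135, 0.843, 0.786, 0.9662]}
In the line "for id, *values in data:" we iterate over the tuples in data, putting the first item of each tuple in id and the remaining items in values.
In addition, defaultdict(list) lets us simply append each new list of values under its key, because an empty list is created automatically the first time a key is seen.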
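The question asked for a list of tuples, while the code above produces a dict. Here is a minimal sketch of the same grouping idea that returns tuples in first-seen order. The names average_by_key and sample are hypothetical and not part of the original answer; the sketch assumes Python 3.8+ (for statistics.fmean) and that every row has the same number of numeric fields.

```python
from collections import defaultdict
from statistics import fmean


def average_by_key(rows):
    """Average the numeric columns of rows that share the same first element."""
    grouped = defaultdict(list)
    for key, *values in rows:
        # float() tolerates the leading spaces seen in the sample data
        grouped[key].append([float(v) for v in values])
    # zip(*group) transposes a group's rows into columns;
    # dicts keep insertion order, so keys come out in first-seen order
    return [(key, *(fmean(col) for col in zip(*group)))
            for key, group in grouped.items()]


# Hypothetical, shortened sample in the same shape as the question's data
sample = [('ch', ' 0.5', '1.0'),
          ('de', ' 2.0', '4.0'),
          ('ch', ' 1.5', '3.0')]
print(average_by_key(sample))
# [('ch', 1.0, 2.0), ('de', 2.0, 4.0)]
```

The averaged values come back as floats rather than strings. If the string form from the question is really needed, each value could be formatted when building the tuple.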
Posted on 2020-11-10 13:36:09
Some number wrangling:
data = [('ch', ' 0.8307', '0.8583', '0.8047', ' 0.969'),
('de', ' 0.721', '0.7529', '0.6917', ' 0.968'),
('en', ' 0.8441', '0.8732', '0.8168', ' 0.9569'),
('fn', ' 0.8207', '0.8574', '0.7870', ' 0.9609'),
('ch', ' 0.466', '0.572', '0.7733', ' 0.969'),
('de', ' 0.322', '0.385', '0.5431', ' 0.968'),
('sp', ' 0.7609', '0.7893', '0.7344', ' 0.9663'),
('ti', ' 0.8135', '0.8430', '0.7860', ' 0.9662')]
from pprint import pprint
from collections import defaultdict
d = defaultdict(list)
for t in data:
    d[t[0]].append(list(map(float, t[1:])))
pprint(d)

for key, values in d.items():
    w = len(values)
    if w > 1:
        # average column-wise across the collected rows
        d[key] = [sum(numbers) / w for numbers in zip(*values)]
    else:
        # a single row needs no averaging, so just unwrap it
        d[key] = d[key][0]
pprint(d)

Output:
# after converting to float and collecting into lists
defaultdict(<class 'list'>,
{'ch': [[0.8307, 0.8583, 0.8047, 0.969],
[0.466, 0.572, 0.7733, 0.969]],
'de': [[0.721, 0.7529, 0.6917, 0.968],
[0.322, 0.385, 0.5431, 0.968]],
'en': [[0.8441, 0.8732, 0.8168, 0.9569]],
'fn': [[0.8207, 0.8574, 0.787, 0.9609]],
'sp': [[0.7609, 0.7893, 0.7344, 0.9663]],
'ti': [[0.8135, 0.843, 0.786, 0.9662]]})
# after averaging
defaultdict(<class 'list'>,
{'ch': [0.64835, 0.71515, 0.7889999999999999, 0.969],
'de': [0.5215, 0.5689500000000001, 0.6174, 0.968],
'en': [0.8441, 0.8732, 0.8168, 0.9569],
'fn': [0.8207, 0.8574, 0.787, 0.9609],
'sp': [0.7609, 0.7893, 0.7344, 0.9663],
'ti': [0.8135, 0.843, 0.786, 0.9662]})
Posted on 2020-11-10 13:46:35
With pandas, this is even more trivial:
data = [('ch', ' 0.8307', '0.8583', '0.8047', ' 0.969'),
('de', ' 0.721', '0.7529', '0.6917', ' 0.968'),
('en', ' 0.8441', '0.8732', '0.8168', ' 0.9569'),
('fn', ' 0.8207', '0.8574', '0.7870', ' 0.9609'),
('ch', ' 0.466', '0.572', '0.7733', ' 0.969'),
('de', ' 0.322', '0.385', '0.5431', ' 0.968'),
('sp', ' 0.7609', '0.7893', '0.7344', ' 0.9663'),
('ti', ' 0.8135', '0.8430', '0.7860', ' 0.9662')]
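import pandas as pd
# Convert the numeric strings to float before building the frame:
# recent pandas versions raise if dtype=float cannot be applied to the id column
df = pd.DataFrame([(key, *map(float, values)) for key, *values in data])
print(df.groupby(0).mean())

Output: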
          1        2       3       4
0
ch  0.64835  0.71515  0.7890  0.9690  # pandas displays "nice" numbers,
de  0.52150  0.56895  0.6174  0.9680  # it contains the "correct" ones
en  0.84410  0.87320  0.8168  0.9569
fn  0.82070  0.85740  0.7870  0.9609
sp  0.76090  0.78930  0.7344  0.9663
ti  0.81350  0.84300  0.7860  0.9662
https://stackoverflow.com/questions/64769721