I have a sample dataset:
import pandas as pd
d = {
    'ID': ['ID-1', 'ID-1', 'ID-1', 'ID-1', 'ID-2', 'ID-2', 'ID-2'],
    'OBR': [100, 100, 100, 100, 200, 200, 200],
    'OBX': ['A', 'B', 'C', 'D', 'A', 'B', 'C'],
    'notes': ['hello', 'hello2', '', '', 'bye', '', ''],
}
df = pd.DataFrame(d)

It looks like this:
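ID    OBR  OBX  notes
ID-1  100  A    hello
ID-1  100  B    hello2
ID-1  100  C
ID-1  100  D
ID-2  200  A    bye
ID-2  200  B
ID-2  200  C

I want to loop through every row and, for each ID, OBR combination, give the OBX and notes names a number that increments by 1, and assign the values accordingly.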
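So for the first ID, OBR combination: the ID and OBR names stay the same. Since there are 4 different OBX values, the OBX names will be OBX1, OBX2, OBX3 and OBX4, and since there are 2 different notes values, the notes names will be note1 and note2.
For the second ID, OBR combination: the ID and OBR names stay the same. Since there are 3 different OBX values, the OBX names will be OBX1, OBX2 and OBX3, and since there is 1 notes value, the notes name will be note1.
Desired output (print and assign the values):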
ID = ID-1
OBR= 100
OBX1=A
OBX2=B
OBX3=C
OBX4=D
note1 = hello
note2 = hello2
ID = ID-2
OBR= 200
OBX1 = A
OBX2 = B
OBX3 = C
note1 = bye

My attempt:
count = 0
grouped = df.groupby(['ID','OBR'])
for a, group in grouped:
    ID = a[0]
    OBR = a[1]
    OBX+str(count) = group['OBX']  # this gives an error, can't use OBX+str(count) as the name
    note+str(count) = group['notes']  # this gives an error as well
    count += 1  # Is using count correct?
    print(....)

Answered 2018-05-31 16:07:50
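On the error in the attempt: Python does not accept an expression such as OBX+str(count) on the left-hand side of an assignment, so variable names cannot be built that way. A common workaround is to use the generated names as dictionary keys. The following is only a minimal sketch for a single group; the lists obx_values and note_values and the dictionary record are hypothetical names, with values copied by hand from the sample data.

```python
# Hypothetical values for one (ID, OBR) group, copied from the sample data.
obx_values = ['A', 'B', 'C', 'D']
note_values = ['hello', 'hello2', '', '']

# Use the numbered names as dictionary keys instead of variable names.
record = {'ID': 'ID-1', 'OBR': 100}
for i, obx in enumerate(obx_values, 1):
    record['OBX' + str(i)] = obx

# Skip empty notes so that the numbering only counts real values.
non_empty_notes = [note for note in note_values if note]
for i, note in enumerate(non_empty_notes, 1):
    record['note' + str(i)] = note

for name, value in record.items():
    print(name, '=', value)
```

Because enumerate restarts at 1 for each list, no shared count variable is needed.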
One approach is to group by ID and OBR and aggregate each column into a tuple:
res = df.groupby(['ID', 'OBR'])\
        .agg({'OBX': lambda x: tuple(x), 'notes': lambda x: tuple(filter(None, x))})\
        .reset_index()
print(res)
     ID  OBR           OBX            notes
0  ID-1  100  (A, B, C, D)  (hello, hello2)
1  ID-2  200     (A, B, C)           (bye,)

Then iterate over the rows, using enumerate where applicable:
for row in res.itertuples():
    print('\nID =', row.ID)
    print('OBR =', row.OBR)
    for i, obx in enumerate(row.OBX, 1):
        print('OBX'+str(i)+' =', obx)
    for i, note in enumerate(row.notes, 1):
        print('notes'+str(i)+' =', note)

Result:
ID = ID-1
OBR = 100
OBX1 = A
OBX2 = B
OBX3 = C
OBX4 = D
notes1 = hello
notes2 = hello2
ID = ID-2
OBR = 200
OBX1 = A
OBX2 = B
OBX3 = C
notes1 = bye

https://stackoverflow.com/questions/50628327
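The loop above only prints the values. If they also need to be stored under the numbered names, as the question asks, one option is to build one dictionary per (ID, OBR) group directly from the DataFrame. This is a sketch, not part of the original answer: the helper group_to_record and the list records are hypothetical names, and empty strings in notes are assumed to mean "no note".

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ID-1', 'ID-1', 'ID-1', 'ID-1', 'ID-2', 'ID-2', 'ID-2'],
    'OBR': [100, 100, 100, 100, 200, 200, 200],
    'OBX': ['A', 'B', 'C', 'D', 'A', 'B', 'C'],
    'notes': ['hello', 'hello2', '', '', 'bye', '', ''],
})


def group_to_record(key, group):
    """Build one dict for an (ID, OBR) group with numbered OBX/note keys."""
    record = {'ID': key[0], 'OBR': key[1]}
    for i, obx in enumerate(group['OBX'], 1):
        record[f'OBX{i}'] = obx
    # Assumption: an empty string means there is no note for that row.
    notes = [note for note in group['notes'] if note]
    for i, note in enumerate(notes, 1):
        record[f'note{i}'] = note
    return record


# Grouping by two columns yields (ID, OBR) tuples as the group keys.
records = [group_to_record(key, group)
           for key, group in df.groupby(['ID', 'OBR'])]

for record in records:
    for name, value in record.items():
        print(name, '=', value)
```

Each value can then be looked up by its generated name, for example records[0]['OBX4'].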