I'm trying to pull the keys and values out of a JSON column so that each pair gets its own row in pandas.
I have:
|---------------------|------------------|
| session | scoring |
|---------------------|------------------|
| session1 | {id1:scoring1, |
| | id2:scoring2, |
| | id3:scoring3} |
|---------------------|------------------|
| session2 | {id4:scoring4, |
| | id5:scoring5} |
|---------------------|------------------|

I want to get:
|---------------------|------------------|---------------------|------------------|
| session | scoring | id | score |
|---------------------|------------------|---------------------|------------------|
| session1 | {id1:scoring1, | id1 | score1 |
| | id2:scoring2, | | |
| | id3:scoring3} | | |
|---------------------|------------------|---------------------|------------------|
| session1 | {id1:scoring1, | id2 | score2 |
| | id2:scoring2, | | |
| | id3:scoring3} | | |
|---------------------|------------------|---------------------|------------------|
| session1 | {id1:scoring1, | id3 | score3 |
| | id2:scoring2, | | |
| | id3:scoring3} | | |
|---------------------|------------------|---------------------|------------------|
| session2 | {id4:scoring4, | id4 | score4 |
| | id5:scoring5} | | |
|---------------------|------------------|---------------------|------------------|
| session2 | {id4:scoring4, | id5 | score5 |
| | id5:scoring5} | | |
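|---------------------|------------------|---------------------|------------------|

The code I'm using (iterate over the rows and their JSONs; if the id is the first one in the JSON, put it in the adjacent cells, otherwise create a new row and append it to df):

append_index = df.shape[0]
for index, row in df.iterrows():
    append_now = False
    for key, val in row['scoring'].items():
        if append_now:
            # every key after the first goes into a new row appended to df
            row['id'] = key
            row['score'] = val
            df.loc[append_index] = row
            append_index += 1
        else:
            # the first key is written next to the original row
            df.loc[index, 'id'] = key
            df.loc[index, 'score'] = val
            append_now = True

The problem is that df has 6+ million rows, and iterating over just 20 rows takes half an hour. However, when I limit df to the first 1k rows, it works fine.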
Posted on 2020-10-28 12:43:05
I'm not sure whether this is any better, but you might want to give it a try:
Sample frame
import pandas as pd

data = [[{'id1': 'score1', 'id2': 'score2', 'id3': 'score3'}],
        [{'id4': 'score4', 'id5': 'score5'}]]
df = pd.DataFrame(data, index=['session1', 'session2'])

looks like
          0
session1  {'id1': 'score1', 'id2': 'score2', 'id3': 'score3'}
session2  {'id4': 'score4', 'id5': 'score5'}

This
data_new = [[session, id, score]
            for session, scores in zip(df.index, df[0])
            for id, score in scores.items()]
df = pd.DataFrame(data_new)
df.set_index(0, inplace=True)

reproduces your result
            1       2
0
session1  id1  score1
session1  id2  score2
session1  id3  score3
session2  id4  score4
session2  id5  score5

but might perform better.
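
For the frame layout shown in the question (a `session` column plus a `scoring` column holding dicts), the same comprehension could be adapted roughly as below. This is only a sketch: the sample data is made up to mirror the question's tables, the column names `session`, `scoring`, `id` and `score` are taken from there, and I haven't timed it on millions of rows.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question
df = pd.DataFrame({
    "session": ["session1", "session2"],
    "scoring": [
        {"id1": "score1", "id2": "score2", "id3": "score3"},
        {"id4": "score4", "id5": "score5"},
    ],
})

# Build one output row per (session, id, score) in a single pass,
# carrying the original scoring dict along instead of growing df row by row
rows = [
    (session, scoring, key, val)
    for session, scoring in zip(df["session"], df["scoring"])
    for key, val in scoring.items()
]
result = pd.DataFrame(rows, columns=["session", "scoring", "id", "score"])
```

Building a plain list first and constructing the DataFrame once avoids repeatedly enlarging the frame with `df.loc[append_index] = row`, which is likely where most of the time goes in the original loop.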
https://stackoverflow.com/questions/64570857