Suppose I have data of the form:

    >>> df = pd.DataFrame([['2012', 'A', 1], ['2012', 'B', 2], ['2011', 'A', 3],
    ...                    ['2011', 'B', 2]],
    ...                   columns=['branch_year', 'branch_name', 'employee_id'])
      branch_year branch_name  employee_id
    0        2012           A            1
    1        2012           B            2
    2        2011           A            3
    3        2011           B            2

How can I combine the columns branch_year and branch_name so that they share a parent column branch, ideally renaming them to drop the branch_ prefix?
      branch branch  employee_id
        year   name
    0   2012      A            1
    1   2012      B            2
    2   2011      A            3
    3   2011      B            2

The end goal is to create a list of dictionaries of the form:
    [
        {
            "employee_id": 1,
            "branch": {
                "name": "A",
                "year": "2012"
            }
        },
        {...}
    ]

Answered 2016-02-15 22:20:03
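You can apply a function to each row and convert the result to a list:

    def to_nested_dict(row):
        # Build one nested record per row; the column names are hard-coded.
        return {'employee_id': row.employee_id,
                'branch': {'year': row.branch_year, 'name': row.branch_name}}

    list(df.apply(to_nested_dict, axis=1))

This preserves the original order of the rows: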
    [{'branch': {'name': 'A', 'year': '2012'}, 'employee_id': 1},
     {'branch': {'name': 'B', 'year': '2012'}, 'employee_id': 2},
     {'branch': {'name': 'A', 'year': '2011'}, 'employee_id': 3},
     {'branch': {'name': 'B', 'year': '2011'}, 'employee_id': 2}]

A programmatic approach that nests on column names containing an underscore:
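    def to_nested_dict(row):
        res = {}
        for col in row.index:
            # Assumes every column name contains exactly one underscore.
            outer_key, inner_key = col.split('_')
            # Create the inner dict on first use, then fill it in.
            outer = res.setdefault(outer_key, {})
            outer[inner_key] = row[col]
        return res

    list(df.apply(to_nested_dict, axis=1))

Result: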
    [{'branch': {'name': 'A', 'year': '2012'}, 'employee': {'id': 1}},
     {'branch': {'name': 'B', 'year': '2012'}, 'employee': {'id': 2}},
     {'branch': {'name': 'A', 'year': '2011'}, 'employee': {'id': 3}},
     {'branch': {'name': 'B', 'year': '2011'}, 'employee': {'id': 2}}]

Answered 2016-02-15 16:27:49
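Not very pretty, but groupby gets you what you want:

    lst = []
    # Groups come back sorted by key, so the original row order is not kept.
    for k, g in df.groupby(by=['branch_name', 'branch_year']):
        # Assumes each (name, year) pair occurs in exactly one row.
        d = {'employee_id': int(g['employee_id'].iloc[0]),
             'branch': {'name': k[0], 'year': k[1]}}
        lst.append(d)

    lst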
    [{'branch': {'name': 'A', 'year': '2011'}, 'employee_id': 3},
     {'branch': {'name': 'A', 'year': '2012'}, 'employee_id': 1},
     {'branch': {'name': 'B', 'year': '2011'}, 'employee_id': 2},
     {'branch': {'name': 'B', 'year': '2012'}, 'employee_id': 2}]

Answered 2016-02-15 22:59:49
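My attempt at a programmatic way (assuming you can split on "_"):

    from collections import defaultdict

    # One list of key parts per column, e.g. 'branch_year' -> ['branch', 'year'].
    hierarchy = [original.split('_') for original in df.columns]

    def to_nested_dict(row):
        d = defaultdict(dict)
        for keys, field in zip(hierarchy, row.index):
            # row[field] avoids clashes with Series attributes such as `name`.
            val = row[field]
            if len(keys) == 1:
                d[keys[0]] = val
            elif len(keys) == 2:
                d[keys[0]][keys[1]] = val
        return d

    list(df.apply(to_nested_dict, axis=1))

https://stackoverflow.com/questions/35413776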
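The answers above go straight to the list of dictionaries and do not build the parent column `branch` that the question also asks for. As a rough sketch only, one way to get it is to replace the flat column names with a `pd.MultiIndex`. The helper `split_branch` and the names `grouped` and `records` are made up for illustration, and leaving `employee_id` with an empty second level is an assumption about the desired layout.

```python
import pandas as pd

df = pd.DataFrame(
    [['2012', 'A', 1], ['2012', 'B', 2], ['2011', 'A', 3], ['2011', 'B', 2]],
    columns=['branch_year', 'branch_name', 'employee_id'],
)

def split_branch(col):
    # Hypothetical helper: 'branch_year' -> ('branch', 'year').
    # Any other column keeps its name with an empty second level.
    prefix = 'branch_'
    if col.startswith(prefix):
        return ('branch', col[len(prefix):])
    return (col, '')

# Work on a copy so the original flat frame stays untouched.
grouped = df.copy()
grouped.columns = pd.MultiIndex.from_tuples([split_branch(c) for c in df.columns])

# grouped['branch'] is a sub-frame with the columns 'year' and 'name',
# so each of its rows becomes the inner dict of one record.
records = [
    {'employee_id': emp, 'branch': branch}
    for emp, branch in zip(grouped[('employee_id', '')].tolist(),
                           grouped['branch'].to_dict(orient='records'))
]
```

Selecting `grouped['branch']` returns just the `year` and `name` columns, which is what makes the nesting step a single `to_dict` call per group of columns.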
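The underscore-based approaches in this thread assume a fixed number of underscores per column name. A possible variant, shown here only as a sketch, splits each name at its first separator and leaves names without one flat. It also skips `apply` by starting from `DataFrame.to_dict(orient='records')`. The helper `nest_record`, its `sep` parameter, and the shortened sample frame with a plain `id` column are illustrative assumptions, not part of any answer.

```python
import pandas as pd

def nest_record(record, sep='_'):
    """Nest a flat dict on the first `sep` in each key.

    Keys without `sep` stay at the top level. Assumes no flat key
    equals the prefix of a nested one (e.g. 'branch' and 'branch_year').
    """
    nested = {}
    for key, value in record.items():
        outer, found, inner = key.partition(sep)
        if found:
            nested.setdefault(outer, {})[inner] = value
        else:
            nested[key] = value
    return nested

df = pd.DataFrame(
    [['2012', 'A', 1], ['2012', 'B', 2]],
    columns=['branch_year', 'branch_name', 'id'],
)

# One flat dict per row, in row order, then nest each one.
result = [nest_record(rec) for rec in df.to_dict(orient='records')]
```

Because `str.partition` splits only once, a name such as `branch_head_office` would end up as `{'branch': {'head_office': ...}}` rather than raising an error.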