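I have a CSV file in which a single cell can contain multiple values, in the following format:
ID, Name, Role, Task, Responsibility
123, Stephen, "1. Give, 2. Take", "1.1. DO, 1.2. AB, 2.1. DF", "1.1.1. FG, 1.1.2. GH, 1.2.1. SG, 2.1.1. DF, 2.1.2. JK"
I added some spaces to improve readability. I need to convert this CSV file into nested JSON, like this: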
{
    "Name" : "Stephen",
    "123": {
        "1": {
            "Role": "Give",
            "1.1": {
                "Task": "DO",
                "1.1.1": {
                    "Responsibility": "FG"
                },
                "1.1.2": {
                    "Responsibility": "GH"
                }
            },
            "1.2": {
                "Task": "AB",
                "1.2.1": {
                    "Responsibility": "SG"
                }
            }
        },
        "2": {
            "Role": "Take",
            "2.1": {
                "Task": "DF",
                "2.1.1": {
                    "Responsibility": "DF"
                },
                "2.1.2": {
                    "Responsibility": "JK"
                }
            }
        }
    }
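}
The numbers look like 1, 1.1, 1.2.1, 2.2, 2.3, 2.3.1. I need a way to detect cells like this (or columns of this type) and convert them into key: value pairs like the ones above.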
Posted on 2020-01-30 12:42:39
You can use recursion with itertools.groupby:
from itertools import groupby as gb

def to_dict(data):
    # group the entries by the first component of their numeric path
    d = [(a, list(b)) for a, b in gb(sorted(data, key=lambda x: x[0][0]), key=lambda x: x[0][0])]
    # b[0] supplies the key and value for this level; entries with a longer path are passed down recursively
    return {b[0][1]: {**b[0][-1], **to_dict([[j, k, l] for [_, *j], k, l in b if j])} for a, b in d}

import re, json
s = """
ID, Name, Role, Task, Responsibility
123, Stephen, "1. Give, 2. Take", "1.1. DO, 1.2. AB, 2.1. DF", "1.1.1. FG, 1.1.2. GH, 1.2.1. SG, 2.1.1. DF, 2.1.2. JK"
"""
# below: parse desired values from data and format header
[h1, h2, *h], [_id, n, *_data] = [re.findall(r'(?<=")[^"]+|\w+', i) for i in filter(None, s.split('\n'))]
# transform numerical paths as lists
data = [[b.split('. ') for b in i.split(', ')] for i in _data if i != ', ']
# associate original file headers to the transformed data
formed = [l for a, b in zip(h, data) for l in [[c.split('.'), c, {a: d}] for c, d in b]]
print(json.dumps({h2: n, h1: to_dict(formed)}, indent=4))
Output:
{
    "Name": "Stephen",
    "ID": {
        "1": {
            "Role": "Give",
            "1.1": {
                "Task": "DO",
                "1.1.1": {
                    "Responsibility": "FG"
                },
                "1.1.2": {
                    "Responsibility": "GH"
                }
            },
            "1.2": {
                "Task": "AB",
                "1.2.1": {
                    "Responsibility": "SG"
                }
            }
        },
        "2": {
            "Role": "Take",
            "2.1": {
                "Task": "DF",
                "2.1.1": {
                    "Responsibility": "DF"
                },
                "2.1.2": {
                    "Responsibility": "JK"
                }
            }
        }
    }
}
https://stackoverflow.com/questions/59979008
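As a rough alternative sketch (not the approach posted above), the same nesting can be built with the standard csv module and a small regular expression that detects numbered cells. It keys the nested part by the ID value (123), as in the format the question asks for. The names ITEM, parse_numbered, row_to_nested and csv_to_nested are hypothetical, and the sketch assumes the first two columns are always ID and Name and that values inside a cell never contain commas.

```python
import csv
import io
import json
import re

# Assumption: a numbered item is a dotted path, a dot, whitespace, then the value,
# e.g. "1.2.1. SG" -> path "1.2.1", value "SG".
ITEM = re.compile(r"(\d+(?:\.\d+)*)\.\s+(.+)")


def parse_numbered(cell):
    """Return [(path, value), ...] if every comma-separated part is numbered, else None."""
    matches = [ITEM.fullmatch(part.strip()) for part in cell.split(",")]
    if not all(matches):
        return None
    return [(m.group(1), m.group(2)) for m in matches]


def row_to_nested(header, row):
    """Turn one CSV row into the nested structure (assumes columns 0 and 1 are ID and Name)."""
    tree = {}
    for column, cell in zip(header[2:], row[2:]):
        items = parse_numbered(cell)
        if items is None:
            continue  # not a numbered cell; this sketch simply skips it
        for path, value in items:
            parts = path.split(".")
            node = tree
            # walk "1" -> "1.1" -> "1.1.1", creating missing levels on the way
            for depth in range(1, len(parts) + 1):
                node = node.setdefault(".".join(parts[:depth]), {})
            node[column] = value
    return {header[1]: row[1], row[0]: tree}


def csv_to_nested(text):
    """Parse CSV text and return one nested dict per data row."""
    # skipinitialspace lets the reader handle the space between a comma and a quote
    rows = [[c.strip() for c in r] for r in csv.reader(io.StringIO(text), skipinitialspace=True) if r]
    header, *records = rows
    return [row_to_nested(header, r) for r in records]


SAMPLE = (
    "ID, Name, Role, Task, Responsibility\n"
    '123, Stephen, "1. Give, 2. Take", "1.1. DO, 1.2. AB, 2.1. DF", '
    '"1.1.1. FG, 1.1.2. GH, 1.2.1. SG, 2.1.1. DF, 2.1.2. JK"\n'
)

if __name__ == "__main__":
    print(json.dumps(csv_to_nested(SAMPLE)[0], indent=4))
```

Requiring whitespace after the dot keeps plain numbers such as 123 or 1.5 from being treated as numbered items, so parse_numbered returns None for the ID and Name cells.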