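I have a list in Python, shown below, that I want to turn into a DataFrame. I tried pd.DataFrame(myList), but the 'origins' column then stores a list. Instead, I want the origin and quantityLeads keys stored as their own columns in the same DataFrame.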
myList = [
{
"id":3105052,
"title":"Ebook Relatórios Gerenciais",
"offering":"Institucional",
"created_date":"2022-06-28"
"inserted_date":"2022-06-28",
"channel":"Social",
"start_date":"2022-06-28",
"end_date":"2022-06-28",
"origins":[
{
"origin":"LinkedIn",
"quantityLeads":"1"
},
{
"origin":"Facebook",
"quantityLeads":"1"
}
]
},
{
"id":3105052,
"title":"Ebook Relatórios Gerenciais",
"offering":"Institucional",
"inserted_date":"2022-06-28",
"created_date":"2022-06-28",
"channel":"Direct",
"start_date":"2022-06-28",
"end_date":"2022-06-28",
"origins":[
{
"origin":"Desconhecida",
"quantityLeads":"2"
}
]
},
{
"id":2918513,
"title":"Ebook Direct To Consumer",
"offering":"Supply Chain",
"created_date":"2022-06-28",
"inserted_date":"2022-06-28",
"channel":"Social",
"start_date":"2022-06-28",
"end_date":"2022-06-28",
"origins":[
{
"origin":"LinkedIn",
"quantityLeads":"1"
}
]
}
]
Posted on 2022-06-29 21:26:12
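An alternative that none of the answers use: pandas' json_normalize can unnest the list in one call. This is a minimal sketch, assuming records shaped like myList (trimmed here to a few keys, and written as a hypothetical records list so it parses): record_path turns each entry of "origins" into its own row, and meta copies the listed parent keys onto every such row.

```python
import pandas as pd

# Hypothetical stand-in for myList, trimmed to a few keys.
records = [
    {"id": 3105052, "channel": "Social", "created_date": "2022-06-28",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"},
                 {"origin": "Facebook", "quantityLeads": "1"}]},
    {"id": 3105052, "channel": "Direct", "created_date": "2022-06-28",
     "origins": [{"origin": "Desconhecida", "quantityLeads": "2"}]},
    {"id": 2918513, "channel": "Social", "created_date": "2022-06-28",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"}]},
]

# record_path: one output row per element of each record's "origins" list.
# meta: parent keys repeated onto each of those rows.
df = pd.json_normalize(
    records,
    record_path="origins",
    meta=["id", "channel", "created_date"],
)

# quantityLeads arrives as strings; convert it if you want to aggregate.
df["quantityLeads"] = df["quantityLeads"].astype(int)
print(df)
```

Unlike taking only the first origin, this keeps every origin, so the first record yields two rows (LinkedIn and Facebook).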
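In the interest of simplicity, you could flatten the dictionary structure with something like the following (note that it keeps only the first entry of each "origins" list):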
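import pandas as pd

for row in myList:
    # copy the first origin of each record into two flat keys
    row["origin"] = row["origins"][0]["origin"]
    row["quantityLeads"] = row["origins"][0]["quantityLeads"]
    del row["origins"]  # drop the nested list (this modifies myList in place)
df = pd.DataFrame(myList)
print(df)
Output: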
id title offering created_date inserted_date channel start_date end_date origin quantityLeads
0 3105052 Ebook Relatórios Gerenciais Institucional 2022-06-28 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
1 3105052 Ebook Relatórios Gerenciais Institucional 2022-06-28 2022-06-28 Direct 2022-06-28 2022-06-28 Desconhecida 2
2 2918513 Ebook Direct To Consumer Supply Chain 2022-06-28 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
Incidentally, in the myList example above, the first entry is missing a comma after created_date, which causes a SyntaxError.
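Because the loop reads only row["origins"][0], any further origins are dropped (the first entry's Facebook origin does not appear in the output), and del changes myList itself. A minimal sketch of the same first-origin flattening that builds new row dicts instead, assuming a shortened hypothetical records list in place of myList:

```python
import pandas as pd

# Hypothetical, shortened stand-in for myList.
records = [
    {"id": 3105052, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"},
                 {"origin": "Facebook", "quantityLeads": "1"}]},
    {"id": 3105052, "channel": "Direct",
     "origins": [{"origin": "Desconhecida", "quantityLeads": "2"}]},
    {"id": 2918513, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"}]},
]

rows = []
for rec in records:
    # Copy every top-level key except the nested list; `records` is left untouched.
    row = {k: v for k, v in rec.items() if k != "origins"}
    first = rec["origins"][0]  # only the first origin is kept, as in the loop above
    row["origin"] = first["origin"]
    row["quantityLeads"] = first["quantityLeads"]
    rows.append(row)

df = pd.DataFrame(rows)
print(df)
```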
Edit: if the "origins" lists contain a variable number of items, but every item has the same keys, we can iterate over those items as well:
for row in myList:
    origins_list = row["origins"]
    # number the origins 0, 1, 2, ... so each gets its own pair of columns
    for counter, item in enumerate(origins_list):
        row["origin_" + str(counter)] = item["origin"]
        row["quantityLeads_" + str(counter)] = item["quantityLeads"]
    del row["origins"]
df = pd.DataFrame(myList)
print(df)
Posted on 2022-06-29 21:42:53
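With this wide layout, records that have fewer origins get missing values (NaN) in the extra columns, because pd.DataFrame fills keys absent from a row that way. A minimal sketch of the same widening that builds new row dicts rather than editing myList, assuming a shortened hypothetical records list:

```python
import pandas as pd

# Hypothetical, shortened stand-in for myList.
records = [
    {"id": 3105052, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"},
                 {"origin": "Facebook", "quantityLeads": "1"}]},
    {"id": 3105052, "channel": "Direct",
     "origins": [{"origin": "Desconhecida", "quantityLeads": "2"}]},
    {"id": 2918513, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"}]},
]

rows = []
for rec in records:
    row = {k: v for k, v in rec.items() if k != "origins"}
    # Each origin i becomes the column pair origin_i / quantityLeads_i.
    for i, item in enumerate(rec["origins"]):
        row[f"origin_{i}"] = item["origin"]
        row[f"quantityLeads_{i}"] = item["quantityLeads"]
    rows.append(row)

# Rows without a second origin get NaN in origin_1 / quantityLeads_1.
df = pd.DataFrame(rows)
print(df)
```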
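If "origins" contains more than one element, you could first explode it, create the "origin" and "quantityLeads" columns, and then decide how to handle the rest of the data.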
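import pandas as pd

df = pd.DataFrame(myList)
df = df.explode('origins')  # one row per origin; the original index labels repeat
# index=df.index keeps the new columns aligned with the exploded (repeated) index
df[['origin', 'quantityLeads']] = pd.DataFrame(df['origins'].tolist(), index=df.index)
df.drop('origins', axis=1, inplace=True)
print(df)
Output as posted (with the sample data above, the first entry's Facebook origin also gets its own row, so the actual result has four rows):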
id title offering created_date \
0 3105052 Ebook Relatórios Gerenciais Institucional 2022-06-28
1 3105052 Ebook Relatórios Gerenciais Institucional 2022-06-28
2 2918513 Ebook Direct To Consumer Supply Chain 2022-06-28
inserted_date channel start_date end_date origin quantityLeads
0 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
1 2022-06-28 Direct 2022-06-28 2022-06-28 Desconhecida 2
2 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
Posted on 2022-07-07 16:25:37
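explode repeats the original index labels (here 0, 0, 1, 2), so columns built from a freshly indexed DataFrame must be given that same index; otherwise pandas realigns them by label and rows get the wrong values. A minimal sketch of the explode approach with that alignment made explicit, assuming a shortened hypothetical records list. In pandas 1.1 and later, df.explode("origins", ignore_index=True) is another way to get a clean 0..n-1 index.

```python
import pandas as pd

# Hypothetical, shortened stand-in for myList.
records = [
    {"id": 3105052, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"},
                 {"origin": "Facebook", "quantityLeads": "1"}]},
    {"id": 3105052, "channel": "Direct",
     "origins": [{"origin": "Desconhecida", "quantityLeads": "2"}]},
    {"id": 2918513, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"}]},
]

df = pd.DataFrame(records)
df = df.explode("origins")  # one row per origin dict; index is now [0, 0, 1, 2]

# Build the new columns on the exploded index so they line up row by row.
parts = pd.DataFrame(df["origins"].tolist(), index=df.index)
df["origin"] = parts["origin"]
df["quantityLeads"] = parts["quantityLeads"]
df = df.drop(columns="origins")
print(df)
```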
This worked for me (Series.str.get looks up the given key in each dict element of the column):
import pandas as pd

df = pd.DataFrame(myList)
df = df.explode('origins')  # one row per origin dict
# .str.get(key) reads that key from each dict in the column
df['origin'] = df.origins.str.get('origin')
df['quantityLeads'] = df.origins.str.get('quantityLeads')
df.drop('origins', axis=1, inplace=True)
https://stackoverflow.com/questions/72807824