我有一个数据格式的字典dict,如:
{
‘table_1’: name color type
Banana Yellow Fruit,
‘another_table_1’: city state country
Atlanta Georgia United States,
‘and_another_table_1’: firstname middlename lastname
John Patrick Snow,
‘table_2’: name color type
Red Apple Fruit,
‘another_table_2’: city state country
Arlington Virginia United States,
‘and_another_table_2’: firstname middlename lastname
Alex Justin Brown,
‘table_3’: name color type
Lettuce Green Vegetable,
‘another_table_3’: city state country
Dallas Texas United States,
‘and_another_table_3’: firstname middlename lastname
Michael Alex Smith }我想根据这些数据文件的名称将它们合并在一起,这样最终我将只有3个数据文件:
table
name color type
Banana Yellow Fruit
Red Apple Fruit
Lettuce Green Vegetableanother_table
city state country
Atlanta Georgia United States
Arlington Virginia United States
Dallas Texas United Statesand_another_table
firstname middlename lastname
John Patrick Snow
Alex Justin Brown
Michael Alex Smith根据我最初的研究,Python应该可以做到这一点:
基于字典的关键names
.split、字典理解和itertools.groupby对字典中的数据进行分组,利用这些分组的results
pandas.concat函数循环遍历这些字典,并将数据组合到中。
我对Python没有太多的经验,我对如何编写这个代码有点迷茫。
我已经查看了How to group similar items in a list?和Merge dataframes in a dictionary的文章,但是它们并没有那么有帮助,因为在我的例子中,数据的名称、长度是不同的。
另外,我不想硬编码任何数据文件名,因为它们有1000多个。
发布于 2020-09-03 18:33:59
有一种方法:
给这个数据字典:
dd = {'table_1': pd.DataFrame({'Name':['Banana'], 'color':['Yellow'], 'type':'Fruit'}),
'table_2': pd.DataFrame({'Name':['Apple'], 'color':['Red'], 'type':'Fruit'}),
'another_table_1':pd.DataFrame({'city':['Atlanta'],'state':['Georgia'], 'Country':['United States']}),
'another_table_2':pd.DataFrame({'city':['Arlinton'],'state':['Virginia'], 'Country':['United States']}),
'and_another_table_1':pd.DataFrame({'firstname':['John'], 'middlename':['Patrick'], 'lastnme':['Snow']}),
'and_another_table_2':pd.DataFrame({'firstname':['Alex'], 'middlename':['Justin'], 'lastnme':['Brown']}),
}
tables = set([i.rsplit('_', 1)[0] for i in dd.keys()])
dict_of_dfs = {i:pd.concat([dd[x] for x in dd.keys() if x.startswith(i)]) for i in tables}输出一个新的合并表字典:
dict_of_dfs['table']
# Name color type
# 0 Banana Yellow Fruit
# 0 Apple Red Fruit
dict_of_dfs['another_table']
# city state Country
# 0 Atlanta Georgia United States
# 0 Arlinton Virginia United States
dict_of_dfs['and_another_table']
# firstname middlename lastnme
# 0 John Patrick Snow
# 0 Alex Justin Brown另一种方法是从集合中使用defaultdict,创建组合数据的列表:
from collections import defaultdict
import pandas as pd
dd = {'table_1': pd.DataFrame({'Name':['Banana'], 'color':['Yellow'], 'type':'Fruit'}),
'table_2': pd.DataFrame({'Name':['Apple'], 'color':['Red'], 'type':'Fruit'}),
'another_table_1':pd.DataFrame({'city':['Atlanta'],'state':['Georgia'], 'Country':['United States']}),
'another_table_2':pd.DataFrame({'city':['Arlinton'],'state':['Virginia'], 'Country':['United States']}),
'and_another_table_1':pd.DataFrame({'firstname':['John'], 'middlename':['Patrick'], 'lastnme':['Snow']}),
'and_another_table_2':pd.DataFrame({'firstname':['Alex'], 'middlename':['Justin'], 'lastnme':['Brown']}),
}
tables = set([i.rsplit('_', 1)[0] for i in dd.keys()])
d = defaultdict(list)
[d[i].append(dd[k]) for i in tables for k in dd.keys() if k.startswith(i)]
l_of_dfs = [pd.concat(d[i]) for i in d.keys()]
print(l_of_dfs[0])
print('\n')
print(l_of_dfs[1])
print('\n')
print(l_of_dfs[2])输出:
city state Country
0 Atlanta Georgia United States
0 Arlinton Virginia United States
firstname middlename lastnme
0 John Patrick Snow
0 Alex Justin Brown
Name color type
0 Banana Yellow Fruit
0 Apple Red Fruithttps://stackoverflow.com/questions/63729235
复制相似问题