Q: Merging DataFrames in a dictionary of DataFrames

Stack Overflow user

Asked 2020-09-03 17:56:48

Answers: 1 · Views: 176 · Followers: 0 · Votes: 1

I have a dict of DataFrames, like this:

{
'table_1':              name             color             type
                        Banana           Yellow            Fruit,
'another_table_1':      city             state             country
                        Atlanta          Georgia           United States,
'and_another_table_1':  firstname        middlename        lastname
                        John             Patrick           Snow,
'table_2':              name             color             type
                        Red              Apple             Fruit,
'another_table_2':      city             state             country
                        Arlington        Virginia          United States,
'and_another_table_2':  firstname        middlename        lastname
                        Alex             Justin            Brown,
'table_3':              name             color             type
                        Lettuce          Green             Vegetable,
'another_table_3':      city             state             country
                        Dallas           Texas             United States,
'and_another_table_3':  firstname        middlename        lastname
                        Michael          Alex              Smith             }

I want to merge these DataFrames based on their names, so that in the end I have only 3 DataFrames:

table

name        color       type
Banana     Yellow     Fruit
Red         Apple     Fruit
Lettuce     Green     Vegetable

another_table

city        state          country
Atlanta     Georgia        United States
Arlington   Virginia       United States
Dallas      Texas          United States

and_another_table

firstname        middlename        lastname
John             Patrick           Snow
Alex             Justin            Brown
Michael          Alex              Smith

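Based on my initial research, this should be possible in Python by:

Using .split, a dictionary comprehension and itertools.groupby to group the DataFrames in the dictionary by their key names

Creating a dictionary from these grouped results

Using the pandas.concat function to loop over these dictionaries and combine the DataFrames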

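I don't have much experience with Python, and I'm a bit lost on how to write this code.

I have looked at "How to group similar items in a list?" and "Merge dataframes in a dictionary", but they weren't that helpful, because in my case the DataFrame names vary in length.

Also, I don't want to hard-code any DataFrame names, since there are more than 1000 of them.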

Tags: group-by, nested, python, pandas, dictionary

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-09-03 18:33:59

Here is one way.

Given this dictionary of DataFrames:

import pandas as pd

dd = {'table_1': pd.DataFrame({'Name':['Banana'], 'color':['Yellow'], 'type':'Fruit'}),
      'table_2': pd.DataFrame({'Name':['Apple'], 'color':['Red'], 'type':'Fruit'}),
      'another_table_1':pd.DataFrame({'city':['Atlanta'],'state':['Georgia'], 'Country':['United States']}),
      'another_table_2':pd.DataFrame({'city':['Arlinton'],'state':['Virginia'], 'Country':['United States']}),
      'and_another_table_1':pd.DataFrame({'firstname':['John'], 'middlename':['Patrick'], 'lastnme':['Snow']}),
      'and_another_table_2':pd.DataFrame({'firstname':['Alex'], 'middlename':['Justin'], 'lastnme':['Brown']}),
     }

# Group name = key with its trailing "_<number>" suffix removed
tables = {k.rsplit('_', 1)[0] for k in dd}
# Compare group names exactly: startswith() would also pull e.g. 'table_extra_1' into 'table'
dict_of_dfs = {t: pd.concat([df for k, df in dd.items() if k.rsplit('_', 1)[0] == t]) for t in tables}

This produces a new dictionary of merged tables:

dict_of_dfs['table']

#      Name   color   type
# 0  Banana  Yellow  Fruit
# 0   Apple     Red  Fruit

dict_of_dfs['another_table']

#        city     state        Country
# 0   Atlanta   Georgia  United States
# 0  Arlinton  Virginia  United States

dict_of_dfs['and_another_table']

#   firstname middlename lastnme
# 0      John    Patrick    Snow
# 0      Alex     Justin   Brown
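Note that every row in the merged tables above still carries its original index (0), because pandas.concat keeps the source indexes by default. If a fresh 0..n-1 index is preferred, passing ignore_index=True should do it. A minimal sketch, using a trimmed-down, hypothetical stand-in for the dictionary above:

```python
import pandas as pd

# Hypothetical stand-in data, trimmed from the example above
dd = {
    'table_1': pd.DataFrame({'Name': ['Banana'], 'color': ['Yellow']}),
    'table_2': pd.DataFrame({'Name': ['Apple'], 'color': ['Red']}),
}

# ignore_index=True discards the per-frame indexes and renumbers the rows
merged = pd.concat(list(dd.values()), ignore_index=True)
print(merged)
```

Whether to renumber depends on whether the original index carries meaning; here it does not, so renumbering loses nothing.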

Another way is to use defaultdict from collections and build a list of the combined DataFrames:

from collections import defaultdict
import pandas as pd

dd = {'table_1': pd.DataFrame({'Name':['Banana'], 'color':['Yellow'], 'type':'Fruit'}),
      'table_2': pd.DataFrame({'Name':['Apple'], 'color':['Red'], 'type':'Fruit'}),
      'another_table_1':pd.DataFrame({'city':['Atlanta'],'state':['Georgia'], 'Country':['United States']}),
      'another_table_2':pd.DataFrame({'city':['Arlinton'],'state':['Virginia'], 'Country':['United States']}),
      'and_another_table_1':pd.DataFrame({'firstname':['John'], 'middlename':['Patrick'], 'lastnme':['Snow']}),
      'and_another_table_2':pd.DataFrame({'firstname':['Alex'], 'middlename':['Justin'], 'lastnme':['Brown']}),
     }

# Group the frames in a single pass, keyed by name without the trailing "_<number>"
d = defaultdict(list)
for k, df in dd.items():
    d[k.rsplit('_', 1)[0]].append(df)

# One merged DataFrame per group, in order of first appearance in dd
l_of_dfs = [pd.concat(dfs) for dfs in d.values()]
print(l_of_dfs[0])
print('\n')
print(l_of_dfs[1])
print('\n')
print(l_of_dfs[2])

Output:

     Name   color   type
0  Banana  Yellow  Fruit
0   Apple     Red  Fruit


       city     state        Country
0   Atlanta   Georgia  United States
0  Arlinton  Virginia  United States


  firstname middlename lastnme
0      John    Patrick    Snow
0      Alex     Justin   Brown
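Since the question mentions itertools.groupby, here is a rough sketch of that route as well. The helper names group_name and merge_by_prefix are made up for illustration, and the sketch assumes every key ends in "_<something>" that should be stripped. groupby only groups adjacent items, so the keys are sorted by group name first:

```python
from itertools import groupby

import pandas as pd


def group_name(key):
    """Strip the trailing '_<suffix>', e.g. 'another_table_2' -> 'another_table'."""
    return key.rsplit('_', 1)[0]


def merge_by_prefix(frames):
    """Return {group name: concatenated DataFrame} for a dict of DataFrames."""
    # sorted() is stable, so frames within a group keep their original order
    keys = sorted(frames, key=group_name)
    return {
        name: pd.concat([frames[k] for k in group])
        for name, group in groupby(keys, key=group_name)
    }


# Hypothetical stand-in data, trimmed from the example above
dd = {
    'table_1': pd.DataFrame({'Name': ['Banana'], 'color': ['Yellow']}),
    'another_table_1': pd.DataFrame({'city': ['Atlanta'], 'state': ['Georgia']}),
    'table_2': pd.DataFrame({'Name': ['Apple'], 'color': ['Red']}),
    'another_table_2': pd.DataFrame({'city': ['Arlinton'], 'state': ['Virginia']}),
}

merged = merge_by_prefix(dd)
print(merged['table'])
```

Nothing is hard-coded, so this should scale to 1000+ keys. The sort makes it O(n log n), whereas a plain defaultdict pass is O(n) and needs no sorting.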

Votes: 1

Original page content provided by Stack Overflow; the Chinese translation was produced by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/63729235
