How to convert a list of lists containing nested dictionaries (with lists inside each one) into a pandas DataFrame

Stack Overflow user

Asked 2017-12-03 11:06:06

Answers: 2 · Views: 47 · Followers: 0 · Votes: 0

I am trying to convert the following output into a pandas DataFrame:

[{'category': "Best restaurant that's been around forever and is still worth the trip", 'winner': ['Lula Cafe'], 'runners_up': ['Frontera Grill', 'Chicago Diner ', 'Sabatino’s', 'Twin Anchors']}]
[{'category': 'Best fancy restaurant in Chicago', 'winner': ['Alinea '], 'runners_up': ['Blackbird', 'Girl & the Goat', 'Green Zebra', 'The Publican']}]
[{'category': 'Best bang for your buck', 'winner': ['Big Star', 'Sultan’s Market'], 'runners_up': ['Frasca Pizzeria & Wine Bar', 'Chutney Joe’s', '"My boyfriend!"']}]
[{'category': 'Best chef', 'winner': ['Rick Bayless (Frontera Grill, Topolobampo, Xoco)'], 'runners_up': ['Grant Achatz (Alinea, Next, The Aviary)', 'Stephanie Izard (Girl & the Goat)']}]

I am expecting a DataFrame with the columns category, winner, and runners_up, with the corresponding entries under each column. Any suggestions?

Here is the code. I'm basically trying to scrape a web page with Beautiful Soup (though I'm just a beginner):

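    import requests
    from bs4 import BeautifulSoup

    # NOTE: base_url (used in get_category below) is not defined in the post;
    # it has to be set before this code will run.

    def make_soup(url):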
        page = requests.get(url)
        return BeautifulSoup(page.content,'lxml')

    # function to get all the categories corresponding to a url
    def get_category(section_url):
        soup = make_soup(section_url)
        boccat = soup.find('dl','boccat')
        category_links = [base_url + dd.a['href'] for dd in boccat.find_all('dd')]
        return category_links

    #function to print winner and runner's up pertaining to each category
    def category_winner(category_url):
        soup = make_soup(category_url)
        category = soup.find('h1','headline').string
        winner = [h2.string for h2 in soup.findAll("h2", "boc1")]
        runners_up = [h2.string for h2 in soup.findAll("h2", "boc2")]
        return {'category' : category,
            'winner' : winner,
            'runners_up' : runners_up}


    # url for which the winners are to be found
    food_n_drink = ('https://www.chicagoreader.com/chicago/best-of-chicago-2011-food-drink/BestOf?oid=4106228')

    categories = get_category(food_n_drink)
    data = []
    for cat in categories:
        winner = category_winner(cat)
        data.append(winner)
        print(data)

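The last line of code produces the output, which is multiple lists; I have shared the first four of them in the question. My aim is to create a DataFrame from this output for further use.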

pandas

2 Answers

Stack Overflow user

Answered 2017-12-03 14:33:20

If k is a list of these lists, separated by commas:

[{'category': "Best restaurant that's been around forever and is still worth the trip", 'winner': ['Lula Cafe'], 'runners_up': ['Frontera Grill', 'Chicago Diner ', 'Sabatino’s', 'Twin Anchors']}] , [{'category': 'Best fancy restaurant in Chicago', 'winner':['Alinea '], 'runners_up': ['Blackbird', 'Girl & the Goat', 'Green Zebra', 'The Publican']}] , [{'category': 'Best bang for your buck', 'winner': ['Big Star', 'Sultan’s Market'], 'runners_up': ['Frasca Pizzeria & Wine Bar', 'Chutney Joe’s', '"My boyfriend!"']}] , [{'category': 'Best chef', 'winner': ['Rick Bayless (Frontera Grill, Topolobampo, Xoco)'], 'runners_up': ['Grant Achatz (Alinea, Next, The Aviary)', 'Stephanie Izard (Girl & the Goat)']}]

then

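import pandas as pd

rows = []
for i in k:        # each i is a list holding one dictionary
    for j in i:    # each j is the dictionary for one category
        rows.append(dict(j))  # copy, so every row is an independent dict

# DataFrame.append was removed in pandas 2.0, so build the frame once
# from the collected rows instead of appending inside the loop.
df = pd.DataFrame(rows)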

will do the job.
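The nested loop above can also be written as a single flattening step. This is only a minimal sketch: the sample `k` below is a shortened, hypothetical stand-in with the same shape as the real data, and the name `records` is introduced just for illustration.

```python
from itertools import chain

import pandas as pd

# Hypothetical, shortened sample in the same shape as k:
# a list of one-element lists, each holding one dictionary.
k = [
    [{'category': 'Best chef',
      'winner': ['Rick Bayless'],
      'runners_up': ['Grant Achatz', 'Stephanie Izard']}],
    [{'category': 'Best bang for your buck',
      'winner': ['Big Star', 'Sultan’s Market'],
      'runners_up': ['Chutney Joe’s']}],
]

# Flatten the inner lists into one flat list of dictionaries.
records = list(chain.from_iterable(k))

# Passing columns= fixes the column order explicitly.
df = pd.DataFrame(records, columns=['category', 'winner', 'runners_up'])
print(df)
```

The cells in winner and runners_up still hold Python lists here; only the outer nesting is removed.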

Votes: 0

Stack Overflow user

Answered 2017-12-03 14:06:57

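You can create a pandas DataFrame from a list of dictionaries or from a list of lists. Your output consists of separate dictionaries, each wrapped in its own list. If you define them as dictionaries or lists, or as a list of dictionaries or a list of lists, you can create a df from them.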
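To make the two input shapes concrete, here is a small sketch. The values are simplified to plain strings (an assumption made for brevity; the real data holds lists), and the variable names are illustrative only.

```python
import pandas as pd

# List of dictionaries: the keys become the column names.
from_dicts = pd.DataFrame([
    {'category': 'Best chef', 'winner': 'Rick Bayless'},
    {'category': 'Best fancy restaurant in Chicago', 'winner': 'Alinea'},
])

# List of lists: each inner list is one row, so the column
# names have to be supplied separately.
from_lists = pd.DataFrame(
    [
        ['Best chef', 'Rick Bayless'],
        ['Best fancy restaurant in Chicago', 'Alinea'],
    ],
    columns=['category', 'winner'],
)

print(from_dicts)
print(from_lists)
```

Both calls should yield the same two-column frame; the list-of-dictionaries form is usually the more convenient one when each scraped item is already a dictionary.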

Reformatted input:

d1 = {'category': "Best restaurant that's been around forever and is still worth the trip",
  'winner': ['Lula Cafe'], 
  'runners_up': ['Frontera Grill', 'Chicago Diner ', 'Sabatino’s', 'Twin Anchors']}

d2 = {'category': 'Best fancy restaurant in Chicago', 
  'winner': ['Alinea '],
  'runners_up': ['Blackbird', 'Girl & the Goat', 'Green Zebra', 'The Publican']}

d3 = {'category': 'Best bang for your buck', 
  'winner': ['Big Star', 'Sultan’s Market'], 
  'runners_up': ['Frasca Pizzeria & Wine Bar', 'Chutney Joe’s', '"My boyfriend!"']}

d4 = {'category': 'Best chef', 
  'winner': ['Rick Bayless (Frontera Grill, Topolobampo, Xoco)'],
  'runners_up': ['Grant Achatz (Alinea, Next, The Aviary)', 'Stephanie Izard (Girl & the Goat)']}

Create the df:

import pandas as pd

pd.DataFrame([d1, d2, d3, d4])
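A frame built this way keeps each winner and runners_up cell as a Python list. If plain strings or one row per name are easier to work with, something along these lines may help. It is a sketch rather than part of the original answer: the two sample dictionaries are copied from the question, while `flat` and `long_df` are names made up for the example.

```python
import pandas as pd

d2 = {'category': 'Best fancy restaurant in Chicago',
      'winner': ['Alinea '],
      'runners_up': ['Blackbird', 'Girl & the Goat', 'Green Zebra', 'The Publican']}
d3 = {'category': 'Best bang for your buck',
      'winner': ['Big Star', 'Sultan’s Market'],
      'runners_up': ['Frasca Pizzeria & Wine Bar', 'Chutney Joe’s', '"My boyfriend!"']}

df = pd.DataFrame([d2, d3])

# Option 1: join each list into one comma-separated string,
# stripping stray whitespace such as the trailing space in 'Alinea '.
flat = df.copy()
for col in ['winner', 'runners_up']:
    flat[col] = flat[col].apply(lambda names: ', '.join(n.strip() for n in names))

# Option 2: one row per runner-up (DataFrame.explode, pandas 0.25+).
long_df = df.explode('runners_up').reset_index(drop=True)

print(flat)
print(long_df)
```

Joining keeps one row per category, which matches the layout the question asks for; exploding suits filtering or counting individual names.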

Votes: -1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/47614866
