Q: Writing data to CSV

Stack Overflow user

Asked 2017-06-29 19:43:31

Answers: 1 · Views: 156 · Followers: 0 · Votes: 1

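I'm scraping data from Wikipedia, and so far it works. I can display the data in the terminal, but I can't write it to the CSV file the way I need :-/ The code is fairly long, but I'll paste it here anyway in the hope that someone can help me.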

import csv
import requests
from bs4 import BeautifulSoup


def spider():
    url = 'https://de.wikipedia.org/wiki/Liste_der_Gro%C3%9F-_und_Mittelst%C3%A4dte_in_Deutschland'
    code = requests.get(url).text  # Read source code and make unicode
    soup = BeautifulSoup(code, "lxml")  # create BS object

    table = soup.find(text="Rang").find_parent("table")
    for row in table.find_all("tr")[1:]:
        partial_url = row.find_all('a')[0].attrs['href']
        full_url = "https://de.wikipedia.org" + partial_url
        get_single_item_data(full_url)          # goes into the individual sites


def get_single_item_data(item_url):
    page = requests.get(item_url).text  # Read source code & format with .text to unicode
    soup = BeautifulSoup(page, "lxml")  # create BS object
    def getInfoBoxBasisDaten(s):
        return str(s) == 'Basisdaten' and s.parent.name == 'th'
    basisdaten = soup.find_all(string=getInfoBoxBasisDaten)[0]

    basisdaten_list = ['Bundesland', 'Regierungsbezirk:', 'Höhe:', 'Fläche:', 'Einwohner:', 'Bevölkerungsdichte:',
                        'Postleitzahl', 'Vorwahl:', 'Kfz-Kennzeichen:', 'Gemeindeschlüssel:', 'Stadtgliederung:',
                        'Adresse', 'Anschrift', 'Webpräsenz:', 'Website:', 'Bürgermeister', 'Bürgermeisterin',
                        'Oberbürgermeister', 'Oberbürgermeisterin']

    with open('staedte.csv', 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['Bundesland', 'Regierungsbezirk:', 'Höhe:', 'Fläche:', 'Einwohner:', 'Bevölkerungsdichte:',
                        'Postleitzahl', 'Vorwahl:', 'Kfz-Kennzeichen:', 'Gemeindeschlüssel:', 'Stadtgliederung:',
                        'Adresse', 'Anschrift', 'Webpräsenz:', 'Website:', 'Bürgermeister', 'Bürgermeisterin',
                        'Oberbürgermeister', 'Oberbürgermeisterin']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL, extrasaction='ignore')
        writer.writeheader()

        for i in basisdaten_list:
            wanted = i
            current = basisdaten.parent.parent.nextSibling
            while True:
                if not current.name:
                    current = current.nextSibling
                    continue
                if wanted in current.text:
                    items = current.findAll('td')
                    print(BeautifulSoup.get_text(items[0]))
                    print(BeautifulSoup.get_text(items[1]))
                    writer.writerow({i: BeautifulSoup.get_text(items[1])})

                if '<th ' in str(current): break
                current = current.nextSibling


print(spider())

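The output is wrong in two ways: the cells are not in their proper places, and only one city is written while all the others are missing. It looks like this:

But it should look like this, plus all the other cities: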

beautifulsoup

python

csv

web-scraping

1 Answer

Stack Overflow user

Answered 2017-06-30 03:20:57

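"... only one city is written ...": You call get_single_item_data for each city. Inside that function, the statement with open('staedte.csv', 'w', newline='', encoding='utf-8') as csvfile: opens the output file under the same name every time. Because the mode is 'w', each call truncates the file and overwrites whatever the previous call wrote, so only the last city survives.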
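The overwriting can be reproduced without any scraping. The following is a minimal sketch, not the asker's code: the helper names (write_city_reopening, write_all_cities, count_lines), the shortened field list, and the placeholder rows are all assumptions made for illustration. It contrasts reopening the file in 'w' mode for each city with opening it once for all cities.

```python
import csv
import os
import tempfile

# Shortened field list for illustration; the question uses a longer one.
FIELDNAMES = ['Bundesland', 'Einwohner:']

# Made-up placeholder rows, not real data.
CITIES = [
    {'Bundesland': 'Land A', 'Einwohner:': '100'},
    {'Bundesland': 'Land B', 'Einwohner:': '200'},
]


def write_city_reopening(path, row):
    """Mimics the question: mode 'w' truncates the file on every call."""
    with open(path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES, delimiter=';')
        writer.writeheader()
        writer.writerow(row)


def write_all_cities(path, rows):
    """Opens the file once, writes the header once, then one row per city."""
    with open(path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES, delimiter=';')
        writer.writeheader()
        for row in rows:
            writer.writerow(row)


def count_lines(path):
    """Counts the lines in the file, header included."""
    with open(path, encoding='utf-8') as f:
        return sum(1 for _ in f)


with tempfile.TemporaryDirectory() as tmp:
    reopened_path = os.path.join(tmp, 'reopened.csv')
    for city in CITIES:
        write_city_reopening(reopened_path, city)
    reopened_lines = count_lines(reopened_path)  # header + last city only

    single_path = os.path.join(tmp, 'single.csv')
    write_all_cities(single_path, CITIES)
    single_open_lines = count_lines(single_path)  # header + every city

print(reopened_lines, single_open_lines)  # prints: 2 3
```

Applied to the question's code, one option would be to open staedte.csv once in spider() and pass the writer into get_single_item_data, so that each call adds a row instead of replacing the file.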

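Each variable is written to a new row: the statement writer.writerow({i: BeautifulSoup.get_text(items[1])}) writes the value of a single variable as a row of its own. Instead, create a dictionary for the values before you start looking through the page. As you collect values from the page, store them in the dictionary under their field names. Then, once you have found all the available values, call writer.writerow once with that dictionary.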
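The accumulate-then-write pattern might look like the sketch below. It deliberately leaves out the BeautifulSoup traversal: the (label, value) pairs stand in for the text of the two td cells of each info box row, and build_row, the shortened field list, and the sample values are hypothetical names and data chosen for illustration.

```python
import csv
import io

# Shortened field list for illustration; the question uses a longer one.
FIELDNAMES = ['Bundesland', 'Höhe:', 'Einwohner:']


def build_row(label_value_pairs, fieldnames):
    """Collects all values for one city into a single dict keyed by field name."""
    row = {}
    for label, value in label_value_pairs:
        for name in fieldnames:
            # Same kind of substring test as `wanted in current.text` in the question.
            if name in label and name not in row:
                row[name] = value
    return row


# Made-up placeholder pairs standing in for one city's info box rows.
# This city has no 'Höhe:' entry, so that column stays empty.
pairs = [('Bundesland:', 'Land A'), ('Einwohner:', '100')]

row = build_row(pairs, FIELDNAMES)

output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=FIELDNAMES, delimiter=';')
writer.writeheader()
writer.writerow(row)  # one call per city, after all values are collected

print(output.getvalue())
```

Because DictWriter fills missing keys with its restval (an empty string by default), a city that lacks some of the fields still produces one correctly aligned row.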

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/44824210
