Q: Web scraping COVID-19 data from data.cdc.gov with Python

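Code Review user

Asked 2021-12-09 16:12:25

Answers: 1, Views: 493, Followers: 0, Votes: 4

I am trying to scrape data.cdc.gov for their COVID-19 information on cases and deaths.

The problem I am having is that the code seems very inefficient: it takes an extremely long time to run. For some reason the CDC's XML file does not work at all, and the API is incomplete. I need all of the COVID-19 information from January 22, 2020 up to now, but the API does not contain all of the information for all of those days. Could someone please help me make this code more efficient so that I can extract the information I need more seamlessly?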

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time

options = Options()
options.add_argument('--no-sandbox')
url = 'https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State-o/9mfq-cb36/data'
# Selenium 4 takes the driver path through a Service object (executable_path is deprecated)
driver = webdriver.Chrome(service=Service(r"C:\Program Files (x86)\chromedriver.exe"), options=options)

driver.implicitly_wait(10)
driver.get(url)


covid_fin = []  # accumulate rows across all pages (resetting it inside the loop discards earlier pages)
while True:
    tables = driver.find_elements(By.XPATH, "//div[contains(@class, 'socrata-table frozen-columns')]")
    for table in tables:
        # Column names from the header row of the rendered table
        headers = []
        for head in table.find_elements(By.XPATH, '//*[@id="renderTypeContainer"]/div[4]/div[2]/div/div[4]/div[1]/div/table/thead/tr/th'):
            headers.append(head.text)
        for row in table.find_elements(By.XPATH, '//*[@id="renderTypeContainer"]/div[4]/div[2]/div/div[4]/div[1]/div/table/tbody/tr'):
            covid = []
            for col in row.find_elements(By.XPATH, "./*[name()='td']"):
                covid.append(col.text)
            if covid:
                # Pair each header with the cell in the same position
                covid_dict = dict(zip(headers, covid))
                covid_fin.append(covid_dict)
    try:
        # Go to the next page; wait up to 10 s for the button to become clickable
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, 'pager-button-next'))).click()
        time.sleep(5)
    except Exception:
        # No clickable "next" button (or the click failed): treat this as the last page
        break

python

python-3.x

web-scraping

selenium

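1 Answer

Code Review user

Answered 2021-12-09 19:25:40

Don't scrape. Delete all of your code. Go to that page and download one of the export types. XML is richer and has more fields, but CSV is more compact.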
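As a rough illustration of that advice, here is a minimal sketch that reads a CSV export and keeps only the rows from a given start date onward, using just the standard library. Several things in it are assumptions rather than facts from this thread: the `EXPORT_URL` (built from the usual Socrata export URL pattern and the dataset ID in the question's URL), the column names `submission_date`, `state`, `tot_cases` and `tot_death`, and the `MM/DD/YYYY` date format. The helper names `parse_export`, `rows_since` and `download_export` are hypothetical, and the sample rows are made up so the example runs without network access. Check the header of the file you actually download before relying on any of these.

```python
import csv
import io
from datetime import date, datetime
from urllib.request import urlopen

# Assumption: standard Socrata CSV export URL pattern for dataset 9mfq-cb36.
EXPORT_URL = "https://data.cdc.gov/api/views/9mfq-cb36/rows.csv?accessType=DOWNLOAD"


def download_export(url=EXPORT_URL):
    """Fetch the whole CSV export in one request and return it as text."""
    with urlopen(url) as response:
        # utf-8-sig also strips a byte-order mark if the file has one
        return response.read().decode("utf-8-sig")


def parse_export(csv_text, date_column="submission_date", date_format="%m/%d/%Y"):
    """Parse CSV text into a list of dicts, converting the date column to date objects."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row[date_column] = datetime.strptime(row[date_column], date_format).date()
        rows.append(row)
    return rows


def rows_since(rows, start, date_column="submission_date"):
    """Return the rows dated on or after `start`, sorted by date."""
    kept = [row for row in rows if row[date_column] >= start]
    return sorted(kept, key=lambda row: row[date_column])


# Made-up sample in the assumed export layout (not real CDC figures).
SAMPLE = (
    "submission_date,state,tot_cases,tot_death\n"
    "01/21/2020,XX,0,0\n"
    "01/23/2020,XX,2,0\n"
    "01/22/2020,XX,1,0\n"
)

rows = rows_since(parse_export(SAMPLE), date(2020, 1, 22))
for row in rows:
    print(row["submission_date"], row["state"], row["tot_cases"], row["tot_death"])
```

For real data, `parse_export(download_export())` would take the place of `parse_export(SAMPLE)`. A single file download like this avoids driving a browser through the paginated table one page at a time.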

Votes: 4

Original page content provided by Code Review. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://codereview.stackexchange.com/questions/270843
