
Scraping multiple select options using Selenium
Stack Overflow user
Asked on 2020-06-21 22:46:01
1 answer · 122 views · 0 followers · 0 votes

I have been asked to scrape PDFs from the website https://secc.gov.in/lgdStateList. There are 3 dropdown menus: a state, a district and a block. There are several states, each state has districts under it, and each district has blocks under it.

I tried implementing the code below. I can select a state, but something seems to go wrong when selecting a district.

```python
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup

# First pass: dump the dropdown labels from the page source.
browser = webdriver.Chrome()
url = "https://secc.gov.in/lgdStateList"
browser.get(url)
html_source = browser.page_source
browser.quit()

soup = BeautifulSoup(html_source, 'html.parser')
for name_list in soup.find_all(class_='dropdown-row'):
    print(name_list.text)

# Second pass: try to click through every state/district/block combination.
driver = webdriver.Chrome()
driver.get('https://secc.gov.in/lgdStateList')

selectState = Select(driver.find_element_by_id("lgdState"))
for state in selectState.options:
    state.click()
    # This is the line that raises the error below.
    selectDistrict = Select(driver.find_element_by_id("lgdDistrict"))
    for district in selectDistrict.options:
        district.click()
        selectBlock = Select(driver.find_element_by_id("lgdBlock"))
        for block in selectBlock.options:  # .options is a property, not a method
            block.click()
```

The error I am running into is:

```
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="lgdDistrict"]"}
  (Session info: chrome=83.0.4103.106)
```
I need help looping through the 3 menus.

Any help/suggestions would be appreciated. Let me know in the comments if anything needs clarifying.
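The `NoSuchElementException` here is most likely a timing issue: the district `<select>` is only added to the page after a state has been chosen, so the loop reads it before it exists. A sketch of one possible fix, waiting explicitly for each dependent dropdown to be populated before touching it (keeping the Selenium 3 style `find_element_by_id` used in the question; the `options_loaded` helper is my own, and index-based selection is used to avoid stale element references):

```python
from selenium import webdriver
from selenium.webdriver.support.ui import Select, WebDriverWait

driver = webdriver.Chrome()
driver.get("https://secc.gov.in/lgdStateList")
wait = WebDriverWait(driver, 10)  # ignores NoSuchElementException while polling

def options_loaded(element_id):
    # True once the dropdown holds more than its "Select ..." placeholder.
    return lambda d: len(Select(d.find_element_by_id(element_id)).options) > 1

select_state = Select(driver.find_element_by_id("lgdState"))
for i in range(1, len(select_state.options)):        # skip the placeholder
    select_state.select_by_index(i)
    wait.until(options_loaded("lgdDistrict"))
    select_district = Select(driver.find_element_by_id("lgdDistrict"))
    for j in range(1, len(select_district.options)):
        select_district.select_by_index(j)
        wait.until(options_loaded("lgdBlock"))
        select_block = Select(driver.find_element_by_id("lgdBlock"))
        for option in select_block.options[1:]:
            print(option.text)
```

This only enumerates the options; whether it is fast enough for all states depends on the site, and the accepted answer below sidesteps the browser entirely.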


1 Answer

Stack Overflow user

Accepted answer

Answered on 2020-06-21 23:21:17

The value of each state can be read from the state dropdown's HTML in the page source. You can find the same for the district and block dropdowns.

Now, use those values in the payload to request the table you want to grab the data from:

```python
import urllib3
import requests
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

link = "https://secc.gov.in/lgdGpList"

payload = {
    'stateCode': '10',
    'districtCode': '188',
    'blockCode': '1624'
}

r = requests.post(link, data=payload, verify=False)
soup = BeautifulSoup(r.text, "html.parser")
for items in soup.select("table#example tr"):
    data = [' '.join(item.text.split()) for item in items.select("th,td")]
    print(data)
```

Output the script produces:

```
['Select State', 'Select District', 'Select Block']
['', 'Select District', 'Select Block']
['ARARIA BASTI (93638)', 'BANGAMA (93639)', 'BANSBARI (93640)']
['BASANTPUR (93641)', 'BATURBARI (93642)', 'BELWA (93643)']
['BOCHI (93644)', 'CHANDRADEI (93645)', 'CHATAR (93646)']
['CHIKANI (93647)', 'DIYARI (93648)', 'GAINRHA (93649)']
['GAIYARI (93650)', 'HARIA (93651)', 'HAYATPUR (93652)']
['JAMUA (93653)', 'JHAMTA (93654)', 'KAMALDAHA (93655)']
['KISMAT KHAWASPUR (93656)', 'KUSIYAR GAWON (93657)', 'MADANPUR EAST (93658)']
['MADANPUR WEST (93659)', 'PAIKTOLA (93660)', 'POKHARIA (93661)']
['RAMPUR KODARKATTI (93662)', 'RAMPUR MOHANPUR EAST (93663)', 'RAMPUR MOHANPUR WEST (93664)']
['SAHASMAL (93665)', 'SHARANPUR (93666)', 'TARAUNA BHOJPUR (93667)']
```

You need to grab the number available in brackets next to each result above and use it in the payload, then send another post request to download the pdf files. Make sure to run the script from inside its own folder so you can collect all the files there.
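As a quick standalone check of that bracket parsing, plain string splitting is enough (the helper name `extract_gp_code` is mine, not from the answer):

```python
def extract_gp_code(cell_text: str) -> str:
    # "ARARIA BASTI (93638)" -> "93638": take what sits inside the last brackets.
    return cell_text.strip().rsplit("(", 1)[1].rstrip(")")

print(extract_gp_code("ARARIA BASTI (93638)"))   # 93638
print(extract_gp_code("MADANPUR EAST (93658)"))  # 93658
```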

```python
import urllib3
import requests
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

link = "https://secc.gov.in/lgdGpList"
download_link = "https://secc.gov.in/downloadLgdwisePdfFile"

payload = {
    'stateCode': '10',
    'districtCode': '188',
    'blockCode': '1624'
}

r = requests.post(link, data=payload, verify=False)
soup = BeautifulSoup(r.text, "html.parser")
for item in soup.select("table#example td > a[onclick^='downloadLgdFile']"):
    gp_code = item.text.strip().split("(")[1].split(")")[0]
    payload['gpCode'] = gp_code
    with open(f'{gp_code}.pdf', 'wb') as f:
        f.write(requests.post(download_link, data=payload, verify=False).content)
```

Votes: 1
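Since the answer suggests keeping the downloads together in one folder, a variation is to write them into a dedicated sub-directory instead of the working directory (the folder name `pdfs` is my choice, not from the answer); a small sketch of just the file-writing step:

```python
from pathlib import Path

out_dir = Path("pdfs")
out_dir.mkdir(exist_ok=True)       # create the folder on first run

gp_code = "93638"                  # an example code from the table above
target = out_dir / f"{gp_code}.pdf"
# In the real script these bytes would come from
# requests.post(download_link, data=payload, verify=False).content
target.write_bytes(b"%PDF-1.4 placeholder")
print(target)
```

In the download loop you would replace `open(f'{gp_code}.pdf', 'wb')` with `open(out_dir / f'{gp_code}.pdf', 'wb')`.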
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/62500126
