
Q: Web scraping a search-results webpage that changes?

Stack Overflow user

Asked on 2022-02-03 04:23:06

Answers: 1 · Views: 36 · Followers: 0 · Votes: 0

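I'm trying to get data from search results, but every time I give Beautiful Soup a specific link I get an error. I think this is because the web page isn't the same each time it is accessed? I'm not quite sure what this is called or even what to search for, so any help would be greatly appreciated.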

This is the link to the search results. However, when you visit it, it won't show any results unless you have already done a search: https://www.clarkcountycourts.us/Portal/Home/WorkspaceMode?p=0

Instead, if you copy and paste it, it takes you to this page to run a search: https://www.clarkcountycourts.us/Portal/, and then you have to click Smart Search.

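So, for simplicity, say we search for "Robinson"; I need to export the table data to an Excel file. I can't give Beautiful Soup a link because it isn't valid, I believe? How should I approach this?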

Even pulling the table up with a simple table view gives none of the data from our "Robinson" search, such as Case Number or File Date, to build a pandas DataFrame from.
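Once a browser has actually run the search, the rendered HTML (for example Selenium's driver.page_source) does contain the results table, so getting Case Number and File Date out of it is ordinary HTML parsing. As a dependency-free illustration, here is a minimal sketch that swaps in Python's standard-library html.parser for Beautiful Soup and writes the rows as CSV, which Excel opens directly. TableRowParser, table_to_csv, and the sample markup in the test are hypothetical, and the sketch assumes a single flat table with no nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, one list per <tr>.

    Hypothetical helper. Assumes a flat table: a <tr> nested inside a
    cell would start a new row and break the outer one.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the <tr> currently open, if any
        self._cell = None   # text fragments of the <td>/<th> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. indentation between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that had no cells
                self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return every table row found in `html` as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf).writerows(parser.rows)
    return buf.getvalue()
```

In the Selenium flow you would pass driver.page_source after the results load and write the string to a file opened with newline="". The real portal page may wrap the results in layout tables, so expect to filter the rows afterwards.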

//EDIT// This is what I have so far, thanks to @Arundeep Chohan. Huge shout-out for the great help!

import time
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])  # hide Chrome's console log noise
driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 20)  # explicit wait: up to 20 seconds per condition

driver.get("https://www.clarkcountycourts.us/Portal/Home/Dashboard/29")

# find_element_by_id() was removed in Selenium 4.3; wait for the box, then type into it
search_box = wait.until(EC.presence_of_element_located((By.ID, "caseCriteria_SearchCriteria")))
search_box.send_keys("Robinson")

# The reCAPTCHA checkbox sits inside an iframe: switch into it and click the checkbox
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[name^='a-'][src^='https://www.google.com/recaptcha/api2/anchor?']")))
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[@id='recaptcha-anchor']"))).click()

driver.switch_to.default_content()  # necessary to leave the iframe before clicking Submit

time.sleep(5)  # give the reCAPTCHA time to register the click
wait.until(EC.element_to_be_clickable((By.ID, "btnSSSubmit"))).click()  # click() returns None, so don't assign it

time.sleep(5)  # crude wait for the results table to render

soup = BeautifulSoup(driver.page_source, 'html.parser')
# Wrapping the HTML in StringIO avoids pandas' FutureWarning about literal HTML strings
df = pd.read_html(StringIO(str(soup)))[0]
print(df)
driver.quit()
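The fixed time.sleep(5) calls above either waste time or turn out too short on a slow load. Selenium's WebDriverWait instead polls a condition until it holds or a timeout expires. A minimal stdlib sketch of that polling idea follows; wait_until is a hypothetical helper that illustrates the technique and is not Selenium's actual implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value, and return that value.

    Raises TimeoutError once `timeout` seconds pass without success.
    Hypothetical helper illustrating explicit waits (not Selenium's code).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # back off briefly before checking again
```

With Selenium itself, the same idea is WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.ID, "btnSSSubmit"))).click(), which returns as soon as the button is clickable rather than after a fixed delay.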

Tags: pandas, selenium, web-scraping, beautifulsoup

1 Answer

Stack Overflow user

Accepted answer

Answered on 2022-02-03 05:01:03

options = Options()
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.maximize_window()
wait = WebDriverWait(driver, 10)
driver.get('https://www.clarkcountycourts.us/Portal/')

# Click the Smart Search link, then type the search term
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.portlet-buttons"))).click()
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#caseCriteria_SearchCriteria"))).send_keys("Robinson")
# Switch into the reCAPTCHA iframe and click the checkbox via JavaScript
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@title='reCAPTCHA']")))
elem = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.recaptcha-checkbox-checkmark")))
driver.execute_script("arguments[0].click()", elem)
driver.switch_to.default_content()  # leave the iframe before touching the main page
# Pause until the reCAPTCHA is solved in the browser; a solver would go here
input("Waiting for recaptcha done")
wait.until(EC.element_to_be_clickable((By.XPATH, "(//input[@id='btnSSSubmit'])[1]"))).click()
soup = BeautifulSoup(driver.page_source, 'html.parser')
df = pd.read_html(str(soup))[0]  # first <table> on the results page
print(df)

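This should be the bare minimum needed to reach your page, if you were wondering. There's a link to click (Smart Search) and a reCAPTCHA inside an iframe to deal with. After that, just use pandas to grab the table.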

(Edit): They have since added a proper reCAPTCHA, so plug a solver in where I added the input() pause.
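The input() call is the hook where a solver, or a human at the keyboard, takes over. A minimal sketch of that pause as a reusable helper, assuming you want to keep prompting until the operator explicitly confirms (the printed output in this answer shows a similar "Enter YES when done" prompt). pause_until_confirmed is a hypothetical name; it does not solve the reCAPTCHA, it only blocks the script until told to continue.

```python
def pause_until_confirmed(prompt="Solve the reCAPTCHA in the browser, then type YES: ",
                          read=input):
    """Block until the user types YES (case-insensitive); return the number of prompts.

    `read` defaults to the built-in input(); it is a parameter only so the
    helper can be exercised without a keyboard. Hypothetical helper.
    """
    attempts = 0
    while True:
        attempts += 1
        if read(prompt).strip().upper() == "YES":
            return attempts
```

Calling pause_until_confirmed() in place of the bare input() line means an accidental Enter press no longer submits the form before the reCAPTCHA is done.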

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
from bs4 import BeautifulSoup

Output:

Waiting for manual date to be entered. Enter YES when done.
  Unnamed: 0_level_0  ...                                      Date of Birth
         Case Number  ...                                          File Date
         Case Number  ...                                          File Date
0                NaN  ...                                                NaN
1                NaN  ...  Cases (1)  Case NumberStyle / DefendantFile Da...
2        Case Number  ...                                          File Date
3          08A575873  ...                                         11/17/2008
4                NaN  ...                                                NaN
5                NaN  ...  Cases (1)  Case NumberStyle / DefendantFile Da...
6        Case Number  ...                                          File Date
7          08A575874  ...                                         11/17/2008
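The output shows that pd.read_html(...)[0] picked up a layout table: three header levels, NaN spacer rows, and the "Case Number ... File Date" header repeated before each case. A minimal pandas sketch for tidying a frame of that shape, assuming (from the printed output only) that the innermost header level holds the real column names and that real data rows are those whose Case Number is neither NaN nor the literal header text. clean_case_table is a hypothetical helper.

```python
import pandas as pd


def clean_case_table(raw):
    """Tidy a read_html result shaped like the output above (hypothetical helper)."""
    df = raw.copy()
    if isinstance(df.columns, pd.MultiIndex):
        # Keep only the innermost header level, e.g. "Case Number", "File Date"
        df.columns = df.columns.get_level_values(-1)
    # If flattening produced duplicate names, keep the first column of each
    df = df.loc[:, ~df.columns.duplicated()]
    case = df["Case Number"]
    # Real data rows: a Case Number that is present and is not a repeated header
    keep = case.notna() & (case != "Case Number")
    return df[keep].reset_index(drop=True)
```

For the Excel file the question asks for, clean_case_table(df).to_excel("robinson.xlsx", index=False) would follow, which requires an Excel writer engine such as openpyxl to be installed.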

Votes: 0

Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/70965897
