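I am trying to scrape the dynamically generated page https://www.governmentjobs.com/careers/capecoral?page=1. I have tried requests, Scrapy, and scrapy-splash, but I only get the page source and none of the job listings.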
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.governmentjobs.com/careers/capecoral?page=1")
soup = BeautifulSoup(r.content, "html.parser")
n_jobs = soup.select("#number-found-items")[0].text.strip()
print(n_jobs)

It always reports 0 jobs found.
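Posted on 2022-03-01 10:50:47
Because the job list is loaded dynamically by JavaScript, you can use Selenium together with bs4 to get the data you need. Here is an example; just run the code.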
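import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.governmentjobs.com/careers/capecoral?page=1"

# Selenium 4 takes the driver path through a Service object.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(10)  # crude wait for the JavaScript-rendered job list to load

soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

# Each job title is a link inside an <h3> within a .list-item element.
for title in soup.select('.list-item h3 > a'):
    print(title.text)

Output: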
Assistant City Attorney / City Attorney's Office
Business Applications Analyst II / Information Technology Services #6425
Contract Athletic Official / Athletics / Parks & Recreation #6237
Contract Background Investigation Specialist / Investigations / Police Dept. #6514
Contract Beverage Cart/Waiter/Waitress / Parks and Recreation / Coral Oaks #6479
Contract Counselor / Youth Center / Parks & Recreation #6317
Contract Counselor/Instructor / Parks & Recreation / Special Populations #6339
Contract Custodial Worker / Lake Kennedy / Parks & Recreation #6525
Contract Custodial Worker /Parks & Recreation / Yacht Club #6312
Contract Golf Course Outside Operations / Parks & Recreation / Coral Oaks #6535
Posted on 2022-03-01 10:20:11
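You are trying to scrape data from a website that builds its content with JavaScript. A plain HTTP request only returns the initial HTML, so you need Selenium to load the page in a real browser, wait until the data has been fully rendered, and only then read the page content.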
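A minimal sketch of that idea, using an explicit wait instead of a fixed sleep. The function name `fetch_rendered_html` and the 20-second timeout are my own choices for illustration, and I am assuming Selenium 4.6 or newer (which can locate a Chrome driver by itself) and that the job titles match the `.list-item h3 > a` selector used in the other answer.

```python
def fetch_rendered_html(url, css_selector, timeout=20):
    """Load `url` in Chrome and return the HTML once `css_selector` matches an element."""
    # Imported inside the function so this file can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Selenium 4.6+ finds a matching driver itself
    try:
        driver.get(url)
        # Block until at least one matching element exists in the DOM;
        # raises selenium.common.exceptions.TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out


def main():
    """Print the job titles from the first page of listings."""
    from bs4 import BeautifulSoup

    selector = ".list-item h3 > a"
    html = fetch_rendered_html(
        "https://www.governmentjobs.com/careers/capecoral?page=1", selector
    )
    for link in BeautifulSoup(html, "html.parser").select(selector):
        print(link.get_text(strip=True))
```

Call `main()` to run it. The explicit wait returns as soon as the first listing shows up, so it is usually faster than sleeping for a fixed time and fails with a clear error if the listings never load.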
Posted on 2022-03-02 03:15:07
I simply copied the request from the browser's Network tab in cURL format, then used https://curlconverter.com/ to convert it to Python code.
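Roughly, the converted code ends up looking like the sketch below. The URL and headers here are placeholders, not the real values for this site: use whatever curlconverter generates from the request you copied. The helper name `replay_request` and the `session` parameter are also just my own illustration.

```python
def replay_request(url, headers=None, params=None, session=None, timeout=30):
    """Send the GET request copied from the browser and return the response body."""
    if session is None:
        # Imported lazily so a caller can pass in its own session object instead.
        import requests
        session = requests.Session()
    response = session.get(url, headers=headers, params=params, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text


# Placeholder values (hypothetical): replace them with the URL and headers
# that curlconverter produces for the request you copied.
EXAMPLE_URL = "https://example.com/path/copied/from/network/tab"
EXAMPLE_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    # Assumption: in-page (XHR) requests often carry this header; keep it only if yours does.
    "X-Requested-With": "XMLHttpRequest",
}
```

The returned text can then be parsed with BeautifulSoup as usual, for example `BeautifulSoup(replay_request(EXAMPLE_URL, headers=EXAMPLE_HEADERS), "html.parser")`. If the response is empty, compare the headers and query parameters with the ones the browser actually sent.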
https://stackoverflow.com/questions/71306518