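I have some code that opens a website and prints the title, date, and time of every session.
However, clicking a session on the site expands a drop-down list of sub-sessions.
I want to print the title of each sub-session as well.
Here is the code I have: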
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import requests

driver = webdriver.Chrome()
session = []
driver.get('https://library.iaslc.org/conference-program?product_id=20&author=&category=&date=&session_type=&session=&presentation=&keyword=&available=&cme=&page=1')
time.sleep(3)  # give the page a moment to finish rendering
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
# each session header sits in a div with class "accordin_title"
productlist = soup.find_all('div', class_='accordin_title')
for item in productlist:
    title = item.find('h4').text.strip()
    tim = item.find('span', class_='info_red').text.strip()
    dat = item.find('span', class_='info_blue').text.strip()
    dictionary = {"Title": title, "Time": tim, "Date": dat}
    session.append(dictionary)
print(session)

Posted on 2021-01-08 16:31:40
Try the following to get the content you need.
To get each session title together with its sub-session titles:
import requests
from bs4 import BeautifulSoup

url = 'https://library.iaslc.org/conference-program?product_id=20&author=&category=&date=&session_type=&session=&presentation=&keyword=&available=&cme=&page=1'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
    r = s.get(url)
    soup = BeautifulSoup(r.text, "lxml")
    for items in soup.select("#accordin .accordin_details"):
        # the session header is the closest preceding "accordin_title" element
        session_title = items.find_previous(class_="accordin_title").h4.get_text(strip=True)
        subsession_titles = [item.get_text(strip=True) for item in items.select(".accordin_list h4")]
        print(session_title, subsession_titles)

To get only the sub-session titles:
for item in soup.select("#accordin .accordin_details .accordin_list h4"):
    print(item.get_text(strip=True))

https://stackoverflow.com/questions/65625224
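If you want the sub-session titles stored next to the title, time, and date in the dictionary from the question, something like the following might work. It is only a rough sketch: `SAMPLE_HTML` and `parse_sessions` are names made up for illustration, and the sample markup is an assumption about how the page nests these classes (only the class names and the `accordin` id come from the snippets above), so check it against the real page source before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a trimmed-down stand-in for the conference page,
# reusing only the id and class names that appear in the snippets above.
SAMPLE_HTML = """
<div id="accordin">
  <div class="accordin_title">
    <h4>Session A</h4>
    <span class="info_blue">Jan 28</span>
    <span class="info_red">09:00 - 10:00</span>
  </div>
  <div class="accordin_details">
    <div class="accordin_list"><h4>Talk A1</h4></div>
    <div class="accordin_list"><h4>Talk A2</h4></div>
  </div>
  <div class="accordin_title">
    <h4>Session B</h4>
    <span class="info_blue">Jan 29</span>
    <span class="info_red">11:00 - 12:00</span>
  </div>
  <div class="accordin_details">
    <div class="accordin_list"><h4>Talk B1</h4></div>
  </div>
</div>
"""


def parse_sessions(html):
    """Return one dict per session, including its sub-session titles."""
    soup = BeautifulSoup(html, "html.parser")
    sessions = []
    for details in soup.select("#accordin .accordin_details"):
        # The header for this block is the nearest preceding "accordin_title".
        header = details.find_previous(class_="accordin_title")
        sessions.append({
            "Title": header.h4.get_text(strip=True),
            "Time": header.find("span", class_="info_red").get_text(strip=True),
            "Date": header.find("span", class_="info_blue").get_text(strip=True),
            "Subsessions": [
                h4.get_text(strip=True)
                for h4 in details.select(".accordin_list h4")
            ],
        })
    return sessions


sessions = parse_sessions(SAMPLE_HTML)
for record in sessions:
    print(record)
```

To run it against the live page, pass `r.text` (or `driver.page_source`) to `parse_sessions` instead of the sample string.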