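I'm trying to scrape some information from a website using Selenium. Here is the link to the site: http://www.ultimatetennisstatistics.com/playerProfile?playerId=4742. The information I'm trying to get is under the player's "Statistics" tab. My code currently opens the player's profile and then the player's statistics page, and I'm trying to find a way to extract the information from that statistics page. Here is my code so far: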
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.ultimatetennisstatistics.com/playerProfile?playerId=4742")
soup = BeautifulSoup(driver.page_source, "lxml")
try:
    # Selenium 4 syntax; older releases used find_element_by_xpath / find_element_by_id
    dropdown = driver.find_element(By.XPATH, '//*[@id="playerPills"]/li[9]/a')
    dropdown.click()
    bm = driver.find_element(By.ID, 'statisticsPill')
    bm.click()
    for i in soup.select('#statistics table.table tr'):
        print(i)
        data1 = [x.get_text(strip=True) for x in i.select("th,td")]
        print(data1)
except ValueError:
    print("error")
The relevant part of the page's HTML (the "Serve" table) looks like this:
<th class="pct-data text-right"><i class="fa fa-percent"></i></th>
<th class="raw-data text-right" style="display: none;"><i class="fa fa-hashtag"></i></th>
</tr>
</thead>
<tbody>
<tr>
<td>Ace %</td>
<th class="text-right pct-data">23.4%</th>
<th class="raw-data text-right" style="display: none;">12942 / 55377</th>
</tr>
<tr>
<td>Double Fault %</td>
<th class="text-right pct-data">4.2%</th>
<th class="raw-data text-right" style="display: ...
Posted on 2018-08-08 06:08:54
To extract the player's information from the Statistics page, you can use the following solution:
Posted on 2018-08-08 06:18:21
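The problem is where this line is placed:
soup = BeautifulSoup(driver.page_source,"lxml")
It should come after you click the "Statistics" tab, because only then has the table loaded, so only then can BeautifulSoup parse it.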
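To see why the order matters: BeautifulSoup only parses the string it is given, and `driver.page_source` is a snapshot of the DOM at the moment it is read, so the soup never updates when the browser later renders the Statistics tab. The minimal sketch below needs no browser. `BEFORE_CLICK` and `AFTER_CLICK` are hypothetical snapshots modeled on the table markup in the question, and `stat_rows` is an illustrative helper name; the `#statisticsOverview table tr` selector is the one the answer uses.

```python
from bs4 import BeautifulSoup

# Hypothetical page_source snapshots: before the Statistics tab is clicked the
# table is not in the DOM yet; afterwards it is (markup modeled on the question).
BEFORE_CLICK = '<div id="statisticsOverview"></div>'
AFTER_CLICK = """
<div id="statisticsOverview">
  <table class="table">
    <tr><td>Ace %</td><th class="text-right pct-data">23.4%</th>
        <th class="raw-data text-right" style="display: none;">12942 / 55377</th></tr>
  </table>
</div>
"""


def stat_rows(page_source):
    """Return the stripped text of every th/td cell, row by row."""
    soup = BeautifulSoup(page_source, "html.parser")
    return [[cell.get_text(strip=True) for cell in tr.select("th,td")]
            for tr in soup.select("#statisticsOverview table tr")]


print(stat_rows(BEFORE_CLICK))  # [] : the snapshot was taken too early
print(stat_rows(AFTER_CLICK))   # [['Ace %', '23.4%', '12942 / 55377']]
```

The hidden `raw-data` cell is still returned: BeautifulSoup ignores CSS such as `display: none`, so each row carries both the percentage and the raw count.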
Final code:
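from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(r'//path/chromedriver.exe'))
driver.get("http://www.ultimatetennisstatistics.com/playerProfile?playerId=4742")
try:
    dropdown = driver.find_element(By.XPATH, '//*[@id="playerPills"]/li[9]/a')
    dropdown.click()
    bm = driver.find_element(By.ID, 'statisticsPill')
    bm.click()
    driver.maximize_window()
    # Parse only after the Statistics tab has been opened, so the tables are in page_source
    soup = BeautifulSoup(driver.page_source, "lxml")
    for i in soup.select('#statisticsOverview table tr'):
        print(i.text)
        data1 = [x.get_text(strip=True) for x in i.select("th,td")]
        print(data1)
except NoSuchElementException:
    # find_element raises this (not ValueError) when a locator matches nothing
    print("error")
finally:
    driver.quit()
https://stackoverflow.com/questions/51737769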