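I am trying to download Excel files from a website using Selenium in headless mode. It works fine in most cases, but for some months of the year driver.find_element_by_xpath() does not work as expected. I have gone through many posts suggesting that the element might not be present yet when the driver looks for it, but that is not the case here: I checked this thoroughly and tried slowing the process down with time.sleep(). I also use driver.implicitly_wait(), because the website takes a while to load the content on the page. I cannot use requests because the response to a GET request does not contain any data. My script is as follows: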
from selenium import webdriver
import datetime
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import Select
import os
import shutil
import time
import calendar

currentdir = os.path.dirname(__file__)
Initial_path = 'whateveritis'
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_experimental_option("prefs", {
    "download.default_directory": f"{Initial_path}",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True
})

def save_hist_data(year, months):
    def waitUntilDownloadCompleted(maxTime=1200):
        driver.execute_script("window.open()")
        # switch to new tab
        driver.switch_to.window(driver.window_handles[-1])
        # navigate to chrome downloads
        driver.get('chrome://downloads')
        # define the endTime
        endTime = time.time() + maxTime
        while True:
            try:
                # get the download percentage
                downloadPercentage = driver.execute_script(
                    "return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
                # check if downloadPercentage is 100 (otherwise the script will keep waiting)
                if downloadPercentage == 100:
                    # exit the method once it's completed
                    return downloadPercentage
            except:
                pass
            # wait for 1 second before checking the percentage next time
            time.sleep(1)
            # exit the method if the download has not completed within maxTime
            if time.time() > endTime:
                break

    starts_on = 1
    for month in months:
        no_month = datetime.datetime.strptime(month, "%b").month
        no_of_days = calendar.monthrange(year, no_month)[1]
        print(f"{no_of_days} days in {month}-{year}")
        driver = webdriver.Chrome(executable_path="whereeveritexists", options=chrome_options)
        driver.maximize_window()  # For maximizing window
        driver.implicitly_wait(20)
        driver.get("https://www.iexindia.com/marketdata/areaprice.aspx")
        select = Select(driver.find_element_by_name('ctl00$InnerContent$ddlPeriod'))
        select.select_by_visible_text('-Select Range-')
        driver.find_element_by_xpath("//input[@name='ctl00$InnerContent$calFromDate$txt_Date']").click()
        select = Select(driver.find_element_by_xpath("//td[@class='scwHead']/select[@id='scwYears']"))
        select.select_by_visible_text(str(year))
        select = Select(driver.find_element_by_xpath("//td[@class='scwHead']/select[@id='scwMonths']"))
        select.select_by_visible_text(month)
        # The problem is in this block
        test = None
        while not test:
            try:
                driver.find_element_by_xpath(f"//td[@class='scwCells' and contains(text(),'{starts_on}')]").click()
                test = True
            except IndentationError:
                print('Entered except block -IE')
                driver.find_element_by_xpath(f"//td[@class='scwCellsWeekend' and contains(text(), '{starts_on}')]").click()
                test = True
            except:
                print('Entered except block -IE-2')
                driver.find_element_by_xpath(f"//td[@class='scwInputDate' and contains(text(), '{starts_on}')]").click()
                test = True
        driver.find_element_by_xpath("//input[@name='ctl00$InnerContent$calToDate$txt_Date']").click()
        select = Select(driver.find_element_by_xpath("//td[@class='scwHead']/select[@id='scwYears']"))
        select.select_by_visible_text(str(year))
        select = Select(driver.find_element_by_xpath("//td[@class='scwHead']/select[@id='scwMonths']"))
        select.select_by_visible_text(month)
        # The problem is in this block
        test = None
        while not test:
            try:
                driver.find_element_by_xpath(f"//td[@class='scwCells' and contains(text(), '{no_of_days}')]").click()
                # time.sleep(4)
                test = True
            except IndentationError:
                print('Entered except block -IE')
                driver.find_element_by_xpath(f"//td[@class='scwCellsWeekend' and contains(text(), '{no_of_days}')]").click()
                # time.sleep(4)
                test = True
            except:
                # time.sleep(2)
                driver.find_element_by_xpath(f"//td[@class='scwInputDate' and contains(text(), '{no_of_days}')]").click()
                test = True
        driver.find_element_by_xpath("//input[@name='ctl00$InnerContent$btnUpdateReport']").click()
        driver.find_element_by_xpath("//a[@title='Export drop down menu']").click()
        print("Right before excel button click")
        driver.find_element_by_xpath("//a[@title='Excel']").click()
        waitUntilDownloadCompleted(180)
        print("After the download potentially!")
        filename = max([Initial_path + f for f in os.listdir(Initial_path)], key=os.path.getctime)
        shutil.move(filename, os.path.join(Initial_path, f"{month}{year}.xlsx"))
        driver.quit()

def main():
    # years = list(range(2013,2015))
    # months = ['Jan', 'Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
    # for year in years:
    #     try:
    save_hist_data(2018, ['Mar'])
    #     except:
    #         pass

if __name__== '__main__':
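    main()

The while loop is used to select the date element on the calendar (the month and year have already been selected from the dropdowns). Because the website uses different cell classes depending on whether the date falls on a weekday or a weekend, I used try and except blocks to try all the possible XPaths. Strangely, some months of the year do not work as expected at all. The link is "https://www.iexindia.com/marketdata/areaprice.aspx". Specifically for Mar-2018, searching for the XPath in the Chrome browser locates 31 March 2018, but when the Python script is executed it throws an error saying selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//td[@class='scwInputDate' and contains(text(), '31')]"} (Session info: chrome=84.0.4147.105).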
Answered 2020-08-24 13:50:26
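The problem is the except: exception handling. In your code block, the XPath "//td[@class='scwCells' and contains(text(), '{no_of_days}')]" finds no element, because the cell for 31 March has the class scwCellsWeekend.

The first except clause catches IndentationError. A missing element does not raise IndentationError, so that clause is skipped and control passes to the second, bare except, which names no exception type and therefore handles the NoSuchElementException. The code there searches for an element with the XPath //td[@class='scwInputDate' and contains(text(), '31')]. That element is not found either, so you get NoSuchElementException. Rather than using so many exception handlers, you can combine the XPaths with the union operator (|), which works as a logical OR: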
driver.find_element_by_xpath(f"//td[@class='scwCellsWeekend' and contains(text(), '{no_of_days}')] | //td[@class='scwCells' and contains(text(), '{no_of_days}')] | //td[@class='scwInputDate' and contains(text(), '{no_of_days}')]").click()

https://stackoverflow.com/questions/63549005
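
To make the control flow concrete, here is a minimal sketch that runs without a browser. The NoSuchElementException class, the CELLS set and the find() helper are hypothetical stand-ins for Selenium and the page's calendar, not part of the original script. build_day_xpath() is likewise only an illustrative helper that assembles the same union XPath as the line above.

```python
class NoSuchElementException(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.NoSuchElementException."""


# Assumption for illustration: day 31 is rendered as a weekend cell.
CELLS = {("scwCellsWeekend", 31)}
CLASSES = ("scwCellsWeekend", "scwCells", "scwInputDate")


def find(css_class, day):
    """Hypothetical lookup that raises on a miss, like find_element_by_xpath."""
    if (css_class, day) not in CELLS:
        raise NoSuchElementException(f"no td with class {css_class} for day {day}")
    return css_class


def original_flow(day):
    """Mirrors the try/except structure from the question."""
    try:
        return find("scwCells", day)
    except IndentationError:
        # Never reached: a failed lookup does not raise IndentationError.
        return find("scwCellsWeekend", day)
    except:
        # Reached instead. This lookup misses too, and its exception escapes.
        return find("scwInputDate", day)


def try_each_class(day, classes=CLASSES):
    """Tries every class in turn and catches only the lookup failure."""
    for css_class in classes:
        try:
            return find(css_class, day)
        except NoSuchElementException:
            continue
    raise NoSuchElementException(f"no calendar cell for day {day}")


def build_day_xpath(day, classes=CLASSES):
    """Joins one XPath per class with the union operator (|)."""
    return " | ".join(
        f"//td[@class='{c}' and contains(text(), '{day}')]" for c in classes
    )
```

With a real driver, the string returned by build_day_xpath(no_of_days) could be passed to a single find_element_by_xpath() call, so no exception handling is needed to cover the three classes. Note that contains(text(), '1') also matches cells such as 10 or 31, so an exact comparison may be safer when the calendar shows several days containing the same digits.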