文章/答案/技术大牛

发布

社区首页 >问答首页 >selenium fire StaleElementReferenceException

问selenium fire StaleElementReferenceException
EN

Stack Overflow用户

提问于 2019-05-15 21:01:16

回答 1查看 193关注 0票数 0

我试着用selenium做一个网络爬虫。我的程序触发了一个StaleElementReferenceException。我认为这是因为我以递归方式爬行一个页面，当一个页面不再有链接时，该函数导航到下一个页面，而不是以前的父页面。

因此，我引入了一个树形数据结构来在当前url不等于父url时导航回到父url。但这不是我的问题的解决方案。

有人能帮我吗？

代码：

public class crawler {
    private static FirefoxDriver driver;
    private static String main_url = "https://robhammond.co/tools/seo-crawler";
    private static List<String> uniqueLinks = new ArrayList<String>();

    public static void main(String[] args) {
        driver = new FirefoxDriver();

        Node<String> root = new Node<>(main_url);

        scrape(root, main_url);
    }

    public static void scrape(Node<String> node, String url) {
        if(node.getParent() != null && (!driver.getCurrentUrl().equals(node.getParent().getData()))) {
            driver.navigate().to(node.getParent().getData());
        }

        driver.navigate().to(url);

        List<WebElement> allLinks = driver.findElements(By.tagName("a"));

        for(WebElement link : allLinks) {
            if(link.getAttribute("href").contains(main_url) && !uniqueLinks.contains(link.getAttribute("href")) && link.isDisplayed()) {
                uniqueLinks.add(link.getAttribute("href"));

                System.out.println(link.getAttribute("href"));

                scrape(new Node<>(link.getAttribute("href")), link.getAttribute("href"));
            }
        }
    }
}

这是控制台的输出：

D:\Programme\openjdk-12.0.1_windows-x64_bin\jdk-12.0.1\bin\java.exe "-javaagent:D:\Programme\JetBrains\IntelliJ IDEA 2019.1.2\lib\idea_rt.jar=60461:D:\Programme\JetBrains\IntelliJ IDEA 2019.1.2\bin" -Dfile.encoding=UTF-8 -classpath C:\Users\admin\Desktop\SeleniumWebScraper\out\production\SeleniumWebScraper;D:\Downloads\selenium-server-standalone-3.141.59.jar de.company.crawler.crawler
1557924446770   mozrunner::runner   INFO    Running command: "C:\\Program Files\\Mozilla Firefox\\firefox.exe" "-marionette" "-foreground" "-no-remote" "-profile" "C:\\Users\\admin\\AppData\\Local\\Temp\\rust_mozprofile.YqmEqE8y1pjv"
1557924447037   addons.webextension.screenshots@mozilla.org WARN    Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: mozillaAddons
1557924447037   addons.webextension.screenshots@mozilla.org WARN    Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: resource://pdf.js/
1557924447037   addons.webextension.screenshots@mozilla.org WARN    Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: about:reader*
1557924448047   Marionette  INFO    Listening on port 60468
1557924448383   Marionette  WARN    TLS certificate errors will be ignored for this session
Mai 15, 2019 2:47:28 NACHM. org.openqa.selenium.remote.ProtocolHandshake createSession
INFO: Detected dialect: W3C
JavaScript warning: https://robhammond.co/js/jquery.min.js, line 4: Using //@ to indicate sourceMappingURL pragmas is deprecated. Use //# instead
https://robhammond.co/tools/seo-crawler#content
https://twitter.com/intent/tweet?text=SEO%20Crawler&url=https://robhammond.co/tools/seo-crawler&via=robhammond
Exception in thread "main" org.openqa.selenium.StaleElementReferenceException: The element reference of <a href="/tools/"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/stale_element_reference.html
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:25:53'
System info: host: 'DESKTOP-admin', ip: '192.168.233.1', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '12.0.1'
Driver info: org.openqa.selenium.firefox.FirefoxDriver
Capabilities {acceptInsecureCerts: true, browserName: firefox, browserVersion: 66.0.5, javascriptEnabled: true, moz:accessibilityChecks: false, moz:geckodriverVersion: 0.24.0, moz:headless: false, moz:processID: 19124, moz:profile: C:\Users\admin\AppData\Loca..., moz:shutdownTimeout: 60000, moz:useNonSpecCompliantPointerOrigin: false, moz:webdriverClick: true, pageLoadStrategy: normal, platform: WINDOWS, platformName: WINDOWS, platformVersion: 10.0, rotatable: false, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify}
Session ID: b3b87675-57c8-4b48-9a20-8df5e4d37503
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
    at org.openqa.selenium.remote.http.W3CHttpResponseCodec.createException(W3CHttpResponseCodec.java:187)
    at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:122)
    at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:49)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:158)
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552)
    at org.openqa.selenium.remote.RemoteWebElement.execute(RemoteWebElement.java:285)
    at org.openqa.selenium.remote.RemoteWebElement.getAttribute(RemoteWebElement.java:134)
    at de.company.crawler.crawler.scrape(crawler.java:33)
    at de.company.crawler.crawler.scrape(crawler.java:38)
    at de.company.crawler.crawler.main(crawler.java:20)

Process finished with exit code 1

java

selenium

selenium-webdriver

web-crawler

staleelementreferenceexception

回答 1

Stack Overflow用户

发布于 2019-05-15 23:17:31

当您离开第一页时，allLinks列表中的所有WebElements都会丢失。

我建议将它从WebElement列表转换为普通Strings列表，如下所示：

link.getAttribute("href")).collect(Collectors.toList())；allLinksHrefs = allLinks.stream().map(link -> List

然后遍历这个新的allLinksHrefs列表。

您可以使用基于散列的集合来保存uniqueLinks，就像HashSet一样-这样，副本将自动和当前方法可能需要几天才能完成，请考虑使用Selenium Grid eliminated

The running your scraper in Parallel

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/56150033

复制

相似问题

问selenium fire StaleElementReferenceException
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问selenium fire StaleElementReferenceExceptionEN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问selenium fire StaleElementReferenceException
EN