首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >HtmlUnit不getPage

HtmlUnit不getPage
EN

Stack Overflow用户
提问于 2016-05-11 21:44:30
回答 1查看 2.5K关注 0票数 2

使用HtmlUnit从互联网上抓取数据,我需要登录下面的页面https://accounts.google.com/login

当使用"getPage()“方法时,我一直得到这个例外,我如何解决它?先谢谢你

代码语言:javascript
复制
  Exception in thread "main" ======= EXCEPTION START ========
Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException]
com.gargoylesoftware.htmlunit.ScriptException: AssertionError: Assertion failed: No element found with className: signin-card (script in https://accounts.google.com/login?hl=es#identifier from (2653, 11) to (2753, 10)#2660)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:894)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:776)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:752)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:740)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:916)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:307)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:368)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:238)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:257)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:773)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:730)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1209)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1111)
    at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:207)
    at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:337)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3137)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2100)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:927)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:506)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:459)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:980)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:241)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:187)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:269)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:157)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:512)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:386)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:304)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:451)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:436)
    at prog.htmlUnit.Scrapeo.iniciaSesion(Scrapeo.java:74)
    at prog.htmlUnit.ProgramaPruebas.main(ProgramaPruebas.java:24)
Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: [object Object] (script in https://accounts.google.com/login?hl=es#identifier from (2653, 11) to (2753, 10)#2660)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1006)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3286)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:767)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:879)
    ... 35 more
JavaScriptException value = [object Object]
======= EXCEPTION END ========

抛出异常的部分非常简单:

代码语言:javascript
复制
public HtmlPage iniciaSesion(String correo, String pass) throws FailingHttpStatusCodeException, MalformedURLException, IOException{
        HtmlPage pagActual;
        HtmlTextInput cajaTexto;
        HtmlButton boton;

        pagActual= cliente.getPage("https://accounts.google.com/login?hl=es#identifier");

        return pagActual;

主程序只调用此方法并使用.asXml()方法,但它在使用它之前抛出异常。

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2016-05-12 06:58:32

你需要在你的客户端启用Javascript。这一守则应适用于:

代码语言:javascript
复制
LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");

java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF); 
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

WebClient client = new WebClient(BrowserVersion.CHROME);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);

String url = "https://accounts.google.com/login";
final HtmlPage page = client.getPage(url);

System.out.println(page.asText());
票数 3
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/37173811

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档