I've started writing some scripts in Groovy. I wrote a script that basically parses an HTML page and does some processing on the data.
Now I'm using HTTPBuilder to perform the HTTP requests. Whenever I try to execute this kind of request, I get an error like this:
Caught: java.lang.IllegalAccessError: tried to access class groovyx.net.http.StringHashMap from class groovyx.net.http.HTTPBuilder
java.lang.IllegalAccessError: tried to access class groovyx.net.http.StringHashMap from class groovyx.net.http.HTTPBuilder
at groovyx.net.http.HTTPBuilder.<init>(HTTPBuilder.java:177)
at groovyx.net.http.HTTPBuilder.<init>(HTTPBuilder.java:218)
at Main$_main_closure1.doCall(Main.groovy:30)
at Main.main(Main.groovy:24)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:143)
Here is the code of the main class:
// Grab HTTPBuilder component from maven repository
@Grab(group='org.codehaus.groovy.modules.http-builder',
      module='http-builder', version='0.5.2')

// import of HttpBuilder related stuff
import groovyx.net.http.*

import parsers.Parser
import parsers.WuantoParser
import parsers.Row

class Main {

    static mapOfParsers = [:]

    static void main(args) {
        List<Row> results = new ArrayList<>()

        // Initiating the parsers for the ebay-keywords websites
        println "Initiating Parsers..."
        initiateParsers()

        println "Parsing Websites..."
        mapOfParsers.each { key, parser ->
            switch (key) {
                case Constants.Parsers.WUANTO_PARSER:
                    println "Parsing Url: $Constants.Url.WUANTO_ROOT_CAT_URL"
                    println "Retrieving Html Content..."
                    def http = new HTTPBuilder(Constants.Url.WUANTO_ROOT_CAT_URL)
                    def html = http.get([:])
                    println "Parsing Html Content..."
                    results.addAll(((Parser) parser).parseHtml(html))
                    break
            }
        }

        results.each {
            println it
        }
    }

    static void initiateParsers() {
        mapOfParsers.put(Constants.Parsers.WUANTO_PARSER, new WuantoParser())
    }

    static void writeToFile(List<Row> rows) {
        File file = "output.txt"
        rows.each {
            file.write it.toString()
        }
    }
}

Posted on 2017-02-02 00:20:53
Ok, let's see. I tried to run the code from your snippet, but the http-builder dependency version 0.5.2 is quite outdated and was not reachable in the repositories my Groovy script points to, so I replaced it with the more recent version 0.7.1.
Also, the html variable returned from http.get in your code is actually already in parsed form: it is not text, but a Groovy NodeChild object. This is because by default HTTPBuilder parses the HTML response; if you want the raw content you have to ask for plain text explicitly (and even then it returns a Reader, not a String).
Here is a somewhat refactored and rewritten version of your code demonstrating this idea:
// Grab HTTPBuilder component from maven repository
@Grab('org.codehaus.groovy.modules.http-builder:http-builder:0.7.1')

import groovyx.net.http.*
import groovy.xml.XmlUtil
import static groovyx.net.http.ContentType.*

class MyParser {
    def parseHtml(html) {
        [html]
    }
}

def mapOfParsers = [:]
mapOfParsers["WUANTO_PARSER"] = new MyParser()

result = []
mapOfParsers.each { key, parser ->
    switch (key) {
        case "WUANTO_PARSER":
            // just a sample url which returns some html data
            def url = "https://httpbin.org/links/10/0"
            def http = new HTTPBuilder(url)
            def html = http.get([:])

            // the object returned from http.get is of type
            // http://docs.groovy-lang.org/latest/html/api/groovy/util/slurpersupport/NodeChild.html
            // this is a parsed format which is navigable in groovy
            println "extracting HEAD.TITLE text: " + html.HEAD.TITLE.text()
            println "class of returned object ${html.getClass().name}"
            println "First 100 characters parsed and formatted:\n ${XmlUtil.serialize(html).take(100)}"

            // forcing the returned type to be plain text
            def reader = http.get(contentType: TEXT)

            // what is returned now is a reader, we can get the text in groovy
            // via reader.text
            def text = reader.text
            println "Now we are getting text, 100 first characters plain text:\n ${text.take(100)}"

            result.addAll parser.parseHtml(text)
            break
    }
}

result.each {
    println "result length ${it.length()}"
}

Running the above prints:
extracting HEAD.TITLE text: Links
class of returned object groovy.util.slurpersupport.NodeChild
First 100 characters parsed and formatted:
<?xml version="1.0" encoding="UTF-8"?><HTML>
<HEAD>
<TITLE>Links</TITLE>
</HEAD>
<BODY>0 <
Now we are getting text, 100 first characters plain text:
<html><head><title>Links</title></head><body>0 <a href='/links/10/1'>1</a> <a href='/links/10/2'>2</
result length 313
(a few warnings from XmlUtil.serialize omitted for brevity).
None of this explains why you are getting the exception, but maybe the above will unblock you so you can work around the problem.
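For what it's worth, an IllegalAccessError like this at runtime typically means that HTTPBuilder and the package-private StringHashMap class it touches were loaded from two incompatible jars on the classpath. One hedged troubleshooting step (a sketch assuming the default Grape cache location, not something from the original answer) is to inspect and evict the cached artifact so that @Grab re-resolves a clean copy on the next run:

```shell
# List the http-builder jars Grape has already cached; seeing several
# versions here is a hint that mixed jars may end up on one classpath.
ls ~/.groovy/grapes/org.codehaus.groovy.modules.http-builder/http-builder/jars/ 2>/dev/null || true

# Evict the cached artifact; Grape keeps artifacts in an Ivy-style
# cache under ~/.groovy/grapes, and @Grab will re-download on next run.
rm -rf ~/.groovy/grapes/org.codehaus.groovy.modules.http-builder
```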
https://stackoverflow.com/questions/41963676