I am using jsoup to extract information from the web. My code looks like this:
doc = Jsoup.connect(myurl).get();
Elements newsHeadlines = doc.select(".myclass");
If I call System.out.println on newsHeadlines, I get the following:
<span class="cmtComentario">
<span class="blaicon"></span>
<span class="blacoment"><span class="cmtHora" data-hora=""></span>
<span class="blathing" data-minutoPartido="93'"></span>
<span class="blado"></span>
<span class="blahave">
Oh yeah!<br/></span>
</span>
</span>
<span class="cmtComentario">
<span class="blaicon"></span>
<span class="blacoment"><span class="cmtHora" data-hora=""></span>
<span class="blathing" data-health="97'"></span>
<span class="blado"></span>
<span class="blahave">
This is my world</span>
</span>
</span>
How can I save each block, such as the one below, as a separate entry in an array?
<span class="cmtComentario">
<span class="blaicon"></span>
<span class="blacoment"><span class="cmtHora" data-hora=""></span>
<span class="blathing" data-health="92'"></span>
<span class="blado"></span>
<span class="blahave">
This is my world</span>
</span>
</span>
Thanks a lot.
Posted on 2014-12-24 15:39:22
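newsHeadlines is simply a list of Element objects, because Elements implements List<Element>.
So you can iterate over newsHeadlines the same way you would iterate over any list.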
for(Element element : newsHeadlines) {
System.out.println(element.toString());
}
If this is not what you need (I have not tested the code), you can try Element.children(). It again returns an Elements object that you can iterate over.
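To address the "array" part of the question directly, here is a minimal sketch. It assumes jsoup is on the classpath and that each block is matched by the selector you pass in. The class name CommentBlocks, the helper blocksAsArray, and the shortened sample markup are made up for illustration and are not part of the original code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class CommentBlocks {

    // Returns the outer HTML of every element matching cssQuery,
    // one array entry per matched block.
    static String[] blocksAsArray(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Elements matches = doc.select(cssQuery);

        List<String> blocks = new ArrayList<>();
        for (Element match : matches) {
            // outerHtml() includes the matched tag itself, not only its children.
            blocks.add(match.outerHtml());
        }
        return blocks.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Shortened sample markup (an assumption, modelled on the question).
        String html =
            "<span class=\"cmtComentario\"><span class=\"blahave\">Oh yeah!</span></span>"
          + "<span class=\"cmtComentario\"><span class=\"blahave\">This is my world</span></span>";

        String[] blocks = blocksAsArray(html, ".cmtComentario");
        for (String block : blocks) {
            System.out.println(block);
            System.out.println("-----");
        }
    }
}
```

If you would rather keep the Element objects than their HTML strings, newsHeadlines.toArray(new Element[0]) should give you an Element[] directly, since Elements is a List<Element>.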
Posted on 2014-12-24 16:04:24
You could also wrap each comment in a div tag and use some Java 8 syntactic sugar to collect the Element instances into a List:
Elements elements = Jsoup.parse(markup).getAllElements().select(".myclass");
List<Element> comments = elements.stream().collect(Collectors.<Element>toList());
for(Element comment : comments) {
System.out.println(comment.html());
}
For testing, I used the parse method instead of connect.
It prints:
<span class="cmtComentario"> <span class="blaicon">1</span>.......
<span class="cmtComentario"> <span class="blaicon">2</span>........
Test markup:
String markup = "" +
"<div class=\"myclass\">\n" +
"<span class=\"cmtComentario\">\n" +
"<span class=\"blaicon\">1</span>\n" +
"<span class=\"blacoment\"><span class=\"cmtHora\" data-hora=\"\"></span>\n" +
"<span class=\"blathing\" data-minutoPartido=\"93'\"></span>\n" +
"<span class=\"blado\"></span>\n" +
"<span class=\"blahave\">\n" +
"Oh yeah!<br/></span>\n" +
"</span>\n" +
"</span>\n" +
"</div>" +
"<div class=\"myclass\">\n" +
"<span class=\"cmtComentario\">\n" +
"<span class=\"blaicon\">2</span>\n" +
"<span class=\"blacoment\"><span class=\"cmtHora\" data-hora=\"\"></span>\n" +
"<span class=\"blathing\" data-health=\"97'\"></span>\n" +
"<span class=\"blado\"></span>\n" +
"<span class=\"blahave\">\n" +
"This is my world</span>\n" +
"</span>\n" +
"</span>" +
"</div>";
Hope this helps!
https://stackoverflow.com/questions/27638996