What I want to do seems simple to me, but it has taken far more effort than it should. I have a document that contains the following:
<h2>First Heading</h2>
<table>
<div class="title">First Subheading One</div>
<div class="title">First Subheading Two</div>
<div class="title">First Subheading Three</div>
</table>
<h2>Second Heading</h2>
<table>
<div class="title">Second Subheading One</div>
<div class="title">Second Subheading Two</div>
<div class="title">Second Subheading Three</div>
</table>
<h2>Third Heading</h2>
<table>
<div class="title">Third Subheading One</div>
<div class="title">Third Subheading Two</div>
<div class="title">Third Subheading Three</div>
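</table>
Unsurprisingly, doc.select("h2") gets me all the headings, and doc.select("div.title") gives me all the subheadings. What I want to do is iterate over the returned h2 elements and, for each one, iterate over the div.title elements that belong to it. I've tried a number of things, and I'm not at all new to programming (though I am new to jsoup), but I just can't figure out how to do this.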
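Elements headings = httpDoc.select("h3");
for (Element heading : headings) {
    // something with heading.nextSibling here
}
Is there something I can call (e.g. nextSibling) that gives me the node? Then I could run another select("div.title") on it and iterate over the results to get the subheadings?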
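Or am I going about this entirely the wrong way? Sorry if this seems stupid. I feel a bit silly because I've been writing code for many years but have never dealt with the DOM (I've always been a Win32 guy).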
Posted on 2012-03-01 18:01:00
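My understanding
What I understand from your question is that you are trying to get the h2 tags and then, for each <h2> heading, get the corresponding div.title elements in the table that follows it.
Your mistakes
In the snippet you posted, you select h3 instead of h2, which is the tag your HTML actually uses.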
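A <table> should contain a <tr> and a <td>; content such as a <div> has to sit inside a cell (please check the W3C page). Because of this, jsoup treats your <table> as malformed when it parses the fragment and does not keep the <div> elements inside it.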
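To see for yourself what jsoup makes of the malformed markup, you can print the tree it builds. The following is only a minimal sketch: the class name MalformedTableCheck and the helper count are made up for illustration, and it assumes jsoup is on the classpath. My understanding is that jsoup follows the HTML5 parsing rules here and moves the <div> elements out in front of the <table>, but the printout is the thing to trust.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class MalformedTableCheck {
    // Hypothetical helper: counts the elements matching a CSS query in the parsed HTML.
    static int count(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // The question's markup, shortened: <div> elements directly inside <table>.
        String html = "<h2>First Heading</h2>"
                + "<table>"
                + "<div class=\"title\">First Subheading One</div>"
                + "<div class=\"title\">First Subheading Two</div>"
                + "</table>";

        // Print the body exactly as jsoup rebuilt it.
        System.out.println(Jsoup.parse(html).body().html());

        // Compare how many div.title elements exist at all
        // with how many are still inside a <table>.
        System.out.println("div.title anywhere: " + count(html, "div.title"));
        System.out.println("div.title inside a table: " + count(html, "table div.title"));
    }
}
```

If the second count is lower than the first, selecting div.title from the <table> element will miss those subheadings, which is why the markup needs the <tr> and <td> wrappers.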
Expected output, as I understand your question:
The header is: First Heading
The div content is: First Subheading One
The div content is: First Subheading Two
The div content is: First Subheading Three
========== +_+ ===========
The header is: Second Heading
The div content is: Second Subheading One
The div content is: Second Subheading Two
The div content is: Second Subheading Three
========== +_+ ===========
The header is: Third Heading
The div content is: Third Subheading One
The div content is: Third Subheading Two
The div content is: Third Subheading Three
========== +_+ ===========
Code that produces the output above:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JSoupTest
{
    public static void main(String[] args)
    {
        // Same document as in the question, with each table's content wrapped in <tr><td>.
        String s = "<h2>First Heading</h2>";
        s += "<table><tr><td>";
        s += "<div class='title'>First Subheading One</div>";
        s += "<div class='title'>First Subheading Two</div>";
        s += "<div class='title'>First Subheading Three</div>";
        s += "</td></tr></table>";
        s += "<h2>Second Heading</h2>";
        s += "<table><tr><td>";
        s += "<div class='title'>Second Subheading One</div>";
        s += "<div class='title'>Second Subheading Two</div>";
        s += "<div class='title'>Second Subheading Three</div>";
        s += "</td></tr></table>";
        s += "<h2>Third Heading</h2>";
        s += "<table><tr><td>";
        s += "<div class='title'>Third Subheading One</div>";
        s += "<div class='title'>Third Subheading Two</div>";
        s += "<div class='title'>Third Subheading Three</div>";
        s += "</td></tr></table>";

        Document doc = Jsoup.parse(s);
        Elements h_2 = doc.select("h2");
        for (int i = 0; i < h_2.size(); i++)
        {
            Element e = h_2.get(i);
            System.out.println("The header is: " + e.ownText());
            // The element that follows each <h2> is its <table>.
            Element nextSib = e.nextElementSibling();
            if (nextSib != null)
            {
                // Search only inside that table for the subheadings.
                Elements divs = nextSib.select("div.title");
                for (int j = 0; j < divs.size(); j++)
                {
                    Element d = divs.get(j);
                    System.out.println("The div content is: " + d.ownText());
                }
            }
            System.out.println("========== +_+ ===========");
        }
    }
}

https://stackoverflow.com/questions/9510113
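As a variation on the code above, here is a sketch that does not rely on the <table> being the element immediately after each <h2>. It walks the following siblings until it reaches the next <h2> and collects every div.title it finds along the way. The class name HeadingGroups and the method group are hypothetical names chosen for illustration, and the sketch assumes heading texts are unique, since they are used as map keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HeadingGroups {
    // Hypothetical helper: maps each <h2> text to the div.title texts that follow it.
    static Map<String, List<String>> group(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Element h2 : doc.select("h2")) {
            List<String> titles = new ArrayList<>();
            // Walk the following sibling elements until the next <h2> (or the end).
            for (Element sib = h2.nextElementSibling();
                 sib != null && !"h2".equals(sib.tagName());
                 sib = sib.nextElementSibling()) {
                // select() can match the sibling itself as well as its descendants,
                // so this covers a bare div.title and one nested inside a table.
                for (Element div : sib.select("div.title")) {
                    titles.add(div.text());
                }
            }
            groups.put(h2.text(), titles);
        }
        return groups;
    }

    public static void main(String[] args) {
        String html = "<h2>First Heading</h2>"
                + "<table><tr><td>"
                + "<div class='title'>First Subheading One</div>"
                + "<div class='title'>First Subheading Two</div>"
                + "</td></tr></table>"
                + "<h2>Second Heading</h2>"
                + "<table><tr><td>"
                + "<div class='title'>Second Subheading One</div>"
                + "</td></tr></table>";
        group(html).forEach((heading, titles) -> {
            System.out.println("The header is: " + heading);
            titles.forEach(t -> System.out.println("The div content is: " + t));
        });
    }
}
```

Returning a map instead of printing inside the loop keeps the grouping reusable; the LinkedHashMap preserves the order in which the headings appear in the document.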