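I am trying to get all the links (more precisely, the link text) from http://virt10.itu.chalmers.se/index.php/Guard, but only those under the heading "Relations", subheading "Can Instantiate".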
<h2><span class="mw-headline" id="Relations">Relations</span></h2>
<h3><span class="mw-headline" id="Can_Instantiate">Can Instantiate</span></h3>
<p><a href="/index.php/Attention_Demanding_Gameplay" title="Attention Demanding Gameplay">Attention Demanding Gameplay</a>,
<a href="/index.php?title=Conflicts&action=edit&redlink=1" class="new" title="Conflicts (page does not exist)">Conflicts</a>,
<a href="/index.php/Continuous_Goals" title="Continuous Goals">Continuous Goals</a>,
<a href="/index.php?title=Ownership&action=edit&redlink=1" class="new" title="Ownership (page does not exist)">Ownership</a>,
<a href="/index.php/Preventing_Goals" title="Preventing Goals">Preventing Goals</a>,
<a href="/index.php/Reconnaissance" title="Reconnaissance">Reconnaissance</a>,
<a href="/index.php/Stimulated_Planning" title="Stimulated Planning">Stimulated Planning</a>,
<a href="/index.php/Trade-Offs" title="Trade-Offs">Trade-Offs</a>
</p>
Unfortunately, I don't really understand jsoup (or Java) yet. So far I have tried something like this:
Elements contentinstantiate = doc.select("span.mw-headline, h3 ~ a");
for (int i=0; i < contentinstantiate.size(); i++) {
System.out.println(contentinstantiate.get(i).text());
}
or
Elements links = content.getElementsByTag("a");
for (Element link : links) {
String linkHref = link.attr("title");
System.out.println(linkHref);
String linkText = link.text();
System.out.println(linkText);
}
But neither works, and I'm a bit lost here. Can anyone help me?
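Posted on 2018-06-17 21:55:29
You selected the h3, and the h3 has no a tags: the links are inside the p that follows it. I think you have to select the p tag.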
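One way to follow this suggestion is a single CSS selector that starts at the h3 containing the Can_Instantiate headline and takes the links inside the p right after it. This is a minimal sketch, assuming jsoup is on the classpath. The class name SelectorSketch, the helper linkTexts, and the inline sample HTML (a shortened copy of the markup in the question plus a hypothetical "Other Section" block) are illustrative and not from the original post.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorSketch {
    // Hypothetical helper: returns the text of every link in the <p> that
    // directly follows the <h3> holding the span with id "Can_Instantiate".
    static List<String> linkTexts(Document doc) {
        // ":has(...)" keeps only the h3 that contains the headline span,
        // "+ p" moves to the paragraph immediately after it,
        // and the trailing "a" matches the links inside that paragraph.
        return doc.select("h3:has(span#Can_Instantiate) + p a").eachText();
    }

    public static void main(String[] args) {
        // Shortened sample markup; the second section is made up to show
        // that links outside "Can Instantiate" are not selected.
        String html =
              "<h2><span class='mw-headline' id='Relations'>Relations</span></h2>"
            + "<h3><span class='mw-headline' id='Can_Instantiate'>Can Instantiate</span></h3>"
            + "<p><a href='/index.php/Attention_Demanding_Gameplay'>Attention Demanding Gameplay</a>, "
            + "<a href='/index.php/Continuous_Goals'>Continuous Goals</a></p>"
            + "<h3><span class='mw-headline' id='Other_Section'>Other Section</span></h3>"
            + "<p><a href='/index.php/Something_Else'>Something Else</a></p>";

        Document doc = Jsoup.parse(html);
        for (String text : linkTexts(doc)) {
            System.out.println(text);
        }
    }
}
```

For the live page, the same helper could be called on the document returned by Jsoup.connect(url).get(), assuming the page still uses the markup shown in the question.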
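Posted on 2018-06-17 21:58:05
The jsoup next-sibling API will help you here. You need to select
<h2><span class="mw-headline" id="Relations">Relations</span></h2>
Then you need to move to the next element, which will be the h3.
<h3><span class="mw-headline" id="Can_Instantiate">Can Instantiate</span></h3>
Then you move on to the next sibling and check that it is the paragraph node.
Once the paragraph node is selected, extracting the links is easy.
For example, the next sibling of the H2 is the H3, and the next sibling of the H3 is the <p>.
Every jsoup node has the following method:
public Node nextSibling(), which returns the node's next sibling.
You can read about it in the jsoup documentation.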
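The walk described above can be sketched as follows. One caveat: nextSibling() returns a Node, which can be a whitespace text node between two tags, so this sketch uses Element.nextElementSibling() instead, which skips non-element nodes. The class name SiblingSketch, the helper linkTexts, and the sample HTML are assumptions made for illustration; jsoup is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SiblingSketch {
    // Hypothetical helper: starts at the "Relations" h2, walks forward through
    // its element siblings to the "Can Instantiate" h3, then reads the links
    // from the <p> that follows that h3.
    static List<String> linkTexts(Document doc) {
        List<String> texts = new ArrayList<>();
        Element relations = doc.getElementById("Relations"); // the span inside the h2
        if (relations == null || relations.parent() == null) {
            return texts;
        }
        Element h2 = relations.parent();
        for (Element el = h2.nextElementSibling(); el != null; el = el.nextElementSibling()) {
            if (el.tagName().equals("h2")) {
                break; // reached the next top-level section without a match
            }
            if (el.tagName().equals("h3") && el.getElementById("Can_Instantiate") != null) {
                Element p = el.nextElementSibling();
                if (p != null && p.tagName().equals("p")) {
                    for (Element a : p.select("a")) {
                        texts.add(a.text());
                    }
                }
                break;
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Shortened sample markup based on the question.
        String html =
              "<h2><span class='mw-headline' id='Relations'>Relations</span></h2>\n"
            + "<h3><span class='mw-headline' id='Can_Instantiate'>Can Instantiate</span></h3>\n"
            + "<p><a href='/index.php/Reconnaissance'>Reconnaissance</a>,\n"
            + "<a href='/index.php/Trade-Offs'>Trade-Offs</a>\n</p>";
        for (String text : linkTexts(Jsoup.parse(html))) {
            System.out.println(text);
        }
    }
}
```

Compared with a single selector, the explicit walk is longer, but it makes each step (h2, then h3, then p) visible and easy to adjust if the page layout differs.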
Posted on 2018-06-17 22:40:43
Try this; it is my working code:
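final String url = "http://virt10.itu.chalmers.se/index.php/Guard";
Document doc = Jsoup.connect(url).get();
// The id is on the span inside the h3, so go up to the h3 and then to the <p> that follows it.
Element contentinstantiate = doc.getElementById("Can_Instantiate").parent().nextElementSibling();
// getAllElements() includes the <p> itself, so its full text is printed first, then each link.
for(Element e : contentinstantiate.getAllElements()){
System.out.println(e.text());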
}
Output:
Attention Demanding Gameplay, Conflicts, Continuous Goals, Ownership, Preventing Goals, Reconnaissance, Stimulated Planning, Trade-Offs
Attention Demanding Gameplay
Conflicts
Continuous Goals
Ownership
Preventing Goals
Reconnaissance
Stimulated Planning
Trade-Offs
https://stackoverflow.com/questions/50897294