Nokogiri & returning all data between two tags

Stack Overflow user
Asked 2021-08-26 02:10:53
2 answers · 35 views · 0 followers · 1 vote

I'm working on a project that scrapes items from https://platinumgod.co.uk/, and I'm having trouble accessing all of the <p> tags between two elements.

Here is the HTML:

<li class="textbox" data-tid="42.5" data-cid="42" data-sid="263" style="display: inline-block;">
    <a>
        <div onclick="" class="item reb-itm-new re-itm263"></div>
        <span>
            <p class="item-title">Clear Rune</p>
            <p class="r-itemid">ItemID: 263</p>
            <p class="pickup">"Rune mimic"</p>
            <p class="quality">Quality: 2</p>
            <p>When used, copies the effect of the Rune or Soul stone you are holding (like the Blank Card)</p>
            <p>Drops a random rune on the floor when picked up</p>
            <p>The recharge time of this item depends on the Rune/Soul Stone held:</p>
            <p>1 room: Soul of Lazarus</p>
            <p>2 rooms: Rune of Ansuz, Rune of Berkano, Rune of Hagalaz, Soul of Cain</p>
            <p>3 rooms: Rune of Algiz, Blank Rune, Soul of Magdalene, Soul of Judas, Soul of ???, Soul of the Lost</p>
            <p>4 rooms: Rune of Ehwaz, Rune of Perthro, Black Rune, Soul of Isaac, Soul of Eve, Soul of Eden, Soul of the Forgotten, Soul of Jacob and Esau</p>
            <p>6 rooms: Rune of Dagaz, Soul of Samson, Soul of Azazel, Soul of Apollyon, Soul of Bethany</p>
            <p>12 rooms: Rune of Jera, Soul of Lilith, Soul of the Keeper</p>
            <ul>
                <p>Type: Active</p>
                <p>Recharge time: Varies</p>
                <p>Item Pool: Secret Room, Crane Game</p>
            </ul>
            <p class="tags">* Secret Room</p>
        </span>
    </a>
</li>

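What I want is to return all of the <p> tags between <p class="quality"> (not including that tag) and the first <ul>.

I've tried several solutions found on the forum and had only partial success with the following code, which I found in one of the answers (honestly, I have a hard time understanding what is going on here). I'm iterating because the HTML contains several items that need to be scraped: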

items = html.at(".repentanceitems-container").css("li.textbox").each do |item|
  use = item.xpath(".//a/span/p[5]/following-sibling::p[count(.//a/span/p[6]/preceding-sibling::p)= 
        count(.//a/span/p[6]/preceding-sibling::p)]")
  end

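However, this only returns the first <p> tag after <p class="quality">. I'm sure the cause is something simple that I'm missing because I don't understand the code. I can also access the first <p> element I want to include and the <ul> where the range should end, but I'm not sure exactly how to use this information: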

# First line of item use
start = item.xpath('.//a/span/p[5]')
# ul tag
ending = item.xpath('.//a/span/ul[1]')

Any help with this would be greatly appreciated!


2 Answers

Stack Overflow user

Accepted answer

Posted 2021-08-26 02:58:19

How about this:

require "nokogiri"

html = '<li class="textbox" data-tid="42.5" data-cid="42" data-sid="263" style="display: inline-block;"> <a> <div onclick="" class="item reb-itm-new re-itm263"></div> <span> <p class="item-title">Clear Rune</p> <p class="r-itemid">ItemID: 263</p> <p class="pickup">"Rune mimic"</p> <p class="quality">Quality: 2</p> <p>When used, copies the effect of the Rune or Soul stone you are holding (like the Blank Card)</p> <p>Drops a random rune on the floor when picked up</p> <p>The recharge time of this item depends on the Rune/Soul Stone held:</p> <p>1 room: Soul of Lazarus</p> <p>2 rooms: Rune of Ansuz, Rune of Berkano, Rune of Hagalaz, Soul of Cain</p> <p>3 rooms: Rune of Algiz, Blank Rune, Soul of Magdalene, Soul of Judas, Soul of ???, Soul of the Lost</p> <p>4 rooms: Rune of Ehwaz, Rune of Perthro, Black Rune, Soul of Isaac, Soul of Eve, Soul of Eden, Soul of the Forgotten, Soul of Jacob and Esau</p> <p>6 rooms: Rune of Dagaz, Soul of Samson, Soul of Azazel, Soul of Apollyon, Soul of Bethany</p> <p>12 rooms: Rune of Jera, Soul of Lilith, Soul of the Keeper</p> <ul> <p>Type: Active</p> <p>Recharge time: Varies</p> <p>Item Pool: Secret Room, Crane Game</p> </ul> <p class="tags">* Secret Room</p> </span> </a> </li>'
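puts Nokogiri::HTML(html).css(".quality ~ p:not(.tags)").map { |e| e.text }

The ~ combinator selects only the siblings that come after .quality, so .quality itself is not part of the result and no slicing is needed. The <p> tags inside the <ul> are children of the <ul> rather than siblings of .quality, so they are not matched either. I'm assuming .tags is the only other class after .quality that should be left out; if there are others, you'll need to exclude them with :not as well, or detect and skip them manually in an .each loop, unless someone knows a smarter trick.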

Votes: 1

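Stack Overflow user

Posted 2021-08-30 12:47:01

You may want to take a look at this draft tutorial for nokogiri.org, which explains several ways to do this.

Taking the third (and most general) approach, the following code does what you need: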

require "nokogiri"

class CSSSection
  def self.item_section(item)
    document = item.document
    start_tag = item.at_css("p.quality")
    end_tag = item.at_css("ul")

    # grab siblings that follow the start tag
    following_siblings_query = "#{start_tag.path}/following-sibling::*"
    following_siblings = document.xpath(following_siblings_query)

    # grab siblings that precede the end tag
    preceding_siblings_query = "#{end_tag.path}/preceding-sibling::*"
    preceding_siblings = document.xpath(preceding_siblings_query)

    following_siblings & preceding_siblings # xpath intersection
  end
end

doc = Nokogiri::HTML4(html) # html is the markup string from the question
li_nodes = doc.css("li") # whatever the query is to get the relevant "li" elements

data = li_nodes.map do |li_node|
  CSSSection.item_section(li_node)
end

puts data.first
# => <p>When used, copies the effect of the Rune or Soul stone you are holding (like the Blank Card)</p>
#    <p>Drops a random rune on the floor when picked up</p>
#    <p>The recharge time of this item depends on the Rune/Soul Stone held:</p>
#    <p>1 room: Soul of Lazarus</p>
#    <p>2 rooms: Rune of Ansuz, Rune of Berkano, Rune of Hagalaz, Soul of Cain</p>
#    <p>3 rooms: Rune of Algiz, Blank Rune, Soul of Magdalene, Soul of Judas, Soul of ???, Soul of the Lost</p>
#    <p>4 rooms: Rune of Ehwaz, Rune of Perthro, Black Rune, Soul of Isaac, Soul of Eve, Soul of Eden, Soul of the Forgotten, Soul of Jacob and Esau</p>
#    <p>6 rooms: Rune of Dagaz, Soul of Samson, Soul of Azazel, Soul of Apollyon, Soul of Bethany</p>
#    <p>12 rooms: Rune of Jera, Soul of Lilith, Soul of the Keeper</p>
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/68931700
