
Parsing XML with Nokogiri

Stack Overflow user
Asked 2013-01-01 09:05:33
Answers: 2  Views: 11.1K  Followers: 0  Score: 3

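I'm having some trouble getting properly set up with Nokogiri and its documentation, so getting started has been a bit rough.

I'm trying to parse this XML file: http://www.kongregate.com/games_for_your_site.xml

It returns multiple games in a gameset, and each game has a title, description, and so on.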

<gameset>
  <game>
    <id>160342</id>
    <title>Tricky Rick</title>
    <thumbnail>
      http://cdn3.kongregate.com/game_icons/0042/7180/KONG_icon250x200_site.png?21656-op
    </thumbnail>
    <launch_date>2012-12-12</launch_date>
    <category>Puzzle</category>
    <flash_file>
      http://external.kongregate-games.com/gamez/0016/0342/live/embeddable_160342.swf
    </flash_file>
    <width>640</width>
    <height>480</height>
    <url>
      http://www.kongregate.com/games/tAMAS_Games/tricky-rick
    </url>
    <description>
      Help Rick to collect all the stolen fuel to refuel his spaceship and fly away from the planet. Use hammer, bombs, jetpack and other useful stuff to solve puzzles!
    </description>
    <instructions>
      WASD \ Arrow Keys &#8211; move; S \ Down Arrow &#8211; take\release an object; CNTRL &#8211; interaction with objects: throw, hammer strike, invisibility mode; SPACE &#8211; interaction with elevators and fuel stations; Esc \ P &#8211; pause;
    </instructions>
    <developer_name>tAMAS_Games</developer_name>
    <gameplays>24999</gameplays>
    <rating>3.43</rating>
  </game>
  <game>
    <id>160758</id>
    <title>Flying Cookie Quest</title>
    <thumbnail>
      http://cdn2.kongregate.com/game_icons/0042/8428/icon_cookiequest_kong_250x200_site.png?16578-op
    </thumbnail>
    <launch_date>2012-12-07</launch_date>
    <category>Action</category>
    <flash_file>
      http://external.kongregate-games.com/gamez/0016/0758/live/embeddable_160758.swf
    </flash_file>
    <width>640</width>
    <height>480</height>
    <url>
      http://www.kongregate.com/games/LongAnimals/flying-cookie-quest
    </url>
    <description>
      Launch Rocket Panda into the land of Cookies. With the help of low-flying sharks, hang-gliding sheep and Rocket Badger, can you defeat the all powerful Biscuit Head? Defeat All enemies of cookies in this launcher game.
    </description>
    <instructions>Use the mouse button!</instructions>
    <developer_name>LongAnimals</developer_name>
    <gameplays>168672</gameplays>
    <rating>3.67</rating>
  </game>

Following the documentation, I'm using this:

require 'nokogiri'
require 'open-uri'

url = "http://www.kongregate.com/games_for_your_site.xml"
xml = Nokogiri::XML(open(url))
xml.xpath("//game").each do |node|
    puts node.xpath("//id")
    puts node.xpath("//title")
    puts node.xpath("//thumbnail")
    puts node.xpath("//category")
    puts node.xpath("//flash_file")
    puts node.xpath("//width")
    puts node.xpath("//height")
    puts node.xpath("//description")
    puts node.xpath("//instructions")
end

However, instead of one set of values per game, it just returns what seems like an endless stream of data. Any help is appreciated.


2 Answers

Stack Overflow user

Accepted answer

Posted 2013-01-01 15:29:06

Here's how I would rewrite your code:

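require 'nokogiri'
require 'open-uri'

# URI.open (Ruby 2.5+) replaces a bare open(url), which no longer fetches URLs on Ruby 3.0+
xml = Nokogiri::XML(URI.open("http://www.kongregate.com/games_for_your_site.xml"))
xml.search('game').each do |game|
  # each CSS selector passed to at is resolved relative to the current game node
  %w[id title thumbnail category flash_file width height description instructions].each do |n|
    puts game.at(n)
  end
end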

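The problem with your code is that every child tag is prefixed with //, which in XPath-ese means "start at the root of the document and search downward for every tag with this name." So instead of searching only inside each game node, it searches the entire document for each listed tag, once for every //game node.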

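I recommend CSS accessors over XPath because they are (usually) simpler and easier to read, which is why I use search('game') instead of xpath('//game'). (search accepts either a CSS or an XPath accessor, and so does at.)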

If you want the text inside each tag rather than the tag itself, change puts game.at(n) to:

puts game.at(n).text

To make the output more useful, I'd do this:

require 'nokogiri'
require 'open-uri'

xml = Nokogiri::XML(URI.open('http://www.kongregate.com/games_for_your_site.xml'))
games = xml.search('game').map do |game|
  %w[
    id title thumbnail category flash_file width height description instructions
  ].each_with_object({}) do |n, o|
    o[n] = game.at(n).text
  end
end

require 'awesome_print'
puts games.size
ap games.first
ap games.last

This results in:

395
{
              "id" => "160342",
          "title"  => "Tricky Rick",
      "thumbnail"  => "http://cdn3.kongregate.com/game_icons/0042/7180/KONG_icon250x200_site.png?21656-op",
        "category" => "Puzzle",
      "flash_file" => "http://external.kongregate-games.com/gamez/0016/0342/live/embeddable_160342.swf",
          "width"  => "640",
          "height" => "480",
    "description"  => "Help Rick to collect all the stolen fuel to refuel his spaceship and fly away from the planet. Use hammer, bombs, jetpack and other useful stuff to solve puzzles!\n",
    "instructions" => "WASD \\ Arrow Keys &#8211; move;\nS \\ Down Arrow &#8211; take\\release an object;\nCNTRL &#8211; interaction with objects: throw, hammer strike, invisibility mode;\nSPACE &#8211; interaction with elevators and fuel stations;\nEsc \\ P &#8211; pause;\n"
}
{
              "id" => "78",
          "title"  => "rotaZion",
      "thumbnail"  => "http://cdn2.kongregate.com/game_icons/0000/0115/pixtiz.rotazion_icon.jpg?8217-op",
        "category" => "Action",
      "flash_file" => "http://external.kongregate-games.com/gamez/0000/0078/live/embeddable_78.swf",
          "width"  => "350",
          "height" => "350",
    "description"  => "In rotaZion, you play with a bubble bar that you can&#8217;t stop rotating !\nCollect the bubbles and try to avoid the mines !\nCollect the different bonus to protect your bubble bar, makes the mines go slower or destroy all the mines !\nTry to beat 100.000 points ;)\n",
    "instructions" => "Move the bubble bar with the arrow keys !\nBubble = 500 Points !\nPixtiz sign = 5000 Points !\n"
}
Score: 20

Stack Overflow user

Posted 2013-01-01 10:50:35

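You could try something like this. I'd suggest making an array of the elements you want from each game and then iterating over them. I'm sure there's a way in Nokogiri to get all the elements inside a given element, but this works: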

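require 'nokogiri'
require 'open-uri'

# result holds the raw XML text of the feed
result = URI.open('http://www.kongregate.com/games_for_your_site.xml').read
xml = Nokogiri::XML(result)
xml.css("game").each do |inv|
  inv.css("title").each do |f|  # title or whatever else you want
    puts f.inner_html
  end
end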
Score: 1
Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/14107178
