I haven't tried it, but I recently read about <a href="http://github.com/pauldix/feedzirra/" rel="nofollow noreferrer">Feedzirra</a>, which claims to be built for performance:

<blockquote>
 Feedzirra is a feed library that is
 designed to get and update many feeds
 as quickly as possible. This includes
 using libcurl-multi through the
 taf2-curb gem for faster http gets,
 and libxml through nokogiri and
 sax-machine for faster parsing.
</blockquote>

You can use RFeedParser, a Ruby port of the (famous) Python Universal Feed Parser. It's based on Hpricot, and it's really fast and easy to use.

<a href="http://rfeedparser.rubyforge.org/" rel="nofollow noreferrer">http://rfeedparser.rubyforge.org/</a>

An example:

<pre><code>require 'rubygems'
require 'rfeedparser'
require 'open-uri'

feed = FeedParser::parse(open('http://feeds.feedburner.com/engadget'))

feed.entries.each do |entry|
  puts entry.title
end
</code></pre>

When all you have is a hammer, everything looks like a nail. Consider a solution other than Ruby for this. I love Ruby and Rails and would not part with them for web development or perhaps for a domain-specific language, but I would rather see heavy data lifting of the kind you describe done in Java, or perhaps Python or even C++.

Since the destination of this parsed data is likely a database, the database can act as the common point between the Rails portion of your solution and the portion written in another language. You then use the best tool for each of your problems, and the result is likely easier to work on and truly meets your requirements.

If speed is truly of the essence, why add an additional constraint and say, "Oh, it's only of the essence as long as I get to use Ruby"?

Not sure about the performance, but a similar question was answered at <a href="https://stackoverflow.com/questions/214590/parsing-atom-rss-in-ruby-rails">Parsing Atom &amp; RSS in Ruby/Rails?</a>

You might also look into Hpricot, which parses XML but assumes that it's well-formed and doesn't do any validation. 

<a href="http://wiki.github.com/why/hpricot" rel="nofollow noreferrer">http://wiki.github.com/why/hpricot</a>
<a href="http://wiki.github.com/why/hpricot/hpricot-xml" rel="nofollow noreferrer">http://wiki.github.com/why/hpricot/hpricot-xml</a>

Initially I used Nokogiri for some basic XML parsing, but it was slow and at times erratic. I switched to <a href="http://github.com/pauldix/feedzirra/" rel="nofollow">Feedzirra</a>: not only was there a great performance boost, there were no errors, and it's as easy as pie.
An example is shown below:

<pre><code># fetching a single feed
feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing")

# feed and entries accessors
feed.title # =&gt; "Paul Dix Explains Nothing"
feed.url # =&gt; "http://www.pauldix.net"
feed.feed_url # =&gt; "http://feeds.feedburner.com/PaulDixExplainsNothing"
feed.etag # =&gt; "GunxqnEP4NeYhrqq9TyVKTuDnh0"
feed.last_modified # =&gt; Sat Jan 31 17:58:16 -0500 2009 # it's a Time object

entry = feed.entries.first
entry.title # =&gt; "Ruby Http Client Library Performance"
entry.url # =&gt; "http://www.pauldix.net/2009/01/ruby-http-client-library-performance.html"
entry.author # =&gt; "Paul Dix"
entry.summary # =&gt; "..."
entry.content # =&gt; "..."
entry.published # =&gt; Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
entry.categories # =&gt; ["...", "..."]
</code></pre>
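
The etag and last_modified values in the listing above are the validators that HTTP conditional GET relies on: send them back on the next fetch and an unchanged feed can be answered with "304 Not Modified" and no body, which matters when polling thousands of feeds. The following is a minimal sketch using only Ruby's standard net/http, not Feedzirra's own code. The method names conditional_feed_request and fetch_if_changed are hypothetical, and it assumes you stored the ETag exactly as the server sent it.

```ruby
require 'net/http'
require 'uri'
require 'time'

# Build a GET request carrying the validators from the previous fetch.
# (Hypothetical helper; not part of Feedzirra.)
def conditional_feed_request(feed_url, etag: nil, last_modified: nil)
  request = Net::HTTP::Get.new(URI(feed_url))
  request['If-None-Match'] = etag if etag
  # Time#httpdate (from 'time') formats the value as an HTTP date in GMT.
  request['If-Modified-Since'] = last_modified.httpdate if last_modified
  request
end

# Return the feed body, or nil when the server answers 304 Not Modified.
def fetch_if_changed(feed_url, etag: nil, last_modified: nil)
  uri = URI(feed_url)
  request = conditional_feed_request(feed_url, etag: etag, last_modified: last_modified)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPNotModified) ? nil : response.body
end
```

A nil result from fetch_if_changed means the stored copy is still current, so parsing can be skipped for that feed.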

If you want to do more with the feeds, for example walk through the parsed entries, something like the following will suffice:

<pre><code>source = Feedzirra::Feed.fetch_and_parse("http://www.feed-url-you-want-to-play-with.com")
puts "Parsing Downloaded XML....\n\n\n"

source.entries.each do |entry|
  begin
    puts "#{entry.summary} \n\n"
    # My own sanitization step (escapes "+" in the URL); ignore it.
    clean_url = entry.url.gsub("+", "%2B")
    # scrapArticleWithURL is my own helper, not part of Feedzirra.
    scrapArticleWithURL(clean_url)
  rescue
    puts "(****)there has been an error fetching (#{entry.title}) \n\n"
  end
end
</code></pre>
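
The loop above rescues errors per entry. With thousands of feeds, the same isolation is useful per feed, combined with some concurrency, because most of the time is spent waiting on the network. Here is a rough sketch using a fixed pool of plain Ruby threads. The name process_feeds and the block it takes are assumptions for illustration, and the block stands in for whatever fetch-and-parse call you use.

```ruby
require 'thread'

# Run the given block once per URL on a fixed number of worker threads.
# A feed that raises is recorded in `errors` and does not stop the others.
# (Hypothetical helper; the block is whatever fetches and parses one feed.)
def process_feeds(urls, workers: 8, &fetcher)
  queue = Queue.new
  urls.each { |url| queue << url }
  results = {}
  errors = {}
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        url = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        begin
          value = fetcher.call(url)
          lock.synchronize { results[url] = value }
        rescue StandardError => e
          lock.synchronize { errors[url] = e.message }
        end
      end
    end
  end

  threads.each(&:join)
  [results, errors]
end
```

Usage would look like `results, errors = process_feeds(feed_urls, workers: 16) { |url| Feedzirra::Feed.fetch_and_parse(url) }`. Threads help here because the work is I/O-bound. If you pass Feedzirra an array of URLs, it already fetches them through libcurl-multi, so a pool like this is mainly of interest for other fetchers or for the per-feed processing step.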

I need to parse thousands of feeds and performance is an essential requirement. Do you have any suggestions?

Thanks in advance!

High-performance RSS/Atom parsing with Ruby on Rails
