
Ruby not working across new lines

Stack Overflow user
Asked 2016-01-12 18:51:11
2 answers, 93 views, 0 followers, 1 vote

This is driving me crazy. I have a string that is a lengthy chunk of XHTML:

irb(main):012:0> input = <<-END
irb(main):013:0" <p><span class=\"caps\">ICES</span> evaluated the management plan in 2009
 and found it to be in accordance with the PA. However, the <span class=\"caps\">SSB</span> index , being based on lengths, excludes the problem connected with age estimation.</p>\n<p><span class=\"caps\">SSB</span> 
 index is estimated to have decreased by more than 20% between the periods 2010–2012 
 (average of the three years) and 2013–2014 (average of the two years).</p>\n<p>A candidate 
 multispecies F<sub><span class=\"caps\">MSY</span></sub> was estimated.</p><pre><code><p>
 The management plan, agreed October 2007 and implemented January 2008 was evaluated by 
 <span class=\"caps\">ICES</span> as to its accordance with the precautionary approach and 
 reviewed by three independent scientists.</p>\n<p>As the strong 2005 and 2006 year classes 
 enter the fishery discarding is expected to further increase, justifying the implementation 
 of measures to improve gear selectivity, such as increases in mesh size 
 (<span class=\"caps\">ICES</span>, 2009a).</p></code></pre>
irb(main):014:0" END
=> "<p><span class=\"caps\">ICES</span> evaluated the management plan in 2009 and found it to 
 be in accordance with the PA. However, the <span class=\"caps\">SSB</span> index , being based 
 on lengths, excludes the problem connected with age estimation.</p>\n<p><span class=\"caps\">SSB
 </span> index is estimated to have decreased by more than 20% between the periods 2010–2012 
 (average of the three years) and 2013–2014 (average of the two years).</p>\n<p>A candidate 
 multispecies F<sub><span class=\"caps\">MSY</span></sub> was estimated.</p><pre><code><p>The 
 management plan, agreed October 2007 and implemented January 2008 was evaluated by <span 
 class=\"caps\">ICES</span> as to its accordance with the precautionary approach and reviewed 
 by three independent scientists.</p>\n<p>As the strong 2005 and 2006 year classes enter the 
 fishery discarding is expected to further increase, justifying the implementation of 
 measures to improve gear selectivity, such as increases in mesh size (<span class=\"caps\">ICES
 </span>, 2009a).</p></code></pre>\n"

Now I want to strip out the text contained in the <pre> tags, but it fails:

irb(main):015:0> input.gsub(/<pre>.*<\/pre>/,'')
=> "<p><span class=\"caps\">ICES</span> evaluated the management plan in 2009 and found it
 to be in accordance with the PA. However, the <span class=\"caps\">SSB</span> index , being 
 based on lengths, excludes the problem connected with age estimation.</p>\n<p><span 
 class=\"caps\">SSB</span> index is estimated to have decreased by more than 20% between the 
 periods 2010–2012 (average of the three years) and 2013–2014 (average of the two years).</p>\n
 <p>A candidate multispecies F<sub><span class=\"caps\">MSY</span></sub> was estimated.</p><pre>
 <code><p>The management plan, agreed October 2007 and implemented January 2008 was evaluated 
 by <span class=\"caps\">ICES</span> as to its accordance with the precautionary approach 
 and reviewed by three independent scientists.</p>\n<p>As the strong 2005 and 2006 year classes 
 enter the fishery discarding is expected to further increase, justifying the implementation 
 of measures to improve gear selectivity, such as increases in mesh size (<span class=\"caps\">ICES</span>, 2009a).</p></code></pre>\n"

If I strip the newlines first, it works:

irb(main):016:0> input.gsub(/\n/,'').gsub(/<pre>.*<\/pre>/,'')
=> "<p><span class=\"caps\">ICES</span> evaluated the management plan in 2009 and found it 
 to be in accordance with the PA. However, the <span class=\"caps\">SSB</span> index , being 
 based on lengths, excludes the problem connected with age estimation.</p><p><span 
 class=\"caps\">SSB</span> index is estimated to have decreased by more than 20% between the 
 periods 2010–2012 (average of the three years) and 2013–2014 (average of the two years).</p>
 <p>A candidate multispecies F<sub><span class=\"caps\">MSY</span></sub> was estimated.</p>"

What am I missing?


2 Answers

Stack Overflow user

Accepted answer

Posted 2016-01-12 18:58:47

Try this:

input.gsub(/<pre>.*<\/pre>/m,'')

The m switch tells the regex to treat the input as multiline: in Ruby, it makes . match newline characters as well.
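
As a rough illustration of the difference, here is a minimal sketch. The sample strings and variable names are made up for this example and are not the asker's data. It also shows a non-greedy .*? variant, which is not part of the answer above but may be worth considering if the input can contain more than one <pre> block.

```ruby
# Sketch: how the /m flag changes what "." matches in Ruby.
html = "<p>keep</p><pre><code>line one\nline two</code></pre>\n"

# Without /m, "." stops at "\n", so the pattern cannot span both lines.
without_m = html.gsub(/<pre>.*<\/pre>/, '')

# With /m, "." also matches "\n", so the whole block is removed.
with_m = html.gsub(/<pre>.*<\/pre>/m, '')

# With two blocks, greedy ".*" runs from the first <pre> to the last </pre>,
# while non-greedy ".*?" stops at the first closing tag.
two_blocks = "<pre>a\nb</pre><p>keep</p><pre>c</pre>"
greedy = two_blocks.gsub(/<pre>.*<\/pre>/m, '')
lazy   = two_blocks.gsub(/<pre>.*?<\/pre>/m, '')

puts without_m == html   # true: nothing was removed
puts with_m.inspect      # "<p>keep</p>\n"
puts greedy.inspect      # ""
puts lazy.inspect        # "<p>keep</p>"
```

In the greedy case the paragraph between the two blocks is lost, which is usually not what is wanted.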

Votes: 2

Stack Overflow user

Posted 2016-01-13 00:01:55

It's not clear what you want. Do you want to remove the text from the <pre><code> block, or remove the text together with the wrapping tags?

This removes the content (text) from inside the block:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<pre><code><p>foo</p></code></pre>
EOT

doc.search('pre code').each do |pc|
  pc.content = ''
end

puts doc.to_html 
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <pre><code></code></pre>
# >> </body></html>

This removes the content and the <code> tags:

doc.search('pre code').each do |pc|
  pc.remove
end

puts doc.to_html 

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <pre></pre>
# >> </body></html>

Or you can remove the <pre> tags, which also removes the <code> tags and the content:

doc.search('pre').each do |pc|
  pc.remove
end

puts doc.to_html        

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> </body></html>

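Except for trivial use cases where the HTML is very simple, you should rely on a parser. gsub and regular expressions will lead you down a path that works until the HTML changes and the code blows up or, worse, silently does the wrong thing and returns bad results.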
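
To make that failure mode concrete, here is a small sketch using only the standard library. The strip_pre lambda and the sample strings are hypothetical, chosen for illustration. It shows a pattern written for one exact markup shape quietly doing nothing once the markup varies slightly.

```ruby
# Sketch: a regex tuned to one exact tag shape misses small variations.
# strip_pre is a hypothetical helper, not code from the question.
strip_pre = ->(html) { html.gsub(/<pre>.*?<\/pre>/m, '') }

plain      = "<p>keep</p><pre>drop</pre>"
with_attr  = "<p>keep</p><pre class=\"code\">drop</pre>"
upper_case = "<p>keep</p><PRE>drop</PRE>"

puts strip_pre.call(plain)       # block removed
puts strip_pre.call(with_attr)   # unchanged: "<pre class=...>" is not "<pre>"
puts strip_pre.call(upper_case)  # unchanged: the pattern is case-sensitive
```

Neither of the last two calls raises an error. They simply return the input untouched, which is the "bad results" case described above.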

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/34751586
