Ruby gsub doesn't span newlines
This is driving me nuts. I have a string which is a verbose snippet of XHTML:
irb(main):012:0> input = <<-END
irb(main):013:0" <p><span class=\"caps\">ICES</span> evaluated the management plan in 2009
and found it to be in accordance with the PA. However, the <span class=\"caps\">SSB</span> index , being based on lengths, excludes the problem connected with age estimation.</p>\n<p><span class=\"caps\">SSB</span>
index is estimated to have decreased by more than 20% between the periods 2010–2012
(average of the three years) and 2013–2014 (average of the two years).</p>\n<p>A candidate
multispecies F<sub><span class=\"caps\">MSY</span></sub> was estimated.</p><pre><code><p>
The management plan, agreed October 2007 and implemented January 2008 was evaluated by
<span class=\"caps\">ICES</span> as to its accordance with the precautionary approach and
reviewed by three independent scientists.</p>\n<p>As the strong 2005 and 2006 year classes
enter the fishery discarding is expected to further increase, justifying the implementation
of measures to improve gear selectivity, such as increases in mesh size
(<span class=\"caps\">ICES</span>, 2009a).</p></code></pre>
irb(main):014:0" END
=> "<p><span class=\"caps\">ICES</span> evaluated the management plan in 2009 and found it to
be in accordance with the PA. However, the <span class=\"caps\">SSB</span> index , being based
on lengths, excludes the problem connected with age estimation.</p>\n<p><span class=\"caps\">SSB
</span> index is estimated to have decreased by more than 20% between the periods 2010–2012
(average of the three years) and 2013–2014 (average of the two years).</p>\n<p>A candidate
multispecies F<sub><span class=\"caps\">MSY</span></sub> was estimated.</p><pre><code><p>The
management plan, agreed October 2007 and implemented January 2008 was evaluated by <span
class=\"caps\">ICES</span> as to its accordance with the precautionary approach and reviewed
by three independent scientists.</p>\n<p>As the strong 2005 and 2006 year classes enter the
fishery discarding is expected to further increase, justifying the implementation of
measures to improve gear selectivity, such as increases in mesh size (<span class=\"caps\">ICES
</span>, 2009a).</p></code></pre>\n"
Now I want to strip out the text enclosed in the `<pre>` tags, but it fails:
irb(main):015:0> input.gsub(/<pre>.*<\/pre>/,'')
=> "<p><span class=\"caps\">ICES</span> evaluated the management plan in 2009 and found it
to be in accordance with the PA. However, the <span class=\"caps\">SSB</span> index , being
based on lengths, excludes the problem connected with age estimation.</p>\n<p><span
class=\"caps\">SSB</span> index is estimated to have decreased by more than 20% between the
periods 2010–2012 (average of the three years) and 2013–2014 (average of the two years).</p>\n
<p>A candidate multispecies F<sub><span class=\"caps\">MSY</span></sub> was estimated.</p><pre>
<code><p>The management plan, agreed October 2007 and implemented January 2008 was evaluated
by <span class=\"caps\">ICES</span> as to its accordance with the precautionary approach
and reviewed by three independent scientists.</p>\n<p>As the strong 2005 and 2006 year classes
enter the fishery discarding is expected to further increase, justifying the implementation
of measures to improve gear selectivity, such as increases in mesh size (<span class=\"caps\">ICES</span>, 2009a).</p></code></pre>\n"
If I strip the newlines first, it works:
irb(main):016:0> input.gsub(/\n/,'').gsub(/<pre>.*<\/pre>/,'')
=> "<p><span class=\"caps\">ICES</span> evaluated the management plan in 2009 and found it
to be in accordance with the PA. However, the <span class=\"caps\">SSB</span> index , being
based on lengths, excludes the problem connected with age estimation.</p><p><span
class=\"caps\">SSB</span> index is estimated to have decreased by more than 20% between the
periods 2010–2012 (average of the three years) and 2013–2014 (average of the two years).</p>
<p>A candidate multispecies F<sub><span class=\"caps\">MSY</span></sub> was estimated.</p>"
What am I missing?

Try this:
input.gsub(/<pre>.*<\/pre>/m,'')
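A minimal sketch of what the flag changes, using a short hypothetical string in place of the full XHTML (the sample data and variable names are illustrative only). In Ruby, `/m` lets `.` match a newline; Perl and PCRE call this `/s`, and Ruby's `^` and `$` already match at every line boundary. The sketch also runs the question's strip-the-newlines-first workaround, which matches but removes the newlines outside the `<pre>` block too:

```ruby
# Hypothetical sample data: a short stand-in for the question's XHTML, with a
# newline between the paragraphs and another inside the <pre> block.
input = "<p>one</p>\n<p>two</p><pre><code>line one\nline two</code></pre>"

# Without /m, `.` stops at "\n", so `.*` can never reach </pre>:
# nothing matches and the string comes back unchanged.
without_m = input.gsub(/<pre>.*<\/pre>/, '')

# The workaround from the question: deleting every newline first lets the
# pattern match, but it also removes the newline between the paragraphs.
stripped_first = input.gsub(/\n/, '').gsub(/<pre>.*<\/pre>/, '')

# With /m, `.` matches "\n" too, so only the <pre> block is removed.
with_m = input.gsub(/<pre>.*<\/pre>/m, '')

# Same pattern built with Regexp.new and the MULTILINE option constant.
with_option = input.gsub(Regexp.new('<pre>.*</pre>', Regexp::MULTILINE), '')

p without_m      # => "<p>one</p>\n<p>two</p><pre><code>line one\nline two</code></pre>"
p stripped_first # => "<p>one</p><p>two</p>"
p with_m         # => "<p>one</p>\n<p>two</p>"
p with_option    # => "<p>one</p>\n<p>two</p>"
```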
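Notice the `m`: it tells the regex to treat the input as multiline.

It's not clear what you want. Do you want to remove the content from the `<pre><code>` block?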
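require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<pre><code><p>foo</p></code></pre>
EOT
doc.search('pre code').each do |pc|
  pc.content = ''
end
puts doc.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <pre><code></code></pre>
# >> </body></html>

This will remove the content and the `<pre>` tags:

doc.search('pre').each do |pc|
  pc.remove
end
puts doc.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> </body></html>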
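Except for trivial use cases where the HTML is very simple, you should rely on a parser. `gsub` and regular expressions will lead you down a path that works until the HTML changes and your code blows up or, worse, silently does the wrong thing and returns bad results.

Have you tried it with the multiline modifier? (If you don't have the docs handy, that's `m`.) Ignoring the whole HTML + regex problem for the moment.

Dave, sir, you are a star. Thanks.

When asking questions, we need the minimum data that demonstrates the problem. You could easily have reduced the input to a very short string. If you don't, it's harder for us to answer your question, and harder for others to understand it when they're searching for a similar solution. When working with HTML or XML, don't use regular expressions unless it's a very simple string. Instead, use a parser; Nokogiri is Ruby's de facto standard and makes short work of parsing and modifying HTML/XML.

Thanks for the Nokogiri approach; it obviously works, but the regex works fine. It was just the multiline flag I was missing.

It'll work until you run into a line where what you really want sits between two `<pre>` blocks.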
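The last comment's warning can be sketched with a hypothetical string in which the wanted text sits between two `<pre>` blocks (the sample markup is illustrative, not from the question). The greedy `.*` in the `/m` regex consumes everything up to the last `</pre>`. The lazy `.*?` variant shown here is a swapped-in tweak, not something proposed in the thread; it avoids that problem, but a literal pattern still misses a tag that gains an attribute, which is the kind of markup change the parser answer warns about:

```ruby
# Hypothetical sample data: the content to keep sits between two <pre> blocks.
html = "<pre>a\nb</pre><p>what you really want</p><pre>c</pre>"

# Greedy `.*` runs to the LAST </pre>, so the paragraph is removed as well.
greedy = html.gsub(/<pre>.*<\/pre>/m, '')

# Lazy `.*?` stops at the first </pre> after each <pre>, so each block is
# removed separately and the paragraph survives.
lazy = html.gsub(/<pre>.*?<\/pre>/m, '')

# A small markup change, such as an attribute on the tag, is enough for the
# literal `<pre>` in the pattern to miss the block entirely.
with_attr = "<pre class=\"x\">d</pre><p>kept</p>"
missed = with_attr.gsub(/<pre>.*?<\/pre>/m, '')

p greedy # => ""
p lazy   # => "<p>what you really want</p>"
p missed # => "<pre class=\"x\">d</pre><p>kept</p>"
```

By contrast, a CSS selector such as `doc.search('pre')` in Nokogiri selects `pre` elements whatever attributes they carry.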