
Ruby: How to feed only a string to Nokogiri


I have the following sample XML:

<all>
    <houses>
        <reg info='<root><h level="2" i="1"> something </h><root>' 
             other="test"
             something
        </reg>
    </houses>
</all>

How can I do this?

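You need to get the text of the info attribute and use the CGI class to unescape the HTML. You can then feed the string to Nokogiri::HTML and it will be parsed. Something like this: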

require "nokogiri"
require "open-uri"
require "cgi"

# Fetch the outer document and parse it.
doc = Nokogiri::HTML(URI.open("http://example.com/foo.xml"))
node = doc.xpath("//houses/reg")
# Unescape any HTML entities in the attribute value.
info_string = CGI.unescapeHTML(node[0]['info'])
# Parse the attribute value as its own document.
info_doc = Nokogiri::XML(info_string)
# Now you have a Nokogiri document built from that attribute.

node[0].attr('info') gives you the value of the info attribute.
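To see what the CGI step does on its own, here is a minimal sketch that uses only the standard library. The escaped string is a hypothetical value, standing in for what the info attribute would contain if it had been entity-escaped; it is not taken from a real feed.

```ruby
require "cgi"

# Hypothetical value: the info attribute as it would look if it had been
# entity-escaped (an assumption for illustration only).
escaped_info = "&lt;root&gt;&lt;h level=&quot;2&quot; i=&quot;1&quot;&gt; something &lt;/h&gt;&lt;root&gt;"

# CGI.unescapeHTML turns the entities back into literal markup.
info_string = CGI.unescapeHTML(escaped_info)
puts info_string
```

If the value is already literal markup, as it appears to be in the sample, unescaping leaves it unchanged, so the step should be harmless either way.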


Here are some things to note:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<all>
  <houses>
    <reg info='<root><h level="2" i="1"> something </h><root>' 
          other="test"
          something
    </reg>
  </houses>
</all>
EOT

doc.errors  # => [#<Nokogiri::XML::SyntaxError: Unescaped '<' not allowed in attributes values>, #<Nokogiri::XML::SyntaxError: attributes construct error>, #<Nokogiri::XML::SyntaxError: Couldn't find end of Start Tag reg line 3>, #<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: root line 3 and reg>, #<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: root line 3 and houses>, #<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: houses line 2 and all>, #<Nokogiri::XML::SyntaxError: Premature end of data in tag all line 1>]
doc.at('reg')['info'] # => ""
puts doc.to_xml

# >> <?xml version="1.0"?>
# >> <all>
# >>   <houses>
# >>     <reg info=""/><root><h level="2" i="1"> something </h><root>' 
# >>           other="test"
# >>           something
# >>     </root>
# >>   </root>
# >> </houses>
# >> </all>
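The first error in that list, "Unescaped '<' not allowed in attributes values", points at the underlying problem: markup placed inside an attribute has to be entity-escaped for the XML to be well-formed. As a minimal sketch using only the standard library, this is roughly how whoever generates the XML could escape the value before embedding it. The variable names are illustrative, and the self-closing reg tag is a simplification of the sample.

```ruby
require "cgi"

# The markup that the sample tries to store in the info attribute.
raw_info = '<root><h level="2" i="1"> something </h><root>'

# Escape <, >, & and quotes so the value is legal inside an XML attribute.
safe_info = CGI.escapeHTML(raw_info)

# Simplified, well-formed version of the reg element from the sample.
reg_tag = %(<reg info="#{safe_info}" other="test"/>)
puts reg_tag
```

A reader of such a document gets the original markup back when the parser decodes the attribute, which is why no extra CGI call would be needed on the consuming side.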
Parsing the original, unescaped input with Nokogiri::HTML instead:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<all>
  <houses>
    <reg info='<root><h level="2" i="1"> something </h><root>' 
          other="test"
          something
    </reg>
  </houses>
</all>
EOT

doc.errors  # => [#<Nokogiri::XML::SyntaxError: Tag all invalid>, #<Nokogiri::XML::SyntaxError: Tag houses invalid>, #<Nokogiri::XML::SyntaxError: error parsing attribute name>, #<Nokogiri::XML::SyntaxError: Tag reg invalid>]
doc.at('reg')['info'] # => "<root><h level=\"2\" i=\"1\"> something </h><root>"
puts doc.to_xml

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <all>
# >>   <houses>
# >>     <reg info='&lt;root&gt;&lt;h level="2" i="1"&gt; something &lt;/h&gt;&lt;root&gt;' other="test" something>
# >>   </reg></houses>
# >> </all>
# >> </body></html>

Notice how Nokogiri now:

  • correctly HTML-encoded the info attribute
  • correctly extracted and decoded the contents of info
  • wrapped the XML in <html><body> tags, because it parsed the content as HTML

To extract the fixed XML, strip those two outer layers:

puts doc.at('all').to_xml

# >> <all>
# >>   <houses>
# >>     <reg info="&lt;root&gt;&lt;h level=&quot;2&quot; i=&quot;1&quot;&gt; something &lt;/h&gt;&lt;root&gt;" other="test" something="">
# >>   </reg></houses>
# >> </all>

I'm not sure whether Nokogiri's behavior has changed since the question was originally asked, but the current behavior in v1.6.7.2 handles the decoding correctly without needing CGI.

Without parsing, the code can't do anything, and incorrect parsing leaves the info attribute mangled. Your answer needs to show how to turn this into a workable solution. Parsing XML with an HTML parser is usually wrong, but it helps in this case because HTML parsing is more lenient. See my answer for Nokogiri's current behavior.