XML parsing with smart tag grouping in Ruby

ruby xml

Here is an example of the transformation I am trying to achieve. Source XML:

<cats>
  <cat>John</cat>
  <cat>Peter</cat>
</cats>

I want the value of 'cats' in the resulting hash to be an array, even if the source XML contains only one cat element.

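So I would like the parser to apply the following rule: if a node xyzs contains one or more child nodes named xyz (and no other nodes), then xyzs should be represented in the resulting hash as an array named xyzs, with each element of the array holding the content of the corresponding xyz element.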

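This can be done with the XmlSimple lib. However, I have to enter the names of all the target elements, and there seems to be no other way to define forcearray/grouptags behavior in XmlSimple.

It would not be hard to hack together a preprocessing routine that extracts all the names and then passes them to the xml_in method, but there may be a more elegant (i.e., already written) way to do this.

(If another XML parsing library can do the transformation, I would be happy to use it.)

UPD: In case it matters, my ultimate goal is to save the resulting hash to MongoDB (i.e., the overall transformation is XML -> BSON).

UPD2: Again, I do not want to specify the names of the elements that should be treated as arrays; I want the lib to do the magic for me.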
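The rule from the question can be sketched without naming any elements. The following is a minimal sketch, not a library feature: it uses REXML (Ruby's bundled XML parser, swapped in here only to keep the example dependency-free), and the helper names element_to_value and xml_to_hash are hypothetical. It also ignores the plural/singular naming relationship and only checks that all child elements share one name.

```ruby
require 'rexml/document'

# Convert an element to a String (no child elements), an Array (all child
# elements share one name), or a Hash keyed by child name (anything else).
def element_to_value(element)
  children = element.elements.to_a
  return element.text.to_s.strip if children.empty?

  if children.map(&:name).uniq.size == 1
    # Rule from the question: one or more children, all with the same name.
    children.map { |child| element_to_value(child) }
  else
    # Mixed children: fall back to a hash; a repeated name overwrites the earlier value.
    children.to_h { |child| [child.name, element_to_value(child)] }
  end
end

# Hypothetical entry point: wrap the converted root under its own name.
def xml_to_hash(xml_text)
  root = REXML::Document.new(xml_text).root
  { root.name => element_to_value(root) }
end

p xml_to_hash('<cats><cat>John</cat></cats>')
```

One consequence of dropping the naming check: a record with a single field, such as a cat element that contains only a name element, would also come out as a one-element array. Comparing the parent and child names with an inflector would be one way to tighten this.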

With Nokogiri, we can write the following code:

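require 'inflector' # with ActiveSupport: require 'active_support/inflector' and call ActiveSupport::Inflector
require 'nokogiri'

# Collect the text of every <singular> element that sits inside a <plural>
# container directly under the root element.
def get_xml_stuff(xml, singular)
  plural = Inflector.pluralize(singular)
  return_hash = { plural => [] }
  xml.xpath("*/#{plural}/#{singular}").each { |tag| return_hash[plural] << tag.text }
  return_hash
end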

However, this is still not perfect if the same plural form appears more than once in the file.

In answer to UPD2, my new version of the function is:

# xml is the container node; plural is its name (e.g. "cats").
def get_xml_stuff(xml, plural)
  singular = Inflector.singularize(plural)
  return_hash = { plural => [] }
  path = xml.xpath("./#{singular}")
  # Fill the array only when every child element is a <singular>.
  path.each { |tag| return_hash[plural] << tag.text } if path.size == xml.xpath("./*").size
  return_hash
end

If your document has no enclosing tag like the one in my sample, the following should work:

sample_xml = Nokogiri::XML(sample_xml_text)
array = get_xml_stuff(sample_xml.children.first, sample_xml.children.first.name)
p array

End of UPD2.

For reference, my test was:

sample_xml_text = <<-sample
<pets>
  <cats>
    <cat>John</cat>
    <cat>Peter</cat>
  </cats>
  <kitties>
    <kitty>Tibbles</kitty>
    <kitty>Meow-chan</kitty>
    <kitty>Puss</kitty>
  </kitties>
  <giraffes>
    <giraffe>Long Neck</giraffe>
  </giraffes>
  <dogs>
    <dog>Rover</dog>
    <dog>Spot</dog>
    <cat>Peter</cat>
  </dogs>
</pets>
sample

sample_xml = Nokogiri::XML(sample_xml_text)
array = get_xml_stuff(sample_xml, "cat")
p array
array = get_xml_stuff(sample_xml, "kitty")
p array
array = get_xml_stuff(sample_xml, "giraffe")
p array
array = get_xml_stuff(sample_xml, "dog")
p array

First, find the names of the elements that end in s:
names = doc.search('*[name()$="s"]').map(&:name).uniq
#=> ["cats"]

The rest is just map and Hash:
Hash[names.map{|name| [name, doc.search("#{name} > #{name.sub /s$/, ''}").map(&:text)]}]
#=> {"cats"=>["John", "Peter"]}

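Sounds like a homework assignment.

Not many homework assignments start with "find a library…". See UPD2: I do not want to specify the element names.

Great! But can I also get the content of each element as a hash? For example, how would I convert a cat whose child elements hold John and 10 into {'cats'=>[{'name'=>'John', 'age'=>10}]}?

I'll give you a hint: you can change map(&:text) to map{|x| Hash[x.search('> *').map(something else)]}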