Ruby: How to conditionally check and extract XML elements

ruby xml regex


I have to parse a given XML file that looks like this:

<country id='cid-cia-Ashmore-and-Cartier-Islands' 
  continent='Asia'
  name='Ashmore and Cartier Islands'
  datacode='AT'
  total_area='5'
  government='territory of Australia administered by the Australian Ministry for the Environment'>
  <coasts>Indian Ocean</coasts>
</country>

<country id='cid-cia-Azerbaijan' 
  continent='Asia'
  name='Azerbaijan'
  datacode='AJ'
  total_area='86600'
  population='7676953'
  population_growth='0.78'
  infant_mortality='74.5'
  inflation='85'
  gdp_total='11500'
  indep_date='30 08 1991'
  government='republic'
  capital='Baku'>
  <ethnicgroups name='Russian'>2.5</ethnicgroups>
  <ethnicgroups name='Armenian'>2.3</ethnicgroups>
  <ethnicgroups name='Azeri'>90</ethnicgroups>
  <ethnicgroups name='Dagestani Peoples'>3.2</ethnicgroups>
  <religions name='Muslim'>93.4</religions>
  <religions name='Armenian Orthodox'>2.3</religions>
  <religions name='Russian Orthodox'>2.5</religions>
  <languages name='Russian'>3</languages>
  <languages name='Armenian'>2</languages>
  <languages name='Azeri'>89</languages>
  <borders country='cid-cia-Armenia'>787</borders>
  <borders country='cid-cia-Georgia'>322</borders>
  <borders country='cid-cia-Iran'>611</borders>
  <borders country='cid-cia-Russia'>284</borders>
  <borders country='cid-cia-Turkey'>9</borders>
  <coasts>Caspian Sea</coasts>
</country>

<country id='cid-cia-Bahrain' 
  continent='Asia'
  name='Bahrain'
  datacode='BA'
  total_area='620'
  population='590042'
  population_growth='2.27'
  infant_mortality='17.1'
  inflation='3'
  gdp_total='7300'
  indep_date='15 08 1971'
  government='traditional monarchy'
  capital='Manama'>
  <ethnicgroups name='Arab'>10</ethnicgroups>
  <ethnicgroups name='Asian'>13</ethnicgroups>
  <ethnicgroups name='Bahraini'>63</ethnicgroups>
  <ethnicgroups name='Iranian'>8</ethnicgroups>
  <religions name='Sunni Muslim'>25</religions>
  <religions name='Shia Muslim'>75</religions>
  <coasts>Persian Gulf</coasts>
</country>



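I have to parse this XML to get the name and inflation values, but only when an inflation value is associated with the given country.

I have set this up on Rubular as I have progressed. It returns two matches, which is fine, but look closely at the first match: the country is Ashmore and Cartier Islands, and that country's XML has no inflation attribute. The regex simply keeps going down the file until it finds an inflation value and closes the match there.

I would like to know whether there is some conditional operation that checks whether an inflation key exists and, if it does, grabs both the name value and the inflation value.

Thanks in advance.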

Don't use regular expressions on XML. Use an XML parsing engine instead.

You can indeed use Nokogiri, for example:

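require 'nokogiri'

# Read the local file directly; open-uri is only needed for remote URLs.
doc = Nokogiri::XML(File.read('./country.xml'))

# Select the name of every country that has an inflation attribute, plus every inflation attribute.
doc.xpath('//country[@inflation]/@name|//country/@inflation').each do |res|
  puts res
end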

If you "need" to use a regex, this one should do the job:

<country [^>]*? name='(?<name>[^']+)'[^>]*? inflation='(?<inflation>[^']+)'
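
As a rough sketch of how that pattern might be applied from Ruby, `String#scan` returns one `[name, inflation]` pair per match. The `xml` string below is a shortened, hypothetical copy of the sample data (not the real file), and `country_re` is just an illustrative variable name. Because `[^>]` cannot cross the closing `>` of the start tag, a country without an `inflation` attribute produces no match instead of borrowing the value from a later country.

```ruby
# Shortened sample data (assumption: the real file uses the same attribute layout).
xml = <<~XML
  <country id='cid-cia-Ashmore-and-Cartier-Islands'
    name='Ashmore and Cartier Islands'
    total_area='5'>
  </country>
  <country id='cid-cia-Azerbaijan'
    name='Azerbaijan'
    inflation='85'
    capital='Baku'>
  </country>
  <country id='cid-cia-Bahrain'
    name='Bahrain'
    inflation='3'>
  </country>
XML

# The pattern from the answer: both attributes must sit inside one <country ...> start tag.
country_re = /<country [^>]*? name='(?<name>[^']+)'[^>]*? inflation='(?<inflation>[^']+)'/

# scan yields an array of [name, inflation] pairs, one per matching country.
pairs = xml.scan(country_re)
pairs.each { |name, inflation| puts "#{name}: #{inflation}" }
```

Note that this still assumes `name` appears before `inflation` and that values use single quotes.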


The Ruby standard library includes an XML parser.
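
A minimal sketch of that route, assuming the parser meant here is REXML (shipped with Ruby, as a bundled gem since Ruby 3.0). The `<countries>` root element and the shortened data are assumptions made for illustration, since a well-formed document needs a single root.

```ruby
require 'rexml/document'

# Shortened sample data wrapped in a hypothetical <countries> root element.
xml = <<~XML
  <countries>
    <country id='cid-cia-Ashmore-and-Cartier-Islands' name='Ashmore and Cartier Islands' total_area='5'/>
    <country id='cid-cia-Azerbaijan' name='Azerbaijan' inflation='85' capital='Baku'/>
    <country id='cid-cia-Bahrain' name='Bahrain' inflation='3'/>
  </countries>
XML

doc = REXML::Document.new(xml)

# The [@inflation] predicate keeps only countries that carry the attribute.
results = REXML::XPath.match(doc, '//country[@inflation]').map do |country|
  [country.attributes['name'], country.attributes['inflation']]
end

results.each { |name, inflation| puts "#{name}: #{inflation}" }
```

The conditional check lives entirely in the XPath predicate, so no lookahead or attribute-order assumptions are needed.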

Look into the nokogiri gem.

I found ox to be 5 times faster than nokogiri, so here is an example in ox: search for any element and get it in hash form.

@jScott Do you have to use regex? Why?

@matt It is for my assignment. I have successfully used libraries like nokogiri and xmlsimple, but I was told to use regex.

About the regex part of the answer: it only works if the inflation attribute always follows the name attribute, if the attributes are always laid out this way, and if the values are enclosed in single quotes. These are all arguments against using regex, and they are only the ones that come to mind offhand :-)

@Casimir, I took your regex and had to modify it a bit, but this is what I came up with, and it worked:

<country [^>]*? name='([^']+)'[^>]*? inflation='([^']+)'

Thanks, everyone!
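
If regex really is required, the caveats raised in the comments (attribute order and quoting style) can be softened by working in two passes: first capture each `<country ...>` start tag, then read its attributes into a hash. The following is only a sketch under assumptions; `countries_with_inflation` is a hypothetical helper, the sample data is invented to show reversed order and double quotes, and an attribute value containing `>` would still break it.

```ruby
# Hypothetical sample: inflation appears before name and uses double quotes.
xml = <<~XML
  <country id='cid-cia-Ashmore-and-Cartier-Islands' name='Ashmore and Cartier Islands'>
  </country>
  <country inflation="85" id='cid-cia-Azerbaijan' name='Azerbaijan'>
  </country>
XML

# Hypothetical helper: returns [name, inflation] pairs for countries that have an inflation attribute.
def countries_with_inflation(xml)
  # Pass 1: capture the raw attribute text of every <country ...> start tag.
  xml.scan(/<country\b([^>]*)>/).flatten.filter_map do |attr_text|
    # Pass 2: collect key/value pairs, accepting single or double quotes.
    attrs = attr_text.scan(/([\w-]+)\s*=\s*(?:'([^']*)'|"([^"]*)")/)
                     .to_h { |key, single, double| [key, single || double] }
    # Keep the country only when the inflation attribute is present.
    [attrs['name'], attrs['inflation']] if attrs.key?('inflation')
  end
end

countries_with_inflation(xml).each { |name, inflation| puts "#{name}: #{inflation}" }
```

This needs Ruby 2.7 or later for `filter_map`. A real XML parser remains the more robust choice.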