Parsing: searching for all tags under a specific namespace with Hpricot

For example, I have the following markup:

<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title><io:content part="title" /></title>
  <link rel="icon" href="/document/7e9f29e2-cdee-4f85-ba25-132fa867aa90/latest" type="image/x-icon" />
  <n1:content description="Standard CSS" uuid="d069071c-3534-4945-9fb6-2d7be35a165e" />
  <n1:term>Content Development</n1:term>
</head>


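This XHTML fragment is not strictly valid, because the namespace prefixes it uses are never declared. That is why I cannot use Nokogiri, even though it has better namespace support.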

I want to run a single search that finds specific namespaced nodes, as well as all tags under the "n1" namespace.

How can I do this? Thanks.

It looks like Hpricot does not fully handle namespaces.

If you know the element name, you can select it regardless of prefix:

doc.search("title")
=> #<Hpricot::Elements[{elem <title> {emptyelem <io:content part="title">} </title>}]>

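...but that is not what you asked for.

Here is my hack: first use a regex to collect the names of all elements with the namespace prefix, then search for those names with Hpricot: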

elems = doc.to_s.scan(/<\s*(n1:\w+)/).uniq.join("|")
=> "n1:content|n1:term"
doc.search(elems)
=> #<Hpricot::Elements[{emptyelem <n1:content description="Standard CSS" uuid="d069071c-3534-4945-9fb6-2d7be35a165e">}, {elem <n1:term> "Content Development" </n1:term>}]>
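
The regex step can be tried on its own, without Hpricot, to see which names end up in the selector. The following is only a minimal sketch under a few assumptions: `prefixed_tag_names` is a hypothetical helper name, the markup is the fragment from the question, and the pattern only matches opening tags whose names consist of word characters, dots, or hyphens.

```ruby
# Hypothetical helper (not part of Hpricot): collect the distinct tag names
# in a markup string that start with the given namespace prefix.
def prefixed_tag_names(markup, prefix)
  # Closing tags such as </n1:term> are skipped because "/" follows "<".
  pattern = /<\s*(#{Regexp.escape(prefix)}:[\w.-]+)/
  # scan returns one array per match (one capture group), so flatten first.
  markup.scan(pattern).flatten.uniq
end

markup = <<~XHTML
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title><io:content part="title" /></title>
    <n1:content description="Standard CSS" uuid="d069071c-3534-4945-9fb6-2d7be35a165e" />
    <n1:term>Content Development</n1:term>
  </head>
XHTML

names = prefixed_tag_names(markup, "n1")
selector = names.join("|")
puts selector # n1:content|n1:term
```

The resulting string is what the answer passes to `doc.search`. Escaping the prefix with `Regexp.escape` is a defensive choice in case the prefix contains regex metacharacters.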

Thanks. I solved the problem with traverse_element(*@io_tags).