Parsing XML in Python without the newline characters
My Python code, which parses the Atom feed feed.xml:
from lxml import etree

def lxml():
    tree = etree.parse('feed.xml')
    NSMAP = {"nn": "http://www.w3.org/2005/Atom"}
    # parent elements of every <category term="html"> (not used below)
    test = tree.xpath('//nn:category[@term="html"]/..', namespaces=NSMAP)
    for elem in tree.iter():
        print(elem.tag, '\t', elem.attrib)
    print('-------------------------------')
    test1 = tree.xpath('//nn:category', namespaces=NSMAP)
    print('++++++++++++++++++++++++++++++++')
    for node in test1:
        # the <summary> siblings of each <category>; xpath() returns a list of elements
        test2 = node.xpath('./../nn:summary', namespaces=NSMAP)
        print([summary.text for summary in test2])
    print('*****************************************')
    # normalize-space() strips leading/trailing whitespace and collapses internal runs
    test3 = tree.xpath('//text()[normalize-space(.)]')
    print(test3)

lxml()
The output is:
++++++++++++++++++++++++++++++++
['Putting an entire chapter on one page sounds\n bloated, but consider this — my longest chapter so far\n would be 75 printed pages, and it loads in under 5 seconds…\n On dialup.']
['Putting an entire chapter on one page sounds\n bloated, but consider this — my longest chapter so far\n would be 75 printed pages, and it loads in under 5 seconds…\n On dialup.']
['Putting an entire chapter on one page sounds\n bloated, but consider this — my longest chapter so far\n would be 75 printed pages, and it loads in under 5 seconds…\n On dialup.']
['The accessibility orthodoxy does not permit people to\n question the value of features that are rarely useful and rarely used.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
*****************************************
['\n ', 'dive into mark', '\n ', 'currently between addictions', '\n ', 'tag:diveintomark.org,2001-07-29:/', '\n ', '2009-03-27T21:56:07Z', '\n ', '\n ', '\n ', '\n ', '\n ', 'Mark', '\n ', 'http://diveintomark.org/', '\n ', '\n ', 'Dive into history, 2009 edition', '\n ', '\n ', 'tag:diveintomark.org,2009-03-27:/archives/20090327172042', '\n ', '2009-03-27T21:56:07Z', '\n ', '2009-03-27T17:20:42Z', '\n ', '\n ', '\n ', '\n ', 'Putting an entire chapter on one page sounds\n bloated, but consider this — my longest chapter so far\n would be 75 printed pages, and it loads in under 5 seconds…\n On dialup.', '\n ', '\n ', '\n ', '\n ', 'Mark', '\n ', 'http://diveintomark.org/', '\n ', '\n ', 'Accessibility is a harsh mistress', '\n ', '\n ', 'tag:diveintomark.org,2009-03-21:/archives/20090321200928', '\n ', '2009-03-22T01:05:37Z', '\n ', '2009-03-21T20:09:28Z', '\n ', '\n ', 'The accessibility orthodoxy does not permit people to\n question the value of features that are rarely useful and rarely used.', '\n ', '\n ', '\n ', '\n ', 'Mark', '\n ', '\n ', 'A gentle introduction to video encoding, part 1: container formats', '\n ', '\n ', 'tag:diveintomark.org,2008-12-18:/archives/20081218155422', '\n ', '2009-01-11T19:39:22Z', '\n ', '2008-12-18T15:54:22Z', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', 'These notes will eventually become part of a\n tech talk on video encoding.', '\n ', '\n']
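My question is: why are there so many "\n"s, and how can I remove them?
My other question is how to get the tag of a piece of text directly, for example how to get name as the node that holds "Mark" (a child of entry).
Thanks a lot.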
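"My question is: why are there so many '\n's, and how can I remove them?"
Whitespace between tags is text too, and XPath selects it like any other text node. Pretty-printed XML usually contains lots of newlines and indentation between tags. For example, the following XML contains two whitespace-only text nodes that //text() will select: one between <root> and <foo>, the other between </foo> and </root>: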
<root>
  <foo>bar</foo>
</root>
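A minimal sketch that reproduces this with lxml (assumed installed, as in the question). It shows the whitespace-only nodes returned by //text(), the [normalize-space()] predicate that drops them, normalize-space() used as a function to collapse the newlines inside a single text node, and the equivalent cleanup in plain Python for a list like the one printed in the question. The summary snippet is a shortened, hypothetical stand-in for the feed's real summaries.

```python
from lxml import etree

# The <root>/<foo> example, pretty-printed with two-space indentation
doc = etree.fromstring(b"<root>\n  <foo>bar</foo>\n</root>")

# //text() returns every text node, including the whitespace-only ones
all_text = doc.xpath('//text()')
print(all_text)  # ['\n  ', 'bar', '\n']

# The [normalize-space()] predicate is false for whitespace-only nodes, so they are dropped
non_blank = doc.xpath('//text()[normalize-space()]')
print(non_blank)  # ['bar']

# Newlines *inside* a text node survive the predicate; normalize-space() as a function
# strips leading/trailing whitespace and collapses internal runs to a single space
para = etree.fromstring(b"<summary>Putting an entire chapter\n  on one page</summary>")
summary = para.xpath('normalize-space(.)')
print(summary)  # Putting an entire chapter on one page

# The same cleanup done in Python on an already-extracted list of strings
cleaned = [" ".join(t.split()) for t in all_text if t.strip()]
print(cleaned)  # ['bar']
```

Filtering in the XPath expression keeps the result list small; the Python version is handy when the list has already been produced.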
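Please don't post code as images. Post it as text and format it properly (highlight/select the text -> click {}).
Thanks, I fixed it. Sorry about the poor formatting; I'm a beginner. Thanks.
\n is an escape sequence for a newline character. If you look at the page source, you will see that the extra whitespace sits at the start of each new line. To remove it, you can use … or ….
your_text_node.getparent().tag
The line above should get the parent element of the text node referenced by the variable your_text_node, and then return that element's tag name.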
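A minimal sketch of the getparent() approach, assuming lxml is installed. lxml returns text() results as "smart strings" that remember the element they came from. The Atom snippet is a hypothetical stand-in for the question's feed.xml (only the author/name part is kept), and the helper name text_parent_tags is also hypothetical. The last query shows the reverse direction: selecting the element that holds a given text directly in XPath.

```python
from lxml import etree

NSMAP = {"nn": "http://www.w3.org/2005/Atom"}

# Hypothetical minimal Atom document modelled on the feed in the question
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <author>
      <name>Mark</name>
    </author>
  </entry>
</feed>"""


def text_parent_tags(tree):
    """Pair each non-blank text node with the local tag name of its parent element."""
    result = []
    for text_node in tree.xpath('//text()[normalize-space()]'):
        parent = text_node.getparent()  # smart strings know their parent element
        # parent.tag is namespaced ('{http://www.w3.org/2005/Atom}name'); QName strips it
        result.append((str(text_node), etree.QName(parent).localname))
    return result


tree = etree.ElementTree(etree.fromstring(SAMPLE))
pairs = text_parent_tags(tree)
print(pairs)  # [('Mark', 'name')]

# Alternatively, select the element that holds a given text directly
holders = tree.xpath('//nn:entry//*[text()="Mark"]', namespaces=NSMAP)
holder_tags = [etree.QName(e).localname for e in holders]
print(holder_tags)  # ['name']
```

One caveat: for tail text (text that follows a closing tag), getparent() returns the element the text follows rather than the element that contains it; the smart string's is_tail attribute tells the two cases apart.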