Python: XML parsing for this particular XML
Tags: python, xml
The attempt below uses ElementTree, but stuff.text returns only the text up to the first child element (<head>), not the whole context:
import os
import xml.etree.ElementTree as et

tree = et.parse(os.getcwd() + "/../data/train.xml")
instance = tree.getroot()
for stuff in instance:
    if stuff.tag == "answer":
        print("the correct answer is %s" % stuff.get('senseid'))
    if stuff.tag == "context":
        print(dir(stuff))
        # .text holds only the text before the first child element
        print(stuff.text)
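The output of `stuff.text` looks truncated because ElementTree stores only the text before the first child in `.text`; whatever follows a child element lives in that child's `.tail`. A minimal sketch of this, assuming a shortened stand-in for the sample sentence (the abbreviated string is illustrative, not the real data file):

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the <context> element from the sample
context = ET.fromstring("<context>step on to <head>activate</head> it .</context>")
head = context.find("head")

before = context.text               # text before the first child
inside = head.text                  # text inside <head>
after = head.tail                   # text following </head>
full = "".join(context.itertext())  # every text fragment, in document order

print(repr(before))  # 'step on to '
print(repr(inside))  # 'activate'
print(repr(after))   # ' it .'
print(repr(full))    # 'step on to activate it .'
```

This is why `itertext()` recovers the full sentence while `.text` alone does not.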
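If using BeautifulSoup is an option, this is trivial. (If you prefer to stay with ElementTree, use itertext to handle all of the text.) With BeautifulSoup: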
import bs4
xtxt = ''' <instance id="activate.v.bnc.00024693" docsrc="BNC">
<answer instance="activate.v.bnc.00024693" senseid="38201"/>
<context>
Do you know what it is , and where I can get one ? We suspect you had seen the Terrex Autospade , which is made by Wolf Tools . It is quite a hefty spade , with bicycle - type handlebars and a sprung lever at the rear , which you step on to <head>activate</head> it . Used correctly , you should n't have to bend your back during general digging , although it wo n't lift out the soil and put in a barrow if you need to move it ! If gardening tends to give you backache , remember to take plenty of rest periods during the day , and never try to lift more than you can easily cope with .
</context>
</instance>'''
# Name the parser explicitly; "html.parser" ships with Python
soup = bs4.BeautifulSoup(xtxt, "html.parser")
print(soup.find('context').text)
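If you are sure your XML file is well-formed, ElementTree is enough: it is part of the Python standard library, so you have no external dependencies. If the XML may be malformed, however, BeautifulSoup is very good at repairing small errors.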
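To illustrate the trade-off, here is a small sketch of what strict parsing means in practice: ElementTree raises `ParseError` on malformed input instead of repairing it. The helper name `context_text` and the two sample strings are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

def context_text(xml_string):
    """Return all text under <context>, or None if the XML is not well-formed."""
    try:
        root = ET.fromstring(xml_string)
    except ET.ParseError:
        # ElementTree does not attempt to repair broken markup
        return None
    context = root.find("context")
    return None if context is None else "".join(context.itertext())

good = "<instance><context>step on to <head>activate</head> it</context></instance>"
bad = "<instance><context>step on to <head>activate it</context></instance>"  # <head> never closed

print(context_text(good))  # step on to activate it
print(context_text(bad))   # None
```

A caller could fall back to a more lenient parser such as BeautifulSoup when `None` comes back.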
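With ElementTree, ''.join(stuff.itertext()) collects all of the text, including the text inside and after child elements. Element serialization can also be used. There are two options:
- keep the inner <head> tag
- return only the text, without any markup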
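import os
import xml.etree.ElementTree as et

tree = et.parse(os.getcwd() + "/../data/train.xml")
instance = tree.getroot()
for stuff in instance:
    if stuff.tag == "answer":
        print("the correct answer is %s" % stuff.get('senseid'))
    if stuff.tag == "context":
        print(dir(stuff))
        # All text, including the text inside and after child elements
        print(''.join(stuff.itertext()))
        # Inner markup kept: serialize the leading text and each child, without the wrapping tag
        print((stuff.text or '') + ''.join(et.tostring(child, encoding='unicode') for child in stuff))
        # Text only, without any markup
        print(et.tostring(stuff, encoding='unicode', method='text'))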
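The two serialization options can be tried without the data file. The sketch below assumes a shortened sample string, and `inner_xml` and `plain_text` are hypothetical helper names. It builds the inner markup from the child elements rather than trimming the outer tag with `lstrip`/`rstrip`, because those methods strip sets of characters, not prefixes or suffixes.

```python
import xml.etree.ElementTree as ET

def inner_xml(element):
    """Serialize an element's content, keeping child tags but not its own tag."""
    parts = [element.text or ""]
    for child in element:
        # tostring() on a child also emits that child's tail text
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)

def plain_text(element):
    """Serialize only the text, dropping every tag."""
    return ET.tostring(element, encoding="unicode", method="text")

context = ET.fromstring("<context>step on to <head>activate</head> it .</context>")

print(inner_xml(context))   # step on to <head>activate</head> it .
print(plain_text(context))  # step on to activate it .
```

Passing `encoding="unicode"` makes `tostring` return a `str` instead of `bytes` on Python 3.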