Python XML parsing with ElementTree returns None
I am trying to parse this XML string using ElementTree in Python. The data is stored as a string:
xml = '''<?xml version="1.0" encoding="utf-8"?>
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
<RollNumber>1</RollNumber>
<Name>Abel</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>abel@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>2</RollNumber>
<Name>Joseph</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>joseph@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>3</RollNumber>
<Name>Mike</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>mike@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
</SearchResults>'''
Printing results shows the matches as Element objects:
[<Element 'Student' at 0x7feb615b4ad0>, <Element 'Student' at 0x7feb615b4c50>, <Element 'Student' at 0x7feb615b4e10>]
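However, when I try to get a student's name by printing student.get('Name'), the program returns None.
What I am trying to do is extract the value of each tag from the XML and build a dict.
You have a double loop here: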
for students in results:
    for student in students:
        print(student.get('Name'))
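students is a single Student element. By iterating over it, you get the individual elements it contains. Those contained elements (RollNumber, Name, and so on) have no Name attribute, so .get('Name') returns None. The .get() method only accesses attributes, but you seem to want the Name element instead. Use .find() here, or an XPath expression: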
for student in results:
    name = student.find('Name')
    if name is not None:
        print(name.text)
or match every Name element in the tree with a single findall() path query.
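To make the difference between attributes and child elements concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree. The one-line sample document and its id attribute are made up for illustration and are not part of the question's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: 'id' is an attribute, Name is a child element.
student = ET.fromstring('<Student id="s1"><Name>Abel</Name></Student>')

attr_value = student.get('id')          # attribute lookup
missing = student.get('Name')           # no attribute called Name, so None
name_text = student.find('Name').text   # child element lookup
name_short = student.findtext('Name')   # shortcut for find(...).text
absent = student.findtext('Phone', default='n/a')  # child does not exist

print(attr_value, missing, name_text, name_short, absent)
```

findtext() with a default is handy when a child element may be absent, because it avoids the explicit None check.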
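If you are new to XML processing:
- lxml is a fast and robust library for working with XML in Python. The standard library does not have full XPath support.
- XPath is a query language for inspecting XML documents. It has a steep learning curve, but help is easy to find on StackOverflow.
XPath is so useful that, when working with APIs, I have started converting JSON to XML so that I can write XPath queries instead of crazy nested dictionary dereferencing.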
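For comparison, the standard library's ElementTree does accept a limited subset of XPath in find() and findall(). The sketch below runs two such queries against a shortened copy of the question's data: a descendant path and a child-text predicate. It is an illustration of the subset, not a replacement for lxml's full XPath engine.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's data (two students, two fields each).
root = ET.fromstring(
    '<SearchResults>'
    '<Student><RollNumber>1</RollNumber><Name>Abel</Name></Student>'
    '<Student><RollNumber>2</RollNumber><Name>Joseph</Name></Student>'
    '</SearchResults>'
)

# Every Name element under any Student, in document order.
all_names = [name.text for name in root.findall('.//Student/Name')]

# Predicate from the supported subset: the Student whose RollNumber text is '2'.
joseph = root.find("Student[RollNumber='2']")
joseph_name = joseph.findtext('Name') if joseph is not None else None

print(all_names)
print(joseph_name)
```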
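Subtleties:
- XPath queries usually return a result set, because most queries can match more than once. Hence the helper function first, which returns the first match or a default.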
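The idea behind such a helper can be sketched with plain lists standing in for query results. The version below is an assumed equivalent built on next(); it is not the answer's own implementation.

```python
def first(seq, default=None):
    """Return the first item of seq, or default if seq is empty."""
    return next(iter(seq), default)

matches = ['Abel', 'Joseph']   # stands in for a non-empty result set
no_matches = []                # stands in for a query that matched nothing

print(first(matches))
print(first(no_matches))
print(first(no_matches, default='unknown'))
```

Checking the returned value against None avoids the IndexError that indexing an empty result with [0] would raise.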
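# Alternative to the loop above: match every Name element with one path query.
# Assumes root = ET.fromstring(xml), with xml being the string from the question.
for name in root.findall('.//Student/Name'):
    print(name.text)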
from lxml import etree
from pprint import pprint
# Bytes literal: on Python 3, lxml rejects a str that carries an encoding declaration.
doc = etree.XML(b'''<?xml version="1.0" encoding="utf-8"?>
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
<RollNumber>1</RollNumber>
<Name>Abel</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>abel@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>2</RollNumber>
<Name>Joseph</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>joseph@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>3</RollNumber>
<Name>Mike</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>mike@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
</SearchResults>''')
def first(seq, default=None):
    # Return the first item of a result set, or default if it is empty.
    for item in seq:
        return item
    return default

def simple_children_to_dict(element):
    # Map each child's tag to its text (assumes children hold only text).
    result = {}
    for child in element:
        result[child.tag] = child.text
    return result

def get_by_rollnumber(number, search_results):
    # $number is an XPath variable bound through the keyword argument.
    student_element = first(search_results.xpath('Student[./RollNumber=$number]', number=number))
    if student_element is None:
        raise Exception("Student Number {0} not found".format(number))
    return simple_children_to_dict(student_element)

def get_all_students(search_results):
    # Query the element that was passed in rather than the global doc.
    students = []
    for student_element in search_results.xpath('Student'):
        students.append(simple_children_to_dict(student_element))
    return students
>>> pprint(get_by_rollnumber(2,doc))
{'Email': 'joseph@hisschool.edu',
 'Grade': '7',
 'Name': 'Joseph',
 'PhoneNumber': 'Not Included',
 'RollNumber': '2'}
>>>
>>> pprint(get_all_students(doc))
[{'Email': 'abel@hisschool.edu',
  'Grade': '7',
  'Name': 'Abel',
  'PhoneNumber': 'Not Included',
  'RollNumber': '1'},
 {'Email': 'joseph@hisschool.edu',
  'Grade': '7',
  'Name': 'Joseph',
  'PhoneNumber': 'Not Included',
  'RollNumber': '2'},
 {'Email': 'mike@hisschool.edu',
  'Grade': '7',
  'Name': 'Mike',
  'PhoneNumber': 'Not Included',
  'RollNumber': '3'}]
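If installing lxml is not an option, similar dictionaries can be built with the standard library alone. This is a sketch under the assumption that every child of Student is a simple text element, as in the question's data; the function name students_to_dicts is hypothetical.

```python
import xml.etree.ElementTree as ET

def students_to_dicts(root):
    """Map each Student's child tags to their text, one dict per student."""
    return [
        {child.tag: child.text for child in student}
        for student in root.findall('Student')
    ]

# Shortened copy of the question's data.
root = ET.fromstring(
    '<SearchResults>'
    '<Student><RollNumber>1</RollNumber><Name>Abel</Name></Student>'
    '<Student><RollNumber>2</RollNumber><Name>Joseph</Name></Student>'
    '</SearchResults>'
)

students = students_to_dicts(root)
by_roll = {s['RollNumber']: s for s in students}  # index by roll number
print(by_roll['2']['Name'])
```

Indexing the list by RollNumber gives a direct lookup without running a query per student.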