Python function for XML lists
Tags: python, xml, parsing
I have parsed an XML file like this. Maybe I just copied it badly, but never mind; here it is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="actual">
    <managedObject class="LN" distName="PTR" id="2425">
      <p name="aak">220</p>
      <p name="orp">05</p>
      <p name="name">Portro</p>
      <p name="optres">false</p>
      <p name="optblu">false</p>
      <p name="aoptdet">false</p>
      <p name="advcell">false</p>
      <list name="sibList">
        <item>
          <p name="sibcity">177</p>
          <p name="sibrep">2</p>
        </item>
        <item>
          <p name="sibcity">177</p>
          <p name="sibrep">1</p>
        </item>
      </list>
    </managedObject>
    <managedObject class="LN" distName="KRNS" id="93886">
      <p name="aak">150</p>
      <p name="orp">05</p>
      <p name="name">Portro</p>
      <p name="optres">false</p>
      <p name="optblu">tru</p>
      <p name="aoptdet">false</p>
      <p name="advcell">true</p>
      <list name="sibList">
        <item>
          <p name="sibcity">177</p>
          <p name="sibrep">1</p>
        </item>
        <item>
          <p name="sibcity">180</p>
          <p name="sibrep">2</p>
        </item>
      </list>
    </managedObject>
    ....
    <managedObject>
      ...
    </managedObject>
    ...
  </cmData>
</raml>
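Here is a solution that parses the XML file, compares the managedObject elements pairwise (each one with the next, in document order), and prints the resulting diff objects: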
import json
from xml.etree import ElementTree

tree = ElementTree.parse('raml20.xml')
ns = {'ns': 'raml20.xsd'}
# Fully qualified tag names such as '{raml20.xsd}p', for use with iter()
nsP, nsList, nsItem = ('{%s}%s' % (ns['ns'], i) for i in ('p', 'list', 'item'))


def pkv(o):
    """Return dict with name:text of p elements"""
    # iter() walks the whole subtree, so p elements nested in list items are included too
    return {k.attrib['name']: k.text for k in o.iter(nsP)}


def parse(tree):
    root = tree.getroot()
    objs = {}
    for mo in root.findall('./ns:cmData/ns:managedObject', ns):
        obj = pkv(mo)
        # Store each <list> as a list of dicts, one dict per <item>
        for i in mo.iter(nsList):
            obj[i.attrib['name']] = [pkv(j) for j in i.iter(nsItem)]
        objs[mo.attrib['distName']] = obj
    return objs


def diff_dicts(d1, d2, ignore_keys=set()):
    """Return dict with differences between the dicts provided as arguments"""
    k1 = set(d1.keys())
    k2 = set(d2.keys())
    diff = {}
    # Keys present in both dicts but with different values
    diff.update(
        {i: (d1[i], d2[i]) for i in (k1 & k2) - ignore_keys if d1[i] != d2[i]})
    # Keys present in only one of the dicts
    diff.update({i: (d1.get(i), d2.get(i)) for i in (k1 ^ k2) - ignore_keys})
    return diff


def diff_lists(l1, l2):
    """Return dict with differences between lists of dicts provided as arguments"""
    diff = {}
    # note: assumes that lists are of same length
    for i, (d1, d2) in enumerate(zip(l1, l2)):
        d = diff_dicts(d1, d2)
        if d:
            diff[i] = d
    return diff


def diff_objects(o1, o2):
    """Return dict with differences between two objects (dicts) provided as arguments"""
    listkeys = set(
        i for o in (o1, o2) for i in o if isinstance(o.get(i), list))
    diff = diff_dicts(o1, o2, listkeys)
    for i in listkeys:
        if i in o1 and i in o2:
            diff.update({i: diff_lists(o1[i], o2[i])})
        else:
            diff.update({i: (o1.get(i), o2.get(i))})
    return diff


def compare_objects(objs):
    diffs = []
    keys = list(objs)
    # Compare each object with the next one
    for k1, k2 in zip(keys[:-1], keys[1:]):
        o1, o2 = objs[k1], objs[k2]
        diff = diff_objects(o1, o2)
        if diff:
            diffs.append((k1, k2, diff))
    return diffs


res = compare_objects(parse(tree))
print(json.dumps(res, indent=2))
I tested it with the following raml20.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="actual">
    <managedObject class="LN" distName="PTR" id="2425">
      <p name="aak">220</p>
      <p name="orp">05</p>
      <p name="name">Portro</p>
      <p name="optres">false</p>
      <p name="optblu">false</p>
      <p name="aoptdet">false</p>
      <p name="advcell">false</p>
      <list name="sibList">
        <item>
          <p name="sibcity">177</p>
          <p name="sibrep">2</p>
        </item>
        <item>
          <p name="sibcity">177</p>
          <p name="sibrep">1</p>
        </item>
      </list>
    </managedObject>
    <managedObject class="LN" distName="KRNS" id="93886">
      <p name="aak">150</p>
      <p name="orp">05</p>
      <p name="name">Portro</p>
      <p name="optres">false</p>
      <p name="optblu">tru</p>
      <p name="aoptdet">false</p>
      <p name="advcell">true</p>
      <list name="sibList">
        <item>
          <p name="sibcity">177</p>
          <p name="sibrep">1</p>
        </item>
        <item>
          <p name="sibcity">180</p>
          <p name="sibrep">2</p>
        </item>
      </list>
    </managedObject>
  </cmData>
</raml>
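A side note on the answer's pkv helper: it uses iter(), which visits every descendant, so the p elements nested inside list items are collected as well. That is why keys such as sibcity and sibrep also appear at the top level of the diff output. If only the direct parameters of a managedObject are wanted, a variant based on findall() might look like the sketch below. The name direct_params and the trimmed-down inline sample are assumptions made for illustration, not part of the original answer.

```python
from xml.etree import ElementTree

NS = {'ns': 'raml20.xsd'}

# Trimmed-down managedObject, inlined so the sketch is self-contained.
SAMPLE = """<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="actual">
    <managedObject class="LN" distName="PTR" id="2425">
      <p name="aak">220</p>
      <list name="sibList">
        <item>
          <p name="sibcity">177</p>
          <p name="sibrep">2</p>
        </item>
      </list>
    </managedObject>
  </cmData>
</raml>"""


def direct_params(element):
    """Return name:text for the p elements that are direct children only."""
    # A relative path without '//' matches direct children, not deeper levels.
    return {p.attrib['name']: p.text for p in element.findall('ns:p', NS)}


root = ElementTree.fromstring(SAMPLE)
mo = root.find('./ns:cmData/ns:managedObject', NS)

# iter() walks the whole subtree, so the nested sibcity/sibrep are included.
all_params = {p.attrib['name']: p.text for p in mo.iter('{raml20.xsd}p')}

print(sorted(all_params))         # ['aak', 'sibcity', 'sibrep']
print(sorted(direct_params(mo)))  # ['aak']
```

Whether the nested values should be excluded depends on what the comparison is meant to report; the answer's code keeps them.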
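Comments:
Your question is hard to understand. So, basically, you want to iterate over the XML and process all managedObject nodes? Have you tried lxml or BeautifulSoup?
Yes, I have. I've updated my code. But now I can't assign the sibList to a specific managedObject. In the end I need an Excel file with the managedObjects as columns and the parameters as rows. The values would be the text, e.g. 220, 05, Portro, and so on. I should mention that I used the etree parser. @techouse
The indentation of that code block is very erratic; please paste the code in, highlight it, and use the {} button to format it as a code block. Your description of what you want to do is very vague, but I'd say there is probably no existing function that does exactly what you want; you will probably have to write it yourself.
@jovicbg You're welcome. If the answer solved your problem, please mark it as the accepted answer.
This works well. Now I need to compare all managedObjects with one reference object.
If you have a reference managedObject, just modify the compare_objects function to compare against that specific object. For example, see this gist.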
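One of the comments above asks for a spreadsheet with the managedObjects as columns and the parameters as rows. The thread does not show how to do that, so the following is only a rough sketch using the standard csv module (Excel can open CSV files). It assumes a dict shaped like the result of parse() in the answer; the function name to_rows, the sample data, and the choice to store nested lists as JSON text in a single cell are all assumptions.

```python
import csv
import io
import json


def to_rows(objs):
    """Build a table: one row per parameter, one column per managedObject."""
    names = list(objs)
    # Collect parameter names in first-seen order across all objects.
    params = []
    for obj in objs.values():
        for key in obj:
            if key not in params:
                params.append(key)
    rows = [['parameter'] + names]
    for param in params:
        row = [param]
        for name in names:
            value = objs[name].get(param, '')
            # Nested lists such as sibList are stored as JSON text in one cell.
            if isinstance(value, list):
                value = json.dumps(value)
            row.append(value)
        rows.append(row)
    return rows


# Hypothetical sample shaped like the result of parse() in the answer.
objs = {
    'PTR': {'aak': '220', 'name': 'Portro',
            'sibList': [{'sibcity': '177', 'sibrep': '2'}]},
    'KRNS': {'aak': '150', 'name': 'Portro'},
}

# Written to an in-memory buffer here; a real file would be opened
# with open('objects.csv', 'w', newline='') instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(to_rows(objs))
print(buffer.getvalue())
```

A missing parameter is left as an empty cell. Splitting sibList into one row per item would be an alternative layout, depending on how the sheet is going to be read.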
Run on the test file above, the diff script from the answer prints:
[
  [
    "PTR",
    "KRNS",
    {
      "advcell": [
        "false",
        "true"
      ],
      "optblu": [
        "false",
        "tru"
      ],
      "sibcity": [
        "177",
        "180"
      ],
      "aak": [
        "220",
        "150"
      ],
      "sibrep": [
        "1",
        "2"
      ],
      "sibList": {
        "0": {
          "sibrep": [
            "2",
            "1"
          ]
        },
        "1": {
          "sibcity": [
            "177",
            "180"
          ],
          "sibrep": [
            "1",
            "2"
          ]
        }
      }
    }
  ]
]
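The last comment in the thread suggests modifying compare_objects so that every managedObject is compared against a single reference object. The gist it points to is not reproduced here, so the following is only a sketch of how that might look. The name compare_to_reference, the simplified flat diff_dicts, and the sample data (including the object XYZ) are assumptions; the answer's diff_objects could be passed in through the diff parameter to get the list-aware comparison.

```python
def diff_dicts(d1, d2):
    """Return {key: (value_in_d1, value_in_d2)} for every key that differs."""
    return {k: (d1.get(k), d2.get(k))
            for k in set(d1) | set(d2)
            if d1.get(k) != d2.get(k)}


def compare_to_reference(objs, ref_name, diff=diff_dicts):
    """Diff every object against objs[ref_name]; skip identical ones."""
    ref = objs[ref_name]
    diffs = []
    for name, obj in objs.items():
        if name == ref_name:
            continue  # do not compare the reference with itself
        d = diff(ref, obj)
        if d:
            diffs.append((ref_name, name, d))
    return diffs


# Sample shaped like the result of parse(); 'XYZ' is a hypothetical object.
objs = {
    'PTR': {'aak': '220', 'orp': '05', 'advcell': 'false'},
    'KRNS': {'aak': '150', 'orp': '05', 'advcell': 'true'},
    'XYZ': {'aak': '220', 'orp': '05', 'advcell': 'false'},
}

for ref_name, name, d in compare_to_reference(objs, 'PTR'):
    print(ref_name, name, sorted(d.items()))
```

With this sample only KRNS is reported, because XYZ is identical to the reference PTR.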