Parsing different XML files with Python

Tags: python, xml, beautifulsoup, lxml, elementtree

I have two XML files: words and topics.

I need to parse the words file based on the topic file. The files are as follows.

File 1: topic

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<nite:root nite:id="ES2002a.topic" 
xmlns:nite="http://nite.sourceforge.net/">
<topic nite:id="ES2002a.topic.vkaraisk.1" other_description="introduction of participants and their roles">
      <nite:pointer role="scenario_topic_type"  href="default-topics.xml#id(top.4)"/>
      <nite:child href="ES2002a.B.words.xml#id(ES2002a.B.words0)..id(ES2002a.B.words5)"/>
      <nite:child href="ES2002a.D.words.xml#id(ES2002a.D.words0)..id(ES2002a.D.words3)"/>
      <nite:child 
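I am thinking of getting the list of words files that are used, and then iterating over each words file based on the start and stop conditions.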
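The first half of that plan, turning each href into a words file name plus a start and stop id, could look roughly like the sketch below. It assumes every href has the shape `file#id(start)..id(stop)` or `file#id(single)`, as in the sample; `WordRange` and `parse_href` are hypothetical names made up for this illustration, not part of any library.

```python
import re
from typing import NamedTuple, Optional


class WordRange(NamedTuple):
    """Hypothetical container for one parsed href."""
    file: str
    start: str
    stop: str


# file name, '#', id(start), then optionally '..id(stop)'
HREF_RE = re.compile(
    r'^(?P<file>[^#]+)#id\((?P<start>[^)]+)\)(?:\.\.id\((?P<stop>[^)]+)\))?$'
)


def parse_href(href: str) -> Optional[WordRange]:
    """Split an href into (file, start id, stop id); return None if it does not fit the assumed shape."""
    match = HREF_RE.match(href)
    if match is None:
        return None
    start = match.group('start')
    # A href with a single id(...) points at one element, so start and stop coincide.
    stop = match.group('stop') or start
    return WordRange(match.group('file'), start, stop)


print(parse_href("ES2002a.B.words.xml#id(ES2002a.B.words0)..id(ES2002a.B.words5)"))
```

Keeping the ids as full strings, rather than extracting only the trailing number, means they can be compared directly against the `nite:id` attributes in the words file.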

I have used jxmlease to parse XML. It simply converts the XML string into a Python dictionary.

topic.xml

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<nite:root nite:id="ES2002a.topic" 
xmlns:nite="http://nite.sourceforge.net/">
<topic nite:id="ES2002a.topic.vkaraisk.1" other_description="introduction of participants and their roles">
      <nite:pointer role="scenario_topic_type"  href="default-topics.xml#id(top.4)"/>
      <nite:child href="ES2002a.B.words.xml#id(ES2002a.B.words0)..id(ES2002a.B.words5)"/>
      <nite:child href="ES2002a.D.words.xml#id(ES2002a.D.words0)..id(ES2002a.D.words3)"/>
</topic>
</nite:root>




import jxmlease

with open('topic.xml') as topic:
    topic_content = topic.read()

root = jxmlease.parse(topic_content)
# 'nite:child' occurs twice under <topic>, so it is parsed as a list; [0] is the first entry.
first_word_selection = root['nite:root']['topic']['nite:child'][0].get_xml_attr("href")

print(first_word_selection)
# Output with the topic.xml above: ES2002a.B.words.xml#id(ES2002a.B.words0)..id(ES2002a.B.words5)


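Comments:

This question is unclear. You have tagged it with "beautifulsoup", "lxml" and "elementtree", but you have not shown us any code. What have you tried?

@mzjn I will add what I have tried to the main post.

Hi Sijan. I just tried running that code and got the following error: TypeError: list indices must be integers or slices, not str

Hi @Pythonuser, I have added my topic.xml content above. Can you run it once? It works fine for me.

Hi @Sijan Bhandari, I ran it with the XML file contents and got the following error: KeyError: 'nite:root'

You have installed jxmlease, right? You could try printing the root variable to see whether it parsed correctly.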
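The TypeError reported above is consistent with a file that holds more than one `<topic>` element: dict-style parsers return a list when a tag repeats, so `['topic']` would then need an integer index first. That is an assumption, since the full file is not shown. A minimal sketch that does not depend on the shape of a parsed dictionary, using the standard library's `xml.etree.ElementTree`; `topic_children` is a hypothetical helper name, and the XML is the one-topic sample from the answer.

```python
import xml.etree.ElementTree as ET

NITE = "http://nite.sourceforge.net/"  # namespace URI taken from the sample file

TOPIC_XML = """<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<nite:root nite:id="ES2002a.topic" xmlns:nite="http://nite.sourceforge.net/">
<topic nite:id="ES2002a.topic.vkaraisk.1" other_description="introduction of participants and their roles">
      <nite:pointer role="scenario_topic_type"  href="default-topics.xml#id(top.4)"/>
      <nite:child href="ES2002a.B.words.xml#id(ES2002a.B.words0)..id(ES2002a.B.words5)"/>
      <nite:child href="ES2002a.D.words.xml#id(ES2002a.D.words0)..id(ES2002a.D.words3)"/>
</topic>
</nite:root>"""


def topic_children(xml_bytes):
    """Map each topic id to the hrefs of its direct <nite:child> elements."""
    root = ET.fromstring(xml_bytes)
    result = {}
    # iter() yields every <topic>, whether the file has one, many, or nested ones.
    for topic in root.iter("topic"):
        topic_id = topic.get(f"{{{NITE}}}id")
        hrefs = [child.get("href") for child in topic.findall(f"{{{NITE}}}child")]
        result[topic_id] = hrefs
    return result


# Encode to bytes so the parser honours the declared ISO-8859-1 encoding.
for topic_id, hrefs in topic_children(TOPIC_XML.encode("iso-8859-1")).items():
    print(topic_id, hrefs)
```

Note that `<topic>` carries no prefix and the file declares no default namespace, so it is matched by its bare name, while `nite:id` and `nite:child` need the namespace URI in braces.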
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<nite:root nite:id="ES2002a.D.words" xmlns:nite="http://nite.sourceforge.net/">
   <w nite:id="ES2002a.D.words0" starttime="67.21" endtime="67.45">Mm-hmm</w>
   <w nite:id="ES2002a.D.words1" starttime="67.45" endtime="67.45" punc="true">.</w>
   <w nite:id="ES2002a.D.words2" starttime="74.89" endtime="75.24">Great</w>
   <w nite:id="ES2002a.D.words3" starttime="75.24" endtime="75.24" punc="true">.</w>
   <w nite:id="ES2002a.D.words4" starttime="82.08" endtime="82.25">And</w>
   <w nite:id="ES2002a.D.words5" starttime="82.25" endtime="82.43">I&#39;m</w>
  <nite:child href="ES2002a.B.words.xml#id(ES2002a.B.words0)..id(ES2002a.B.words5)"/>

import numpy as np
import pandas as pd
from lxml import etree

tree = etree.parse("./ES2013a.topic.xml")
root = tree.getroot()

# Parallel lists with one entry per visited element, down to three levels deep.
childA = []    # attributes of the enclosing top-level element
elementT = []  # tag of the visited element
ElementA = []  # attributes of the visited element
for child in root:
    elementT.append(str(child.tag))
    ElementA.append(str(child.attrib))
    childA.append(str(child.attrib))
    for element in child:
        elementT.append(str(element.tag))
        ElementA.append(str(element.attrib))
        childA.append(str(child.attrib))
        for sub in element:
            elementT.append(str(sub.tag))
            ElementA.append(str(sub.attrib))
            childA.append(str(child.attrib))

df = pd.DataFrame()
df['c'] = np.array(childA)
df['t'] = np.array(ElementA)
df['a'] = np.array(elementT)

# Extract the words file name and the first and last word index from each href.
file = df['t'].str.extract(r'([A-Z][A-Z].*words\.xml)#')
start = df['t'].str.extract(r'words([0-9]+)')
stop = df['t'].str.extract(r'.*words([0-9]+)')
# Alternation, not character classes: the original pattern captured only one letter
# (c = topic, r = pointer, d = child); this captures the full element name.
tags = df['a'].str.extract(r'(topic|pointer|child)$')
rootTopic = df['c'].str.extract(r'ES2013a\.topic\.rdhillon\.(\d+)')
df['f'] = file
df['start'] = start
df['stop'] = stop
df['tags'] = tags
df['topicID'] = rootTopic

# Drop the three raw helper columns (c, t, a) and keep only the extracted ones.
df = df.iloc[:, 3:]
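
Once a file name, start id and stop id are known for each `nite:child`, the matching words can be pulled from the words file by walking it in document order. The following is only a sketch under stated assumptions: `words_between` is a hypothetical helper, the XML is the words sample from the question with a closing `</nite:root>` added so that it parses, and the ids are assumed to sit on direct children of the root.

```python
import xml.etree.ElementTree as ET

NITE_ID = "{http://nite.sourceforge.net/}id"  # qualified name of the nite:id attribute

# Sample from the question; the closing root tag is added here so the fragment parses.
WORDS_XML = """<nite:root nite:id="ES2002a.D.words" xmlns:nite="http://nite.sourceforge.net/">
   <w nite:id="ES2002a.D.words0" starttime="67.21" endtime="67.45">Mm-hmm</w>
   <w nite:id="ES2002a.D.words1" starttime="67.45" endtime="67.45" punc="true">.</w>
   <w nite:id="ES2002a.D.words2" starttime="74.89" endtime="75.24">Great</w>
   <w nite:id="ES2002a.D.words3" starttime="75.24" endtime="75.24" punc="true">.</w>
   <w nite:id="ES2002a.D.words4" starttime="82.08" endtime="82.25">And</w>
   <w nite:id="ES2002a.D.words5" starttime="82.25" endtime="82.43">I&#39;m</w>
</nite:root>"""


def words_between(root, start_id, stop_id):
    """Return the root's children from start_id through stop_id, inclusive."""
    selected = []
    inside = False
    for element in root:
        element_id = element.get(NITE_ID)
        if element_id == start_id:
            inside = True
        if inside:
            selected.append(element)
            if element_id == stop_id:
                break
    return selected


root = ET.fromstring(WORDS_XML)
words = words_between(root, "ES2002a.D.words0", "ES2002a.D.words3")
print([w.text for w in words])  # ['Mm-hmm', '.', 'Great', '.']
```

If the start id is never found the function returns an empty list, and if the stop id is never found it returns everything from the start id to the end of the file, so callers may want to check for both cases.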