Python: how to recursively iterparse an lxml tree without entering a node twice?


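The recursive function is parseMML. I want it to parse a MathML expression into a Python expression. The simple example mmlinput should produce the fraction 3/5, but the function returns: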

['(', '(', '3', ')', '/', '(', '5', ')', '(', '3', ')', '(', '5', ')', ')']

instead of:

['(', '(', '3', ')', '/', '(', '5', ')', ')']

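because I don't know how to skip the elements that have already been handled by the recursive call. Any ideas on how to skip them?

Thanks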

mmlinput='''<?xml version="1.0"?> <math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/1998/Math/MathML http://www.w3.org/Math/XMLSchema/mathml2/mathml2.xsd"> <mrow> <mfrac> <mrow> <mn>3</mn> </mrow> <mrow> <mn>5</mn> </mrow> </mfrac> </mrow> </math>'''


def parseMML(mmlinput):
    from lxml import etree
    from StringIO import *
    from lxml import objectify
    exppy=[]
    events = ("start", "end")
    context = etree.iterparse(StringIO(mmlinput),events=events)
    for action, elem in context:
        if (action=='start') and (elem.tag=='mrow'):
            exppy+='('
        if (action=='end') and (elem.tag=='mrow'):
            exppy+=')'
        if (action=='start') and (elem.tag=='mfrac'):
            mmlaux=etree.tostring(elem[0])
            exppy+=parseMML(mmlaux)
            exppy+='/'
            mmlaux=etree.tostring(elem[1])
            exppy+=parseMML(mmlaux)
        if action=='start' and elem.tag=='mn': #this is a number
            exppy+=elem.text
    return (exppy)


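The problem is that you parse the subtree inside the mfrac tag twice: once in the recursive call, and once more when the outer loop reaches the same elements. A quick fix is to keep a counter of the recursion level: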

mmlinput = "<math> <mrow> <mfrac> <mrow> <mn>3</mn> </mrow> <mrow> <mn>5</mn> </mrow> </mfrac> </mrow> </math>"

from lxml import etree
from StringIO import StringIO  # Python 2 module; Python 3 would need io.BytesIO and bytes input

def parseMML(mmlinput):
    exppy=[]
    events = ("start", "end")
    level = 0  # non-zero while we are inside an mfrac that was already handled
    context = etree.iterparse(StringIO(mmlinput),events=events)
    for action, elem in context:
        if (action=='start') and (elem.tag=='mfrac'):
            if level == 0:
                # recurse only at the outermost mfrac, so nested fractions
                # are not emitted a second time by this loop
                mmlaux=etree.tostring(elem[0])  # numerator
                exppy+=parseMML(mmlaux)
                exppy+='/'
                mmlaux=etree.tostring(elem[1])  # denominator
                exppy+=parseMML(mmlaux)
            level += 1
        if (action=='end') and (elem.tag=='mfrac'):
            level -= 1
        if level:
            continue  # skip events for nodes the recursion has already covered
        if (action=='start') and (elem.tag=='mrow'):
            exppy+='('
        if (action=='end') and (elem.tag=='mrow'):
            exppy+=')'
        if action=='start' and elem.tag=='mn': #this is a number
            exppy+=elem.text
    return (exppy)
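
The level counter works around the duplicate visits that the recursion causes. A different approach, not taken from the answer above, is to drop the recursion and let iterparse visit every node exactly once, using a small stack to decide where the '/' goes. The following is only a minimal Python 3 sketch under stated assumptions: the names parse_mml, local_name and open_elems are made up for illustration, only mrow, mfrac and mn are handled, and the fallback to the standard library's xml.etree.ElementTree is an assumption so the snippet runs even without lxml installed.

```python
import io

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # assumption: the stdlib module is API-compatible for what is used here
    import xml.etree.ElementTree as etree


def local_name(tag):
    """Strip a '{namespace}' prefix from a tag name, if there is one."""
    return tag.rsplit('}', 1)[-1]


def parse_mml(mml):
    """Turn a small MathML subset (mrow, mfrac, mn) into a token list in one pass."""
    data = mml.encode('utf-8') if isinstance(mml, str) else mml
    tokens = []
    open_elems = []  # one [tag, children_seen] entry per currently open element
    for action, elem in etree.iterparse(io.BytesIO(data), events=('start', 'end')):
        tag = local_name(elem.tag)
        if action == 'start':
            if open_elems:
                parent = open_elems[-1]
                # the second child of an mfrac is the denominator
                if parent[0] == 'mfrac' and parent[1] == 1:
                    tokens.append('/')
                parent[1] += 1
            open_elems.append([tag, 0])
            if tag == 'mrow':
                tokens.append('(')
        else:
            open_elems.pop()
            if tag == 'mn':
                # the element text is complete once the end event arrives
                tokens.append((elem.text or '').strip())
            elif tag == 'mrow':
                tokens.append(')')
    return tokens


sample = ('<math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mfrac>'
          '<mrow><mn>3</mn></mrow><mrow><mn>5</mn></mrow>'
          '</mfrac></mrow></math>')
print(parse_mml(sample))
```

Because nothing is re-serialized or re-parsed, nested fractions need no special handling, and the namespace is stripped inside the function instead of in the calling program.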

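Great and simple. Regarding your note, I have changed the initialization of the exppy variable to exppy = "". As for removing the namespace, you are right; I do that in the main program. Thanks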

Note that += on a list extends it with each character of a string:

>>> lst = []
>>> lst += 'spam'
>>> lst
['s', 'p', 'a', 'm']
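
The session above is the reason the tokens in the question come out one character at a time, which starts to matter as soon as a number has more than one digit. A short sketch contrasting the options (the variable names tokens and expr are illustrative only):

```python
tokens = []
tokens += '35'        # a string is iterated, so each character becomes an item
tokens.append('35')   # append adds the whole string as a single item
print(tokens)

expr = ''
expr += '35'          # with a str accumulator, += is plain concatenation
print(expr)
```

Using append keeps multi-digit numbers together in a list, while a string accumulator, as chosen in the comment above, avoids the issue entirely.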