Python: parsing an XML file when some text is not inside a tag
I am trying to parse an XML file. The text inside tags is parsed successfully (or so it seems), but I also want to output the text that is not enclosed in any tag, and the program below simply ignores it.
# XMLParser is the current name of the class the original post imported as XMLTreeBuilder.
from xml.etree.ElementTree import XMLParser

class HtmlLatex:  # The target object of the parser
    out = ''
    var = ''

    def start(self, tag, attrib):  # Called for each opening tag.
        pass

    def end(self, tag):  # Called for each closing tag.
        if tag == 'i':
            self.out += self.var
        elif tag == 'sub':
            self.out += '_{' + self.var + '}'
        elif tag == 'sup':
            self.out += '^{' + self.var + '}'
        else:
            self.out += self.var

    def data(self, data):  # Called for each chunk of text; overwrites the previous chunk.
        self.var = data

    def close(self):  # Called when all data has been parsed.
        print(self.out)

if __name__ == '__main__':
    target = HtmlLatex()
    parser = XMLParser(target=target)

    text = ''
    with open('input.txt') as f1:
        text = f1.read()

    print(text)

    parser.feed(text)
    parser.close()
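Part of the input I want to parse:
p_0 = (m^3 + (2l_2 + l_1) m^2 + (l_2^2 + 2l_1 l_2 + l_1^2) m) / (m^3 + (3l_2 + 2l_1)). (Shown as rendered; in the file the variables are wrapped in <i> tags and the indices in <sub> and <sup> tags.)

Take a look at [link missing], a Python library for parsing, navigating and manipulating HTML and XML. It has a convenient interface and might solve your problem...

Here is a pyparsing version. I hope the comments explain it sufficiently.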
src = """<p><i>p</i><sub>0</sub> = (<i>m</i><sup>3</sup>+(2<i>l</i><sub>2</sub>+<i>l</i><sub>1</sub>) """ \
"""<i>m</i><sup>2</sup>+(<i>l</i><sub>2</sub><sup>2</sup>+2<i>l</i><sub>1</sub> <i>l</i><sub>2</sub>+""" \
"""<i>l</i><sub>1</sub><sup>2</sup>) <i>m</i>) /(<i>m</i><sup>3</sup>+(3<i>l</i><sub>2</sub>+""" \
"""2<i>l</i><sub>1</sub>) ) }.</p>"""
from pyparsing import makeHTMLTags, anyOpenTag, anyCloseTag, Suppress, replaceWith
# set up tag matching for <sub> and <sup> tags
SUB,endSUB = makeHTMLTags("sub")
SUP,endSUP = makeHTMLTags("sup")
# all other tags will be suppressed from the output
ANY,endANY = map(Suppress,(anyOpenTag,anyCloseTag))
SUB.setParseAction(replaceWith("_{"))
SUP.setParseAction(replaceWith("^{"))
endSUB.setParseAction(replaceWith("}"))
endSUP.setParseAction(replaceWith("}"))
transformer = (SUB | endSUB | SUP | endSUP | ANY | endANY)
# now use the transformer to apply these transforms to the input string
print(transformer.transformString(src))
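
For comparison, the text between tags can also be kept with the standard library alone. The program in the question loses it because data() overwrites self.var on every call, so a chunk such as " = (" is replaced by the next chunk before any end() call writes it out. Below is a minimal sketch, not the original poster's code: the names HtmlToLatex, WRAP and html_to_latex are made up for illustration, and it assumes the input is well-formed XML.

```python
from xml.etree.ElementTree import XMLParser


class HtmlToLatex:
    """Parser target that keeps every text chunk, inside or outside tags."""

    # Hypothetical mapping for this sketch: tag -> (text on open, text on close).
    WRAP = {'sub': ('_{', '}'), 'sup': ('^{', '}')}

    def __init__(self):
        self.parts = []

    def start(self, tag, attrib):
        # Emit the opening LaTeX text for <sub>/<sup>; other tags add nothing.
        self.parts.append(self.WRAP.get(tag, ('', ''))[0])

    def end(self, tag):
        # Emit the closing brace for <sub>/<sup>; other tags add nothing.
        self.parts.append(self.WRAP.get(tag, ('', ''))[1])

    def data(self, data):
        # Append instead of overwrite, so text between tags is not lost.
        self.parts.append(data)

    def close(self):
        return ''.join(self.parts)


def html_to_latex(markup):
    parser = XMLParser(target=HtmlToLatex())
    parser.feed(markup)
    return parser.close()  # XMLParser.close() returns the target's close() result


print(html_to_latex('<p><i>p</i><sub>0</sub> = (<i>m</i><sup>3</sup>+2<i>l</i><sub>1</sub>)</p>'))
# p_{0} = (m^{3}+2l_{1})
```

Writing the LaTeX delimiters in start() and end() rather than buffering the text means nested or adjacent tags such as <sub>2</sub><sup>2</sup> need no special handling.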
That's XML like I've never seen. Are you sure you don't want an HTML parser?

It is generated from here: [link missing]. When you get the solution, if you look at the page source you will see something similar.

Just edit out the LaTeX tags???

Thanks for the suggestion. I'll take a look.
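
On the HTML parser suggestion in the comments: the standard library's html.parser module tolerates markup that is not well-formed XML, such as an unclosed <br>. The following is only a sketch under that assumption; LatexExtractor and extract_latex are illustrative names, not part of any answer in this thread.

```python
from html.parser import HTMLParser


class LatexExtractor(HTMLParser):
    """Collects all text and turns <sub>/<sup> into LaTeX subscripts/superscripts."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'sub':
            self.parts.append('_{')
        elif tag == 'sup':
            self.parts.append('^{')
        # Any other tag (<p>, <i>, <br>, ...) is dropped from the output.

    def handle_endtag(self, tag):
        if tag in ('sub', 'sup'):
            self.parts.append('}')

    def handle_data(self, data):
        # Called for text inside and between tags alike.
        self.parts.append(data)


def extract_latex(markup):
    extractor = LatexExtractor()
    extractor.feed(markup)
    extractor.close()  # flush any text still buffered by the parser
    return ''.join(extractor.parts)


print(extract_latex('<i>l</i><sub>2</sub><sup>2</sup> + 1<br>'))
```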