Minidom Python XML parsing with exception handling
I am stripping sensitive data from millions of XML documents. How can I add a try/except to get past this error, which seems to be caused by a few malformed XML documents mixed in with the rest?
xml.parsers.expat.ExpatError: mismatched tag: line 1, column 28691
#!/usr/bin/python
import sys
from xml.dom import minidom

def getCleanString(word):
    str = ""
    dummy = 0
    for character in word:
        try:
            character = character.encode('utf-8')
            str = str + character
        except:
            dummy += 1
    return str

def parsedelete(content):
    dom = minidom.parseString(content)
    for element in dom.getElementsByTagName('RI_RI51_ChPtIncAcctNumber'):
        parentNode = element.parentNode
        parentNode.removeChild(element)
    return dom.toxml()

for line in sys.stdin:
    if line > 1:
        line = line.strip()
        line = line.split(',', 2)
        if len(line) > 2:
            partition = line[0]
            id = line[1]
            xml = line[2]
            xml = getCleanString(xml)
            xml = parsedelete(xml)
            strng = '%s\t%s\t%s' %(partition, id, xml)
            sys.stdout.write(strng + '\n')
Catching the exception is straightforward. Add
import xml
to your import statements and wrap the problem code in a try/except handler:
def parsedelete(content):
    try:
        dom = minidom.parseString(content)
    except xml.parsers.expat.ExpatError as e:
        # not sure how you want to handle the error... so just passing back as string
        return str(e)
    for element in dom.getElementsByTagName('RI_RI51_ChPtIncAcctNumber'):
        parentNode = element.parentNode
        parentNode.removeChild(element)
    return dom.toxml()
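If you would rather drop the bad records than write the error text into the output column, one possible variation is sketched below. It is only a sketch under a few assumptions: it is written for Python 3, it imports ExpatError directly instead of going through `import xml`, and the `process` driver, the `TAG_TO_REMOVE` constant, and the choice to log to stderr are hypothetical names and decisions for illustration, not something from the question.

```python
import sys
from xml.dom import minidom
from xml.parsers.expat import ExpatError

TAG_TO_REMOVE = 'RI_RI51_ChPtIncAcctNumber'  # tag name taken from the question


def parsedelete(content):
    """Return content with the sensitive elements removed, or None if it is malformed."""
    try:
        dom = minidom.parseString(content)
    except ExpatError as e:
        # Report the problem on stderr so stdout only carries clean records.
        sys.stderr.write('skipping malformed XML: %s\n' % e)
        return None
    for element in dom.getElementsByTagName(TAG_TO_REMOVE):
        element.parentNode.removeChild(element)
    return dom.toxml()


def process(lines, out=sys.stdout):
    """Hypothetical driver: write cleaned records to out, return how many were skipped."""
    skipped = 0
    for line in lines:
        fields = line.strip().split(',', 2)
        if len(fields) < 3:
            # Same behaviour as the question: ignore lines without three fields.
            continue
        partition, record_id, xml_text = fields
        cleaned = parsedelete(xml_text)
        if cleaned is None:
            skipped += 1
            continue
        out.write('%s\t%s\t%s\n' % (partition, record_id, cleaned))
    return skipped
```

Calling `process(sys.stdin)` would then mirror the loop in the question. Returning None instead of the error string keeps malformed records out of the cleaned output, and the returned count gives a rough idea of how many documents were affected.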
How do you want to "bypass" the error? Where in the code does it occur?