Python XML parsing iteration & handling breaks

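I'm trying to parse this part of an XML document, and I'd like the parser to iterate on its own, working out how many line items it needs to process.

Also, a line item may or may not have a value for every column. If a particular tag/text is absent for any of them, I want to fill the gap with None so that each value maps to the correct column in a later CSV conversion.

My XML to parse (the invoice line items are highlighted in bold):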


It's a bit late, but let's try this:

items = """[your xml above]"""

import lxml.html
import pandas as pd

# lxml.html uses an HTML parser, which lowercases tag names,
# so the lookup keys (and the XPath below) are lowercase
categories = ["invoicelinenum", "polinenum", "quantity", "uom", "unitprice", "lineamount", "salestaxpercent", "supplierpartnum", "shortdescription",
"longdescription", "deliverychargecode"]

columns = ['ILI Line Num', 'ILI PO Line',
           'ILI QTY', 'ILI UOM', 'ILI Unit Price', 'ILI Line Amt', 'ILI Sales Tax %',
           'ILI Supply', 'ShortDesc', 'LongDesc', 'ChargeCode']

doc = lxml.html.fromstring(items)
invoices = doc.xpath('//InvoiceLineItems/LineItem'.lower())

def dict_to_list(d, keys):
    # Missing keys come back as None, so every value stays in its column
    return [d.get(key, None) for key in keys]
#credit: https://stackoverflow.com/a/58192327/9448090

fin_list = []

for invoice in invoices:
    # One {tag: text} dict per line item, built from its child elements
    line_dict = {item.tag: item.text for item in invoice}
    fin_list.append(dict_to_list(line_dict, categories))

df = pd.DataFrame(fin_list, columns=columns)
df

This should give you the table you're looking for.
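For comparison, here is a minimal sketch of the same idea using only the standard library (xml.etree.ElementTree and csv). The original XML isn't shown here, so SAMPLE_XML, the tag names and their casing, and the helper names parse_line_items and rows_to_csv are all assumptions made up for illustration. It relies on findtext() returning None when a child tag is absent, which keeps every value in its own column.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the real XML is not shown, so these tag names
# (and their casing) are assumptions for illustration only.
SAMPLE_XML = """
<Invoice>
  <InvoiceLineItems>
    <LineItem>
      <InvoiceLineNum>1</InvoiceLineNum>
      <Quantity>2</Quantity>
      <UOM>EA</UOM>
    </LineItem>
    <LineItem>
      <InvoiceLineNum>2</InvoiceLineNum>
      <UnitPrice>9.50</UnitPrice>
    </LineItem>
  </InvoiceLineItems>
</Invoice>
"""

# Columns to extract, in output order (assumed names).
TAGS = ["InvoiceLineNum", "Quantity", "UOM", "UnitPrice"]


def parse_line_items(xml_text, tags=TAGS):
    """Return one dict per <LineItem>; absent tags map to None."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() visits every LineItem, however many there are
    for line_item in root.iter("LineItem"):
        # findtext() returns None when the child tag does not exist
        rows.append({tag: line_item.findtext(tag) for tag in tags})
    return rows


def rows_to_csv(rows, tags=TAGS):
    """Serialize the rows to CSV text; None is written as an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=tags)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_line_items(SAMPLE_XML)
print(rows_to_csv(rows))
```

Unlike lxml.html, ElementTree keeps tag names case-sensitive, so the names in TAGS would have to match the real document exactly.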

Which library are you using?
xml.etree.ElementTree as ET