Python XML to records

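I'm trying to traverse a nested XML file structure in which I'm only interested in certain element values/text. The XML itself contains "Row" elements, which indicate that values can occur multiple times. The goal is to read/convert it into database records. The XML looks like this: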

<CommandManagerResults>
<ListPropertiesAttribute>
    <Row>
        <Name>AttributeName</Name>
        <Id>B31BEF954E05B473A8D3A1B63B29F91E</Id>
        <Description>TECHCOLUMNNAME</Description>
        <LongDescription/>
        <CreationTime>26. August 2010 10:16:10 MESZ</CreationTime>
        <ModificationTime>23. November 2017 20:13:37 MEZ</ModificationTime>
        <Owner>Administrator</Owner>
        <Hidden>False</Hidden>
        <Row>
            <AttributeFormName>ID</AttributeFormName>
            <AttributeFormCategory>ID</AttributeFormCategory>
            <AttributeFormType>Number</AttributeFormType>
            <AttributeFormDescription/>
            <AttributeFormReportSort>None</AttributeFormReportSort>
            <AttributeFormBrowseSort>None</AttributeFormBrowseSort>
            <AttributeLookUpTable>FACT_TABLE_NAME</AttributeLookUpTable>
            <Row>
                <SchemaExpression>ApplySimple("nvl(#0, -2)";TECHCOLUMNNAME)</SchemaExpression>
                <MappingMethod>Manual</MappingMethod>
                <Row>
                    <SchemaCandidateTable>FACT_TABLE_NAME</SchemaCandidateTable>
                </Row>
            </Row>
            <Multilingual>Unknown</Multilingual>
        </Row>
        <Row>
            <AttributeFormName>DESC</AttributeFormName>
            <AttributeFormCategory>DESC</AttributeFormCategory>
            <AttributeFormType>Number</AttributeFormType>
            <AttributeFormDescription/>
            <AttributeFormReportSort>None</AttributeFormReportSort>
            <AttributeFormBrowseSort>None</AttributeFormBrowseSort>
            <AttributeLookUpTable>FACT_TABLE_NAME</AttributeLookUpTable>
            <Row>
                <SchemaExpression>TECHCOLUMNNAME</SchemaExpression>
                <MappingMethod>Manual</MappingMethod>
                <Row>
                    <SchemaCandidateTable>FACT_TABLE_NAME</SchemaCandidateTable>
                </Row>
            </Row>
            <Multilingual>False</Multilingual>
        </Row>
        <Row>
            <AttributeChild>TABLE_PK</AttributeChild>
            <AttributeChildRelationship>One to Many</AttributeChildRelationship>
            <AttributeChildTable>FACT_TABLE_NAME</AttributeChildTable>
            <Path>\Schema Objects\Attributes\FACT_TABLE_NAME\Star Attributes\_technical</Path>
        </Row>
        <Row>
            <Row>
                <AttributeBrowseDisplay>DESC</AttributeBrowseDisplay>
            </Row>
            <Row>
                <ReportDisplayForm>DESC</ReportDisplayForm>
            </Row>
            <AttributeElementDisplay>Locked</AttributeElementDisplay>
            <SecurityFliterToElementBrowsing>True</SecurityFliterToElementBrowsing>
            <EnableElementCaching>True</EnableElementCaching>
        </Row>
    </Row>
</ListPropertiesAttribute>
</CommandManagerResults>

My first approach works well: I get the list of tuples I wanted. The output looks like this:

[('AttributeName',
  'B31BEF954E05B473A8D3A1B63B29F91E',
  'ID',
  'Number',
  'None',
  'None',
  'FACT_TABLE_NAME',
  'ApplySimple("nvl(#0, -2)";TECHCOLUMNNAME)',
  'Manual',
  'FACT_TABLE_NAME'),
 ('AttributeName',
  'B31BEF954E05B473A8D3A1B63B29F91E',
  'DESC',
  'Number',
  'None',
  'None',
  'FACT_TABLE_NAME',
  'TECHCOLUMNNAME',
  'Manual',
  'FACT_TABLE_NAME')]
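
Since the stated goal is database records, tuples of this shape can be handed to a parameterized bulk insert. The following is only a minimal sketch under assumptions: it uses an in-memory SQLite database, a hypothetical table name `attribute_forms`, column names borrowed from the XML tags, and tuples shortened to four fields to keep the example small.

```python
import sqlite3

# Assumption: column names simply mirror the XML tags of interest.
COLUMNS = ("Name", "Id", "AttributeFormName", "AttributeFormType")

# Shortened versions of the tuples shown above (first four fields only).
rows = [
    ("AttributeName", "B31BEF954E05B473A8D3A1B63B29F91E", "ID", "Number"),
    ("AttributeName", "B31BEF954E05B473A8D3A1B63B29F91E", "DESC", "Number"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical table; every column is stored as TEXT for simplicity.
column_defs = ", ".join(f"{name} TEXT" for name in COLUMNS)
conn.execute(f"CREATE TABLE attribute_forms ({column_defs})")

# One "?" placeholder per column; executemany inserts all tuples in one call.
placeholders = ", ".join("?" for _ in COLUMNS)
conn.executemany(f"INSERT INTO attribute_forms VALUES ({placeholders})", rows)
conn.commit()

stored = conn.execute(
    "SELECT AttributeFormName FROM attribute_forms ORDER BY rowid"
).fetchall()
print(stored)
```

The tuple order has to match the column order, which is one reason a dictionary keyed by tag name is attractive for the next step.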

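Now I'd like to formalize this a bit, remove the repeated code, and make it possible to handle other XML files that are similar but not identical. I thought of writing a function that checks for the wanted tags supplied in a search tuple, and I'd like to use a dictionary so that the values found can be identified later.

My function looks like this: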

def traverse3(xmlelement, searchelements, dictreturn):
    _d = dict()
    for row in xmlelement:
        if row.getchildren():
            traverse3(row, searchelements, _d)
        else:
            dictreturn[row.tag] = row.text
        dictreturn.update(_d)
    return dictreturn

The intended usage was then:

from lxml import etree
root = etree.parse("some.xml")
l = []
tags = ('Name', 'Id', 'AttributeFormName', 'AttributeFormType', 'AttributeFormReportSort', 'AttributeFormBrowseSort', 'AttributeLookUpTable', 'SchemaExpression', 'MappingMethod','SchemaCandidateTable')
d = {}
elem = root.getroot()  # assumption: the snippet never defines elem, so start at the document root
l.append(traverse3(elem,tags,d))

I only get the "last" record back, which must be because I forgot to create a new dict somewhere, or to return it earlier, or I'm missing something else:

[{'Name': 'AttributeName',
  'Id': 'B31BEF954E05B473A8D3A1B63B29F91E',
  'Description': 'TECHCOLUMNNAME',
  'AttributeFormName': 'DESC',
  'AttributeFormType': 'Number',
  'AttributeFormReportSort': 'None',
  'AttributeFormBrowseSort': 'None',
  'AttributeLookUpTable': 'FACT_TABLE_NAME',
  'SchemaExpression': 'TECHCOLUMNNAME',
  'MappingMethod': 'Manual',
  'SchemaCandidateTable': 'FACT_TABLE_NAME'}]

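After adding a few print statements I can see that the record I want (the one with the ID form) is there during the recursive calls, but it then gets overwritten by the similar record for the DESC form, which of course I want as well. I also tried shrinking my list of search tags to get some kind of exit criterion, but every attempt at that (even moving the return around) ended in some kind of NoneType error.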
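
The overwriting can be reproduced in a few lines. The sketch below is a trimmed-down stand-in, not the original code: it uses the standard library's `xml.etree.ElementTree` instead of lxml, a shortened XML sample, and a hypothetical helper `flatten` that, like `traverse3`, writes every leaf into one shared dictionary.

```python
import xml.etree.ElementTree as ET

# Shortened sample with two sibling Row elements
# (assumption: same nesting pattern as the real file).
XML = """
<Row>
    <Name>AttributeName</Name>
    <Row><AttributeFormName>ID</AttributeFormName></Row>
    <Row><AttributeFormName>DESC</AttributeFormName></Row>
</Row>
"""


def flatten(element, result):
    """Hypothetical helper: copy the text of every leaf into one shared dict."""
    for child in element:
        if len(child):           # child has sub-elements: recurse into it
            flatten(child, result)
        else:                    # leaf: a repeated tag replaces the earlier value
            result[child.tag] = child.text
    return result


record = flatten(ET.fromstring(XML), {})
print(record)
```

A dictionary holds one value per key, so the second `AttributeFormName` replaces the first. Getting one record per form therefore needs a separate dictionary for each branch rather than a single one shared by the whole traversal.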

I'd really appreciate some ideas or pointers.

Apologies in advance for this epic question/example.

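I'm not sure whether answering my own question is considered good practice. In the meantime I found a solution: I reversed the processing by starting at the deepest element I'm interested in and iterating over that element's ancestors.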

from lxml import etree


def reverslookup(xmlelement, searchtags):
    # Start the record with the deepest element itself.
    d = {xmlelement.tag: xmlelement.text}
    # Walk upwards; every enclosing Row contributes its wanted direct children.
    for parent in xmlelement.iterancestors():
        if parent.tag == "Row":
            for elem in parent:
                if elem.tag in searchtags:
                    d[elem.tag] = elem.text
    return d


if __name__ == "__main__":
    root = etree.parse("file.xml")
    tags = ('Name', 'Id', 'AttributeFormName', 'AttributeFormType', 'AttributeFormReportSort', 'AttributeFormBrowseSort', 'AttributeLookUpTable', 'SchemaExpression', 'MappingMethod','SchemaCandidateTable')
    l = []
    # One record per SchemaCandidateTable, the deepest element of interest.
    for start in root.findall(".//*/SchemaCandidateTable"):
        l.append(reverslookup(start, tags))

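This way I can collect the elements above the starting point without having to worry about potentially repeated tags in the XML file. It gives the desired list of dictionaries, one per record: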

Out[6]: 

[{'SchemaCandidateTable': 'FACT_TABLE_NAME',
  'SchemaExpression': 'ApplySimple("nvl(#0, -2)";TECHCOLUMNNAME)',
  'MappingMethod': 'Manual',
  'AttributeFormName': 'ID',
  'AttributeFormType': 'Number',
  'AttributeFormReportSort': 'None',
  'AttributeFormBrowseSort': 'None',
  'AttributeLookUpTable': 'FACT_TABLE_NAME',
  'Name': 'AttributeName',
  'Id': 'B31BEF954E05B473A8D3A1B63B29F91E'},
 {'SchemaCandidateTable': 'FACT_TABLE_NAME',
  'SchemaExpression': 'TECHCOLUMNNAME',
  'MappingMethod': 'Manual',
  'AttributeFormName': 'DESC',
  'AttributeFormType': 'Number',
  'AttributeFormReportSort': 'None',
  'AttributeFormBrowseSort': 'None',
  'AttributeLookUpTable': 'FACT_TABLE_NAME',
  'Name': 'AttributeName',
  'Id': 'B31BEF954E05B473A8D3A1B63B29F91E'}]
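
For comparison, the same records can probably also be built top-down, as long as each branch gets its own copy of the values collected so far. This is only a sketch under assumptions, not the method used above: it relies on the standard library's `xml.etree.ElementTree` (which has no `iterancestors`), a shortened XML sample, and a hypothetical generator `iter_records` whose `anchor` parameter names the deepest tag of interest and must also appear in `tags`.

```python
import xml.etree.ElementTree as ET

# Shortened sample (assumption: same nesting pattern as the real file).
XML = """
<CommandManagerResults>
  <ListPropertiesAttribute>
    <Row>
      <Name>AttributeName</Name>
      <Row>
        <AttributeFormName>ID</AttributeFormName>
        <Row>
          <MappingMethod>Manual</MappingMethod>
          <Row><SchemaCandidateTable>FACT_TABLE_NAME</SchemaCandidateTable></Row>
        </Row>
      </Row>
      <Row>
        <AttributeFormName>DESC</AttributeFormName>
        <Row>
          <MappingMethod>Manual</MappingMethod>
          <Row><SchemaCandidateTable>FACT_TABLE_NAME</SchemaCandidateTable></Row>
        </Row>
      </Row>
    </Row>
  </ListPropertiesAttribute>
</CommandManagerResults>
"""

TAGS = ("Name", "AttributeFormName", "MappingMethod", "SchemaCandidateTable")


def iter_records(element, tags, anchor, inherited=None):
    """Yield one dict for every element that directly contains the anchor tag."""
    context = dict(inherited or {})   # copy, so sibling branches never share state
    containers = []
    for child in element:
        if child.tag == "Row" or len(child):
            containers.append(child)          # visit nested containers afterwards
        elif child.tag in tags:
            context[child.tag] = child.text   # wanted leaf on this level
    if anchor in context:
        yield context
    for container in containers:
        yield from iter_records(container, tags, anchor, context)


records = list(iter_records(ET.fromstring(XML), TAGS, "SchemaCandidateTable"))
for rec in records:
    print(rec)
```

Copying the dictionary at every level is what keeps the ID record from being replaced by the DESC record. The bottom-up version above reaches the same goal by building a fresh dictionary for each starting element.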

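Yes, you certainly can! As a new member you have to wait a while before you can mark it as "accepted" (which indicates that your question has a valid answer), and another, better answer may still come along. But this looks fine.

Good to know, thanks for pointing that out @usr2564301. I'm new here (I only read passively before), and it seems all sorts of things can go wrong (such as getting downvoted for answering a duplicate question). Thanks also for your comment on the answer/solution itself.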