Python/lxml: nested for loops


I have some XML that I am trying to parse. For example:

<TVAMain>
    <ProgramDescription>
        <ProgramLocationTable>
            <Schedule value1="1234">
                <ScheduleEvent>
                    <Program value2="1234567890" />
                </ScheduleEvent>
                <ScheduleEvent>
                    <Program value2="1234567891" />
                </ScheduleEvent>
            </Schedule>
            <Schedule value1="5678">
                <ScheduleEvent>
                    <Program value2="1234567892" />
                </ScheduleEvent>
                <ScheduleEvent>
                    <Program value2="1234567893" />
                </ScheduleEvent>
            </Schedule>
        </ProgramLocationTable>
    </ProgramDescription>
</TVAMain>

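My code successfully prints all of the "value1" values, but it does not print value2.

I have tried the following:
- using "info2" in the second for loop
- using a second xpath with a known value of value1 entered

Can anyone point me in the right direction?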

Using the XML you posted, you can find all of the values with a single XPath:

import lxml.etree as ET

tree = ET.parse('data')  # 'data' is the file containing the XML above

values = tree.xpath('//Schedule/@value1 | //Schedule/ScheduleEvent/Program/@value2')
for vals in zip(*[iter(values)]*3):
    print(vals)

This prints:

('1234', '1234567890', '1234567891')
('5678', '1234567892', '1234567893')

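This relies on the assumption that each value1 attribute is always followed by exactly two value2 attributes. If you do not want to rely on that assumption, you can loop instead: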

for schedule in tree.xpath('//Schedule[@value1]'):
    value1 = schedule.get('value1')
    print(value1)
    for value2 in schedule.xpath('ScheduleEvent/Program/@value2'):
        print(value2)
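
To make the assumption concrete, here is a minimal sketch using a hypothetical variant of the XML in which the second Schedule has three Program elements (the shortened value2 strings and the names `xml`, `grouped`, and `nested` are illustrative only). The grouping idiom silently drops the leftover value, while the nested loop keeps every value2 with its own Schedule:

```python
import lxml.etree as ET

# Hypothetical input: the second Schedule has three Program elements, not two.
xml = """<ProgramLocationTable>
  <Schedule value1="1234">
    <ScheduleEvent><Program value2="a1" /></ScheduleEvent>
    <ScheduleEvent><Program value2="a2" /></ScheduleEvent>
  </Schedule>
  <Schedule value1="5678">
    <ScheduleEvent><Program value2="b1" /></ScheduleEvent>
    <ScheduleEvent><Program value2="b2" /></ScheduleEvent>
    <ScheduleEvent><Program value2="b3" /></ScheduleEvent>
  </Schedule>
</ProgramLocationTable>"""
root = ET.fromstring(xml)

# The union returns all attribute values as one flat list in document order.
values = root.xpath('//Schedule/@value1 | //Schedule/ScheduleEvent/Program/@value2')

# Grouping in fixed chunks of three: zip stops at the shortest input,
# so the seventh value ('b3') is dropped without any error.
grouped = list(zip(*[iter(values)] * 3))

# The nested loop queries relative to each Schedule, so the number of
# Program elements per Schedule does not matter.
nested = []
for schedule in root.xpath('//Schedule[@value1]'):
    value2s = schedule.xpath('ScheduleEvent/Program/@value2')
    nested.append((schedule.get('value1'), [str(v) for v in value2s]))

print(grouped)
print(nested)
```

With regular data both approaches agree; they only diverge once a Schedule has more or fewer than two Program elements.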

In your code:

root.xpath('//xmlns:Schedule[@value1 = "value1"]/ScheduleEvent/Program', namespaces=nsmap)

does not work because "value1" is a literal string. You need to replace it with the value of the variable value1:

'//xmlns:Schedule[@value1 = "{v}"]/ScheduleEvent/Program'.format(v=value1)

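Although that works, filtering on value1 may be more specific than you need. Or, if two Schedule elements have the same value1 attribute, it may not be specific enough. Instead, you can find the child Program elements by calling schedule.xpath: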

schedule.xpath('ScheduleEvent/Program/@value2')

rather than using tree.xpath, which starts over from the top of the tree.
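
As a side note to the string-formatting fix, lxml's xpath() also accepts XPath variables as keyword arguments, which avoids building the expression by hand. The sketch below assumes the XML has no namespaces (unlike the asker's real file, which apparently needs an nsmap) and uses illustrative names such as `found`, `relative`, and `absolute`. It also shows the difference between a relative path and one that starts over from the top of the tree:

```python
import lxml.etree as ET

xml = """<TVAMain><ProgramDescription><ProgramLocationTable>
  <Schedule value1="1234">
    <ScheduleEvent><Program value2="1234567890" /></ScheduleEvent>
    <ScheduleEvent><Program value2="1234567891" /></ScheduleEvent>
  </Schedule>
  <Schedule value1="5678">
    <ScheduleEvent><Program value2="1234567892" /></ScheduleEvent>
    <ScheduleEvent><Program value2="1234567893" /></ScheduleEvent>
  </Schedule>
</ProgramLocationTable></ProgramDescription></TVAMain>"""
root = ET.fromstring(xml)

value1 = "5678"

# $v is an XPath variable; its value is supplied as the keyword argument v.
programs = root.xpath('//Schedule[@value1 = $v]/ScheduleEvent/Program', v=value1)
found = [p.get('value2') for p in programs]

# A relative path is evaluated from the Schedule element itself...
schedule = root.xpath('//Schedule[@value1 = $v]', v=value1)[0]
relative = schedule.xpath('ScheduleEvent/Program/@value2')

# ...whereas a path starting with // searches the whole document again,
# even when xpath() is called on a nested element.
absolute = schedule.xpath('//Program/@value2')

print(found)
print(relative)
print(len(absolute))
```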

Another approach, also using lxml, is:

import lxml.etree as et

message = """<?xml version="1.0" encoding="UTF-8"?>       
<TVAMain>                                                 
    <ProgramDescription>                                  
        <ProgramLocationTable>                            
            <Schedule value1="1234">                      
                <ScheduleEvent>                           
                    <Program value2="1234567890" />       
                </ScheduleEvent>                          
                <ScheduleEvent>                           
                    <Program value2="1234567891" />       
                </ScheduleEvent>                          
            </Schedule>                                   
            <Schedule value1="5678">                      
                <ScheduleEvent>                           
                    <Program value2="1234567892" />       
                </ScheduleEvent>                          
                <ScheduleEvent>                           
                    <Program value2="1234567893" />       
                </ScheduleEvent>                          
            </Schedule>                                   
        </ProgramLocationTable>                           
    </ProgramDescription>                                 
</TVAMain>"""

tree = et.fromstring(message)
schedules = tree.xpath("ProgramDescription/ProgramLocationTable")[0].findall("Schedule")
for schedule in schedules:
    for event in schedule.findall("ScheduleEvent"):
        program = event.find("Program")
        # print() call form works on both Python 2 and Python 3
        print("%s %s" % (schedule.attrib["value1"], program.attrib["value2"]))
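
If the goal is one row per value2, for example to import into a spreadsheet, the same loop can feed Python's csv module. This is only a sketch under the assumption that the XML has no namespaces; the column headers and the in-memory buffer are illustrative choices, and a real script would more likely write to a file opened with newline=''.

```python
import csv
import io

import lxml.etree as et

message = """<TVAMain><ProgramDescription><ProgramLocationTable>
  <Schedule value1="1234">
    <ScheduleEvent><Program value2="1234567890" /></ScheduleEvent>
    <ScheduleEvent><Program value2="1234567891" /></ScheduleEvent>
  </Schedule>
  <Schedule value1="5678">
    <ScheduleEvent><Program value2="1234567892" /></ScheduleEvent>
    <ScheduleEvent><Program value2="1234567893" /></ScheduleEvent>
  </Schedule>
</ProgramLocationTable></ProgramDescription></TVAMain>"""
tree = et.fromstring(message)

# Write to an in-memory buffer so the example has no side effects.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["value1", "value2"])

# iter() walks all descendants with the given tag, at any depth.
for schedule in tree.iter("Schedule"):
    for program in schedule.iter("Program"):
        writer.writerow([schedule.get("value1"), program.get("value2")])

csv_text = buffer.getvalue()
print(csv_text)
```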

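Yes, sorry. That was a typo. It looks like a neat solution, but it doesn't give me what I want. I would like each "value2" on a separate line so that it is easier to work with once imported into Excel. I'll take a look at ebarr's solution above and try to get it working. Thanks anyway - I think this will be useful later!

You don't need to use the grouper recipe, zip(*[iter(values)]*3), if you don't want it. Just print out values.

I get an error: schedules = tree.find("ProgramLocationTable").findall("Schedule") AttributeError: 'NoneType' object has no attribute 'findall'. It definitely exists in the XML. Any ideas?

That probably means tree.find("ProgramLocationTable") is returning None. Are you using different XML from what you posted? I tried your example and it works fine.

There is a difference in my XML structure. I left out the top-level tag "TVAMain". I ran a test, and adding it to the example I posted above breaks it. Why is that? Why does it stop working just because ProgramLocationTable is no longer at the top level? I get the same error message.

To clarify: the actual XML already has "TVAMain" in it (which causes the error). However, my example above was missing it, so your example worked. I have now updated the example so that it produces the error I am seeing (schedules = tree.find("ProgramLocationTable").findall("Schedule") AttributeError: 'NoneType' object has no attribute 'findall').

@unutbu - does either of you have any pointers for solving this? The extra "top" level breaks it and I don't understand why. Is there a way to set the "root" for the lookup to the "ProgramDescription" level?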

The second answer's code (the findall loop over ScheduleEvent elements) prints:

1234 1234567890
1234 1234567891
5678 1234567892
5678 1234567893
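
On the question raised in the comments about the extra top-level element: find() with a bare tag name only looks at direct children of the element it is called on, so once TVAMain is the root, ProgramLocationTable is two levels down and find() returns None. A minimal sketch of three ways around this follows; it assumes the XML has no namespaces (if the real file declares one, as the nsmap in the question's XPath suggests, the tag names would need to be namespace-qualified), and the variable names are illustrative.

```python
import lxml.etree as et

message = """<TVAMain><ProgramDescription><ProgramLocationTable>
  <Schedule value1="1234">
    <ScheduleEvent><Program value2="1234567890" /></ScheduleEvent>
    <ScheduleEvent><Program value2="1234567891" /></ScheduleEvent>
  </Schedule>
  <Schedule value1="5678">
    <ScheduleEvent><Program value2="1234567892" /></ScheduleEvent>
    <ScheduleEvent><Program value2="1234567893" /></ScheduleEvent>
  </Schedule>
</ProgramLocationTable></ProgramDescription></TVAMain>"""
tree = et.fromstring(message)  # tree is the <TVAMain> element

# A bare tag name matches direct children only, so this is None.
direct = tree.find("ProgramLocationTable")

# Option 1: spell out the path from the root element.
by_path = tree.find("ProgramDescription/ProgramLocationTable")

# Option 2: search all descendants with the .// prefix.
anywhere = tree.find(".//ProgramLocationTable")

# Option 3: use ProgramDescription as the starting point for later lookups.
description = tree.find("ProgramDescription")
from_description = description.find("ProgramLocationTable")

rows = []
for schedule in anywhere.findall("Schedule"):
    for program in schedule.findall("ScheduleEvent/Program"):
        rows.append((schedule.get("value1"), program.get("value2")))

print(direct)
for row in rows:
    print(row[0], row[1])
```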