Python: split and print XML to store in a list (Python, Lxml) - Fatal编程技术网

I have a file named Books.xml. Books.xml is huge (2 GB), and its structure is similar to this:

<Books>
    <Book>
        <Detail ID="67">
            <BookName>Code Complete 2</BookName>
            <Author>Steve McConnell</Author>
            <Pages>960</Pages>
            <ISBN>0735619670</ISBN>        
            <BookName>Application Architecture Guide 2</BookName>
            <Author>Microsoft Team</Author>
            <Pages>496</Pages>
            <ISBN>073562710X</ISBN>
        </Detail>
    </Book>
    <Book>
        <Detail ID="87">
            <BookName>Rocking Python</BookName>
            <Author>Guido Rossum</Author>
            <Pages>960</Pages>
            <ISBN>0735619690</ISBN>
            <BookName>Python Rocks</BookName>
            <Author>Microsoft Team</Author>
            <Pages>496</Pages>
            <ISBN>073562710X</ISBN>
        </Detail>
    </Book>
</Books>
Here is my code and the result I get:

import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically
filename = r'D:\test\Books.xml'
context = iter(etree.iterparse(filename, events=('start', 'end')))
_, root = next(context)
for event, elem in context:
    if event == 'start' and elem.tag == 'Book':
        print(etree.dump(elem))
        root.clear()
<Book>
        <Detail ID="67">
            <BookName>Code Complete 2</BookName>
            <Author>Steve McConnell</Author>
            <Pages>960</Pages>
            <ISBN>0735619670</ISBN>
            <BookName>Application Architecture Guide 2</BookName>
            <Author>Microsoft Team</Author>
            <Pages>496</Pages>
            <ISBN>073562710X</ISBN>
        </Detail>
    </Book>

None
<Book>
        <Detail ID="87">
            <BookName>Rocking Python</BookName>
            <Author>Guido Rossum</Author>
            <Pages>960</Pages>
            <ISBN>0735619690</ISBN>
            <BookName>Python Rocks</BookName>
            <Author>Microsoft Team</Author>
            <Pages>496</Pages>
            <ISBN>073562710X</ISBN>
        </Detail>
    </Book>
None
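
The `None` lines in the output come from `print(etree.dump(elem))`: `dump()` writes the element to standard output itself and returns `None`, which `print` then echoes. Below is a minimal sketch of the same loop with the standard-library `xml.etree.ElementTree` that collects each `<Book>` as a string in a list instead of dumping it. It also waits for the `'end'` event, because the Python docs only guarantee that the start tag has been seen at `'start'`; the element's children may not be parsed yet. The in-memory `SAMPLE` document and the `split_books` name are hypothetical stand-ins for the real `Books.xml` and code.

```python
import io
import xml.etree.ElementTree as etree

# Hypothetical in-memory sample standing in for the real Books.xml
SAMPLE = b"""<Books>
  <Book><Detail ID="67"><BookName>Code Complete 2</BookName></Detail></Book>
  <Book><Detail ID="87"><BookName>Rocking Python</BookName></Detail></Book>
</Books>"""


def split_books(source):
    """Return every <Book> element serialized as a str, in document order."""
    fragments = []
    context = etree.iterparse(source, events=('start', 'end'))
    _, root = next(context)  # the first event is the 'start' of <Books>
    for event, elem in context:
        # Only at 'end' is the whole <Book> subtree guaranteed to be parsed
        if event == 'end' and elem.tag == 'Book':
            # tostring() returns the XML text; dump() would print it and return None
            fragments.append(etree.tostring(elem, encoding='unicode'))
            root.clear()  # discard finished <Book> elements to bound memory use
    return fragments


books = split_books(io.BytesIO(SAMPLE))
for fragment in books:
    print(fragment)
```

Because each fragment is an ordinary `str`, the list can be handed to another component or re-parsed later with `etree.fromstring()`.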

  • How do I get rid of
  • I want to store the split-out book fragments in a queue and then have another program dequeue them.

  • Here is how to do inter-process queuing, and how to manipulate, serialize, and pretty-print the given XML:

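    # tasks.py
    from lxml import etree
    from celery import Celery

    app = Celery('tasks', broker='amqp://guest@localhost//')

    @app.task
    def print_book(book_xml):
        # book_xml arrives as a str (see caller.py); re-parse it into an element
        book = etree.fromstring(book_xml)
        # do something interesting ...
        # encoding='unicode' returns str, so print() shows text rather than b'...'
        print(etree.tostring(book, pretty_print=True, encoding='unicode'))

    # caller.py
    from lxml import etree
    from tasks import print_book

    # iterparse(tag='Book') yields each <Book> once its end tag has been parsed
    for _, book in etree.iterparse('Books.xml', tag='Book'):
        # send text rather than bytes; str is safe with Celery's default JSON serializer
        book_xml = etree.tostring(book, encoding='unicode')
        print_book.delay(book_xml)  # queue the call as a message for a Celery worker
        # free memory on large files: clear this <Book> and drop processed siblings
        book.clear()
        while book.getprevious() is not None:
            del book.getparent()[0]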
    

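    Hi Karoly, thanks, but I need to print the fragment I extracted, not the whole document read from disk.
    I'll look into Celery, thanks, but my file is 1 GB, so I can't use etree.parse.
    Right, answer updated.
    Mind explaining what print_book.delay does there?
    It puts the function call on the queue as a message, to be consumed later by a Celery worker; follow the Celery tutorial to learn more.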
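
The queueing idea discussed above (put each `<Book>` fragment on a queue and let something else dequeue it later, which is what `print_book.delay` does through Celery's broker) can be sketched with the standard library alone. This is a swapped-in technique, not the answer's method: `queue.Queue` and a worker thread replace Celery, so the consumer runs in the same process rather than in a separate program. The sample data, the `DONE` sentinel and the `produce`/`consume`/`run` names are hypothetical.

```python
import io
import queue
import threading
import xml.etree.ElementTree as etree

# Hypothetical in-memory sample standing in for Books.xml
SAMPLE = b"""<Books>
  <Book><Detail ID="67"><BookName>Code Complete 2</BookName></Detail></Book>
  <Book><Detail ID="87"><BookName>Rocking Python</BookName></Detail></Book>
</Books>"""

DONE = None  # hypothetical sentinel: no more fragments will be queued


def produce(source, q):
    """Stream <Book> elements and enqueue each one as an XML string."""
    context = etree.iterparse(source, events=('start', 'end'))
    _, root = next(context)  # first event is the 'start' of <Books>
    for event, elem in context:
        if event == 'end' and elem.tag == 'Book':
            q.put(etree.tostring(elem, encoding='unicode'))
            root.clear()  # release finished elements
    q.put(DONE)


def consume(q, seen):
    """Dequeue fragments until the sentinel arrives, re-parsing each one."""
    while True:
        fragment = q.get()
        if fragment is DONE:
            break
        book = etree.fromstring(fragment)
        seen.append(book.find('Detail').get('ID'))


def run(source):
    q = queue.Queue(maxsize=100)  # bounded, so a fast producer waits for the consumer
    seen = []
    worker = threading.Thread(target=consume, args=(q, seen))
    worker.start()
    produce(source, q)
    worker.join()
    return seen


print(run(io.BytesIO(SAMPLE)))  # ['67', '87']
```

Swapping in `multiprocessing.Queue` and `multiprocessing.Process` moves the consumer into its own process (results would then have to come back through another queue); a broker-backed queue, such as the Celery setup in the answer, is what lets a completely separate program do the dequeuing.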