How to read a numbered list from a docx (Word file) in Python


How do I read a numbered list, including its numbers and letters, from a docx (Word) file?

bulletsquestions.docx:

  1. this is a question text 
    A.  Option first
    B.  Option second
    C.  Option third
    D.  Option fourth
    E.  Option fifth
stack.py:

import zipfile
from xml.etree.ElementTree import XML

# A .docx file is a zip archive; the body text lives in word/document.xml.
sourceFile = zipfile.ZipFile('bulletsquestions.docx')
xml_content = sourceFile.read('word/document.xml')
WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'
tree = XML(xml_content)
tex = ""
# iter() replaces getiterator(), which was removed in Python 3.9.
for paragraph in tree.iter(PARA):
    for read_item in paragraph.iter(TEXT):
        # An empty text run has text None, so fall back to "".
        tex = tex + (read_item.text or "")
print(tex)
Result:

  this is a question textOption firstOption secondOption thirdOption fourthOption fifth

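  • You should explain what the output is supposed to look like.

  • A. Option first B. Option second C. Option third D. Option fourth E. Option fifth

  • @pjrocks Sorry, I don't know the XML schema of that file. Can you print a sample of the XML you extracted?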
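
The labels are missing from the result because Word does not store automatic list numbers as text. Each list paragraph carries a `w:numPr` element inside `w:pPr`, with a `w:ilvl` (nesting level) and a `w:numId` (list id), and the label is generated when the document is rendered. Below is a minimal sketch of one way to rebuild the labels by counting paragraphs per level. It assumes level 0 is numbered `1.`, `2.`, ... and deeper levels `A.`, `B.`, ...; the actual formats are defined in `word/numbering.xml`, which this sketch does not read. The names `make_label`, `read_numbered_paragraphs`, and `read_docx` are hypothetical helpers introduced here, not part of any library.

```python
import zipfile
from xml.etree.ElementTree import XML

W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def make_label(level, n):
    # ASSUMPTION: level 0 is decimal, deeper levels are upper-case letters.
    # The real format per level is declared in word/numbering.xml.
    if level == 0:
        return f"{n}."
    return f"{chr(ord('A') + (n - 1) % 26)}."


def read_numbered_paragraphs(xml_content):
    """Return one line per paragraph, prefixed with a reconstructed label."""
    tree = XML(xml_content)
    counters = {}  # (list id, level) -> number of items seen so far
    lines = []
    for para in tree.iter(W + 'p'):
        text = ''.join(t.text or '' for t in para.iter(W + 't'))
        num_pr = para.find(f'{W}pPr/{W}numPr')
        if num_pr is None:
            # Not a list item: keep the text unchanged.
            lines.append(text)
            continue
        ilvl = num_pr.find(W + 'ilvl')
        num_id = num_pr.find(W + 'numId')
        level = int(ilvl.get(W + 'val')) if ilvl is not None else 0
        list_id = num_id.get(W + 'val') if num_id is not None else None
        # Advance the counter for this level and restart deeper levels.
        counters[(list_id, level)] = counters.get((list_id, level), 0) + 1
        for key in [k for k in counters if k[0] == list_id and k[1] > level]:
            del counters[key]
        label = make_label(level, counters[(list_id, level)])
        lines.append('    ' * level + label + ' ' + text)
    return lines


def read_docx(path):
    # A .docx file is a zip archive; the body is word/document.xml.
    with zipfile.ZipFile(path) as docx:
        return read_numbered_paragraphs(docx.read('word/document.xml'))
```

Calling `print('\n'.join(read_docx('bulletsquestions.docx')))` would print one paragraph per line with its label. Whether the labels match what Word displays depends on the assumption about the level formats holding for that file.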