Python: How to read a numbered list from a docx (Word file)
Tags: python, xml, ms-word

bulletsquestions.docx:
1. this is a question text
A. Option first
B. Option second
C. Option third
D. Option fourth
E. Option fifth
stack.py:
import zipfile
from xml.etree.ElementTree import XML

# A .docx file is a zip archive; the body text lives in word/document.xml.
with zipfile.ZipFile('bulletsquestions.docx') as sourceFile:
    xml_content = sourceFile.read('word/document.xml')

WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'

tree = XML(xml_content)
tex = ""
# Element.getiterator() was removed in Python 3.9; Element.iter() replaces it.
for paragraph in tree.iter(PARA):
    for read_item in paragraph.iter(TEXT):
        # A w:t element can be empty, in which case .text is None.
        tex = tex + (read_item.text or "")
print(tex)
Result:
1. this is a question text
A. Option first
B. Option second
C. Option third
D. Option fourth
E. Option fifth
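You should explain what the output should look like.
1. this is a question text A. Option first B. Option second C. Option third D. Option fourth E. Option fifth
@pjrocks Sorry, I don't know the XML schema of that file. Can you print a sample of the XML you extracted from the file?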
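
A possible explanation for missing numbers, offered as a sketch rather than a confirmed diagnosis: when a list uses Word's automatic numbering, the "1." and "A." markers are normally not stored in the w:t text at all. Each list paragraph instead carries a w:numPr element inside w:pPr, holding a nesting level (w:ilvl) and a list id (w:numId), and the number formats themselves are defined in word/numbering.xml. The code below assumes bulletsquestions.docx is numbered that way, which the question does not confirm. The helper names read_paragraphs and read_docx_paragraphs are hypothetical, chosen only for this illustration.

```python
import zipfile
from xml.etree import ElementTree as ET

W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def read_paragraphs(document_xml):
    """Return a (text, ilvl, num_id) tuple for each paragraph.

    ilvl and num_id are None when the paragraph has no direct w:numPr,
    i.e. it is not marked as a list item in its own properties.
    """
    root = ET.fromstring(document_xml)
    rows = []
    for para in root.iter(W + 'p'):
        # Join every text run of the paragraph; empty runs have text None.
        text = ''.join(t.text or '' for t in para.iter(W + 't'))
        ilvl = num_id = None
        num_pr = para.find(W + 'pPr/' + W + 'numPr')
        if num_pr is not None:
            ilvl_el = num_pr.find(W + 'ilvl')
            num_id_el = num_pr.find(W + 'numId')
            if ilvl_el is not None:
                ilvl = int(ilvl_el.get(W + 'val'))
            if num_id_el is not None:
                num_id = int(num_id_el.get(W + 'val'))
        rows.append((text, ilvl, num_id))
    return rows


def read_docx_paragraphs(path):
    """Hypothetical convenience wrapper: read word/document.xml from a .docx."""
    with zipfile.ZipFile(path) as docx:
        return read_paragraphs(docx.read('word/document.xml'))
```

Calling read_docx_paragraphs('bulletsquestions.docx') would then show which paragraphs are list items and at which level. Turning a level and list id into the visible "1." or "A." still requires looking up the matching definition in word/numbering.xml and counting items per level, which this sketch does not attempt; numbering inherited from a paragraph style is also not detected. To produce the sample XML the commenter asks for, ET.tostring(para, encoding='unicode') on a single paragraph element is one option.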