Getting text from a docx file in Python

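How can I get the text out of a docx file in Python? Ideally I would like to load it into a plain string. The formatting in the original file can obviously be ignored.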

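I understand the structure of docx files (a folder in which the text is saved as document.xml), but I would like a simple way to extract the text without manually opening that folder, extracting the file, and pulling out the paragraph tags.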
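For reference, the manual route described above can be sketched with the standard library alone. This is a minimal sketch, not a replacement for a dedicated library: it assumes the body text sits in word/document.xml inside the ZIP container, and the function name docx_to_text is made up for illustration. It ignores tabs, line breaks, headers, footers, and footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file as one string, one paragraph per line.

    Hypothetical helper for illustration; formatting is discarded.
    """
    # A .docx file is a ZIP archive; the main body lives in word/document.xml.
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # Each <w:t> element holds one run of text inside the paragraph.
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling docx_to_text('files/file.docx') would then return the paragraphs joined by newlines, assuming the file is a valid docx.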

I tried the following, but it fails with an error every time:

import docx as dx
document = dx.opendocx('files/file.docx')

Traceback (most recent call last):
  File "concord.py", line 2, in <module>
    document = dx.opendocx('files/#n01 ch B3A126.docx')
AttributeError: 'module' object has no attribute 'opendocx'

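Is there a file named docx.py in the current directory?

No, there is no docx.py in the current working directory. However, the pythondocx github distribution contains such a file. To install it, all I did was unzip it into a random folder (which I later deleted) and run python setup.py install. Hopefully that is OK?

What do you get if you put dir(dx) right after the import?

If I do that in IPython, I get: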
Out[2]: ['AdvSearch', 'Image', '__builtins__', ..., 'paragraph', 'picture', 're', 'relationshiplist', 'replace', 'savedocx', 'search', 'shutil', 'table', 'template_dir', 'time', 'websettings', 'wordrelationships', 'zipfile']
If I do it the way I originally did, writing the file in a text editor (Notepad++) and running it from the command line with python filename.py, I get:
[..., '__doc__', '__file__', '__name__', '__package__', '__path__', ...]
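The two listings differ, and the second one appears to include __path__, which only packages expose, so the two environments may be importing different things under the name docx. One way to check is to ask the imported module where it came from. The sketch below is only a diagnostic aid; describe_module is a hypothetical helper, and json and loads are stand-ins so that it runs without any third-party install.

```python
import importlib


def describe_module(name, attribute):
    """Import a module by name and report its origin and whether it has `attribute`.

    Hypothetical helper for illustration only.
    """
    module = importlib.import_module(name)
    return {
        # Path of the file that was actually imported (None for built-in modules).
        "file": getattr(module, "__file__", None),
        # Packages (directories) expose __path__; single-file modules do not.
        "is_package": hasattr(module, "__path__"),
        # True if the attribute (e.g. a function) exists on the imported module.
        "has_attribute": hasattr(module, attribute),
    }


if __name__ == "__main__":
    # Stand-in arguments; for the case above one would pass "docx" and "opendocx".
    print(describe_module("json", "loads"))
```

Running it in both environments with "docx" and "opendocx" would show whether the same file is being loaded each time and whether that file defines opendocx at all.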