Combining Word documents using python-docx
I have several Word files, each with specific content. I would like a code snippet that shows me, or helps me figure out, how to combine the Word files into a single file using the python-docx library.
For example, with the pywin32 library I did the following:
rng = self.doc.Range(0, 0)
for d in data:
    time.sleep(0.05)
    docstart = d.wordDoc.Content.Start
    self.word.Visible = True
    docend = d.wordDoc.Content.End - 1
    location = d.wordDoc.Range(docstart, docend).Copy()
    rng.Paste()
    rng.Collapse(0)
    rng.InsertBreak(win32.constants.wdPageBreak)
But I need to do this using the python-docx library rather than win32.client.

If your needs are simple, something like this might work:
from docx import Document

source_document = Document('source.docx')
target_document = Document()

# Copy the plain text of each paragraph; formatting is not carried over.
for paragraph in source_document.paragraphs:
    text = paragraph.text
    target_document.add_paragraph(text)
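There is more you could do, but this should get you started.
It turns out that copying content from one Word file to another is quite complex in the general case, for example because styles in the source document may conflict with those already present in the target document and have to be reconciled. So it isn't a feature we're likely to add in, say, the next year.

Create an empty document (empty.docx) and add your two documents to it.
In each iteration of the loop over the files, add a page break if needed.
When you are done, save the new file, which contains the two combined documents.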
from docx import Document

files = ['file1.docx', 'file2.docx']

def combine_word_documents(files):
    combined_document = Document('empty.docx')
    count, number_of_files = 0, len(files)
    for file in files:
        sub_doc = Document(file)

        # Don't add a page break if you've
        # reached the last file.
        if count < number_of_files - 1:
            sub_doc.add_page_break()

        for element in sub_doc._document_part.body._element:
            combined_document._document_part.body._element.append(element)
        count += 1

    combined_document.save('combined_word_documents.docx')

combine_word_documents(files)
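The private _document_part attribute used in this example is not available in every python-docx release (a comment further down reports an AttributeError), while another example on this page reaches the body through document.element.body. The following is only a sketch of a small compatibility helper under that assumption; the name get_body_element is hypothetical and not part of python-docx.

```python
def get_body_element(document):
    """Return the XML body element of a python-docx Document.

    Tries the public ``document.element.body`` first and only falls back to
    the private ``_document_part`` path used in the example above.
    get_body_element is a hypothetical helper name, not a library function.
    """
    element = getattr(document, "element", None)
    if element is not None:
        return element.body
    # Older releases: private attribute, may change without notice.
    return document._document_part.body._element
```

With such a helper, the inner loop could append each element of get_body_element(sub_doc) to get_body_element(combined_document) instead of touching the private attribute directly.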
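If you only need to combine simple documents that contain text, you can use python-docx as described above.
If you need to merge documents that contain hyperlinks, images, lists, bullets and so on, you can use lxml to merge the document body together with all the parts it references, such as the following, among others:
- word/styles.xml
- word/numbering.xml
- word/media
- [Content_Types].xml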
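A .docx file is a ZIP package, so the parts listed above can be inspected with the standard library before deciding what has to be merged. This is only a minimal sketch for exploring a package, not a merge implementation; the function names and the PARTS_OF_INTEREST tuple are made up for illustration.

```python
import zipfile

# Parts named in the list above; a real document may contain more, or fewer.
PARTS_OF_INTEREST = (
    "word/styles.xml",
    "word/numbering.xml",
    "[Content_Types].xml",
)


def list_package_parts(docx_file):
    """Return the sorted part names inside a .docx package.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        return sorted(package.namelist())


def summarize_parts(part_names):
    """Report which parts of interest exist and list the embedded media files."""
    present = {name: name in part_names for name in PARTS_OF_INTEREST}
    media = [name for name in part_names if name.startswith("word/media/")]
    return present, media
```

Calling list_package_parts on each input file and comparing the results gives a quick picture of which referenced parts (styles, numbering, media) would need to be reconciled.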
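
I've adapted the example above to work with the latest version of python-docx (0.8.6 at the time of writing). Note that this only copies the elements (merging the styles of the elements is more complicated):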
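from docx import Document

files = ['file1.docx', 'file2.docx']

def combine_word_documents(files):
    merged_document = Document()

    for index, file in enumerate(files):
        sub_doc = Document(file)

        # Don't add a page break if you've reached the last file.
        # The source is cut off after "if index"; everything from this
        # condition onward is reconstructed from the neighbouring examples.
        if index < len(files) - 1:
            sub_doc.add_page_break()

        for element in sub_doc.element.body:
            merged_document.element.body.append(element)

    # Output file name is an assumption; the original is not preserved.
    merged_document.save('merged.docx')

combine_word_documents(files)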

These were all very helpful. I combined the answers of Martijn Jacobs and Mr Kris:
import os

from django.conf import settings
from docx import Document


def combine_word_documents(input_files):
    """
    :param input_files: an iterable with full paths to docs
    :return: a Document object with the merged files
    """
    for filnr, file in enumerate(input_files):
        # In my case the docx templates are in a FileField of Django, so I add
        # the MEDIA_ROOT. Discard the next 2 lines if not appropriate for you.
        if 'offerte_template' in file:
            file = os.path.join(settings.MEDIA_ROOT, file)

        if filnr == 0:
            # The first file becomes the base document the others are appended to.
            merged_document = Document(file)
            merged_document.add_page_break()
        else:
            sub_doc = Document(file)

            # Don't add a page break if you've reached the last file.
            if filnr < len(input_files) - 1:
                sub_doc.add_page_break()

            for element in sub_doc.element.body:
                merged_document.element.body.append(element)

    return merged_document
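
Another way to merge two documents, including all their styles, is to use the Python library docxcompose. You don't need to define styles explicitly, nor read the documents paragraph by paragraph and append them to the master document. Usage of docxcompose is shown in the code below: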
#Importing the required packages
from docxcompose.composer import Composer
from docx import Document as Document_compose
#filename_master is name of the file you want to merge the docx file into
master = Document_compose(filename_master)
composer = Composer(master)
#filename_second_docx is the name of the second docx file
doc2 = Document_compose(filename_second_docx)
#append the doc2 into the master using composer.append function
composer.append(doc2)
#Save the combined docx with a name
composer.save("combined.docx")
If you want to merge several documents into a single docx file, you can use the following function:
# filename_master is the name of the file you want to merge all the documents into
# files_list is a list containing the filenames of the docx files to be merged
def combine_all_docx(filename_master, files_list):
    master = Document_compose(filename_master)
    composer = Composer(master)
    for filename in files_list:
        doc_temp = Document_compose(filename)
        composer.append(doc_temp)
    composer.save("combined_file.docx")

# For example
# filename_master = "file1.docx"
# files_list = ["file2.docx", "file3.docx", "file4.docx", "file5.docx"]
# Calling the function
# combine_all_docx(filename_master, files_list)
# This appends every document in files_list to file1.docx and saves the merged document as combined_file.docx
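
Another alternative is the Aspose.Words Cloud SDK for Python. It preserves the formatting and styles of the documents according to the ImportFormatMode parameter. This parameter defines which formatting is used: that of the appended document or that of the destination document. The possible values are KeepSourceFormatting and UseDestinationStyles.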
# For complete examples and data files, please go to https://github.com/aspose-words-cloud/aspose-words-cloud-python
import os
import asposewordscloud
import asposewordscloud.models.requests
from shutil import copyfile
# Please get your Client ID and Secret from https://dashboard.aspose.cloud.
client_id='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx'
client_secret='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
words_api = asposewordscloud.WordsApi(client_id,client_secret)
words_api.api_client.configuration.host='https://api.aspose.cloud'
remoteFolder = 'Temp'
localFolder = 'C:/Temp'
localFileName = 'destination.docx'
remoteFileName = 'destination.docx'
localFileName1 = 'source.docx'
remoteFileName1 = 'source.docx'
#upload file
words_api.upload_file(asposewordscloud.models.requests.UploadFileRequest(open(localFolder + '/' + localFileName,'rb'),remoteFolder + '/' + remoteFileName))
words_api.upload_file(asposewordscloud.models.requests.UploadFileRequest(open(localFolder + '/' + localFileName1,'rb'),remoteFolder + '/' + remoteFileName1))
#append Word documents
requestDocumentListDocumentEntries0 = asposewordscloud.DocumentEntry(href=remoteFolder + '/' + remoteFileName1, import_format_mode='KeepSourceFormatting')
requestDocumentListDocumentEntries = [requestDocumentListDocumentEntries0]
requestDocumentList = asposewordscloud.DocumentEntryList(document_entries=requestDocumentListDocumentEntries)
request = asposewordscloud.models.requests.AppendDocumentRequest(name=remoteFileName, document_list=requestDocumentList, folder=remoteFolder, dest_file_name= remoteFolder + '/' + remoteFileName)
result = words_api.append_document(request)
#download file
request_download=asposewordscloud.models.requests.DownloadFileRequest(remoteFolder + '/' + remoteFileName)
response_download = words_api.download_file(request_download)
copyfile(response_download, localFolder + '/' +"Append_output.docx")
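
@abarnert I've rewritten the question.
The rewritten question looks answerable. Thank you, @omri_saadon.
@AdamSmith: Answerable, yes, but now he is asking us to port his code from one library to another, which still isn't a good fit here. Especially since he hasn't shown any docx code, or described how far he got and where he ran into trouble, except in the vaguest terms.
I don't know how to do it. My idea was to go through each document (its paragraphs and tables) and then somehow copy the content into a new Word file. Even just knowing the general approach would help. I have only been using this library for a few days. @abarnert
@scanny Does it also copy tables?
No. See this page for some discussion related to this.
I managed to copy everything into the new docx file, but all the formatting (e.g. bold) is gone. Is there a way to preserve it?
Well, like I said, solving the general case is complex. You might make some progress by going down to the run level and matching bold and italic there. Each paragraph is made up of runs (to a first approximation), and character formatting lives at the run level.
AttributeError: 'Document' object has no attribute '_document_part'?
@coachcal _document_part is "private" and shouldn't be accessed as API. In any case, this is a newer version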
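
The run-level suggestion from the comments could be sketched roughly as follows. This is an illustration rather than a complete solution: it only carries over bold, italic and underline, ignores tables, images and styles, and the function name copy_paragraphs_with_runs is hypothetical. It relies on the python-docx calls add_paragraph(), add_run() and the bold, italic and underline properties of a run.

```python
def copy_paragraphs_with_runs(source_document, target_document):
    """Copy paragraphs run by run, carrying over basic character formatting.

    Both arguments are expected to be python-docx Document objects.
    Hypothetical helper: only bold, italic and underline are copied.
    """
    for paragraph in source_document.paragraphs:
        new_paragraph = target_document.add_paragraph()
        for run in paragraph.runs:
            new_run = new_paragraph.add_run(run.text)
            # Each property is True, False, or None (inherit from the style).
            new_run.bold = run.bold
            new_run.italic = run.italic
            new_run.underline = run.underline
    return target_document
```

It would be called with two Document objects, for example one opened from source.docx and one newly created, after which the target document is saved as usual.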