Python docx: replace a string in a paragraph while keeping the style


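I need help replacing a string in a Word document while keeping the formatting of the entire document.

I'm using python-docx. After reading the documentation, it seems to work on entire paragraphs, so I lose formatting such as bold or italic text. The text to be replaced is itself in bold, and I would like to keep it that way. I'm using this code: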

from docx import Document
def replace_string2(filename):
    doc = Document(filename)
    for p in doc.paragraphs:
        if 'Text to find and replace' in p.text:
            print('SEARCH FOUND!!')
            text = p.text.replace('Text to find and replace', 'new text')
            style = p.style
            p.text = text
            p.style = style
    # doc.save(filename)
    doc.save('test.docx')
    return 1
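
So if I implement it, I want something like this (currently, the paragraph containing the string to be replaced loses its formatting):

This is paragraph 1, and this is a text in bold

This is paragraph 2, and I will replace old text (with "old text" in bold)

The current result is:

This is paragraph 1, and this is a text in bold

This is paragraph 2, and I will replace new text (with "new text" no longer in bold)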

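I posted this question (even though I saw a few similar ones here) because, as far as I can tell, none of them solved this problem. One of them suggested the oodocx library, which I tried without success. So I found a workaround.

The code is very similar, but the logic is: when I find a paragraph that contains the string I want to replace, I add another loop over its runs (see `replace_string` further down). This only works if the string I want to replace has the same formatting throughout, that is, it sits inside a single run.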
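
To make that limitation concrete, here is a minimal sketch of the per-run replacement. It relies only on objects that expose `.runs` and `.text`, which python-docx paragraphs and runs do. The helper names `replace_in_runs` and `fake_paragraph`, and the `SimpleNamespace` stand-ins, are assumptions made for illustration so the snippet runs without a .docx file.

```python
from types import SimpleNamespace


def replace_in_runs(paragraph, old, new):
    """Replace `old` with `new` inside each run; return the number of runs changed."""
    changed = 0
    for run in paragraph.runs:
        if old in run.text:
            # Only the text changes; other attributes of the run are left alone.
            run.text = run.text.replace(old, new)
            changed += 1
    return changed


def fake_paragraph(*texts):
    """Hypothetical stand-in for a python-docx paragraph (illustration only)."""
    return SimpleNamespace(runs=[SimpleNamespace(text=t, bold=False) for t in texts])


# The search string sits entirely inside one (bold) run: it is replaced.
whole = fake_paragraph("I will replace ", "old text")
whole.runs[1].bold = True
hits_whole = replace_in_runs(whole, "old text", "new text")

# The search string is split across two runs: no single run contains it.
split = fake_paragraph("I will replace old", " text")
hits_split = replace_in_runs(split, "old text", "new text")
```

In the second case the paragraph text contains "old text", but no individual run does, so nothing is replaced. Word often splits text into several runs even when the formatting looks uniform.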


This is how I keep the text style when replacing text.

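Based on Alo's answer and the fact that the search text can be split across several runs, here is what worked for me to replace placeholder text in a template docx file. It checks all document paragraphs and any table cell contents for the placeholders.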

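Once the search text is found in a paragraph, it loops through the paragraph's runs, identifies which runs contain part of the search text, inserts the replacement text into the first of those runs, and then blanks out the remaining search-text characters in the other runs.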
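
The steps above can also be sketched on plain strings. This is a simplified variant under stated assumptions, not the code from this answer: it locates the match in the joined text and maps it back to the runs by offset, and it replaces only the first occurrence. The names `replace_across_runs` and `replace_in_paragraph` are hypothetical.

```python
def replace_across_runs(run_texts, old, new):
    """Return new run texts with the first occurrence of `old` replaced by `new`.

    The replacement goes into the run where the match starts; the remaining
    matched characters are removed from the runs that follow.
    """
    full = "".join(run_texts)
    start = full.find(old)
    if start < 0 or not old:
        return list(run_texts)
    end = start + len(old)

    result = []
    offset = 0  # position of the current run's first character in `full`
    inserted = False
    for text in run_texts:
        run_start, run_end = offset, offset + len(text)
        offset = run_end
        # Part of this run that overlaps the match, in run-local indices.
        lo = max(start, run_start) - run_start
        hi = min(end, run_end) - run_start
        if lo >= hi:
            result.append(text)  # no overlap with the match
            continue
        middle = "" if inserted else new
        inserted = True
        result.append(text[:lo] + middle + text[hi:])
    return result


def replace_in_paragraph(paragraph, old, new):
    """Apply replace_across_runs to any object with `.runs` whose items have `.text`."""
    runs = paragraph.runs
    new_texts = replace_across_runs([r.text for r in runs], old, new)
    for run, text in zip(runs, new_texts):
        if run.text != text:
            run.text = text


example = replace_across_runs(["Dear ${", "name", "}, hello"], "${name}", "Ahmed")
# example == ["Dear Ahmed", "", ", hello"]
```

Because only the `text` of each run is rewritten, each run keeps its own formatting, and the replacement takes the formatting of the run where the placeholder began.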

I hope this helps someone. There is also a gist of it, if anybody wants to improve it.

Edit: I have subsequently discovered python-docx-template, which allows jinja2-style templating within a docx template.

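def docx_replace(doc, data):
    # Collect the body paragraphs plus every paragraph inside table cells
    paragraphs = list(doc.paragraphs)
    for t in doc.tables:
        for row in t.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    paragraphs.append(paragraph)
    for p in paragraphs:
        for key, val in data.items():
            key_name = '${{{}}}'.format(key)  # I'm using placeholders in the form ${PlaceholderName}
            if key_name in p.text:
                inline = p.runs
                # Replace strings and retain the same style.
                # The text to be replaced can be split over several runs, so
                # search through, identify which runs need to have text replaced,
                # then replace the text in those identified
                started = False
                key_index = 0
                # found_runs is a list of (inline index, index of match, length of match)
                found_runs = list()
                found_all = False
                replace_done = False
                for i in range(len(inline)):
                    # case 1: found in a single run, so short-circuit the replace
                    if key_name in inline[i].text and not started:
                        found_runs.append((i, inline[i].text.find(key_name), len(key_name)))
                        text = inline[i].text.replace(key_name, str(val))
                        inline[i].text = text
                        replace_done = True
                        found_all = True
                        break
                    if key_name[key_index] not in inline[i].text and not started:
                        # keep looking ...
                        continue
                    # case 2: search for partial text, find first run
                    if key_name[key_index] in inline[i].text and inline[i].text[-1] in key_name and not started:
                        # check sequence: the tail of this run must be a prefix of key_name
                        start_index = inline[i].text.find(key_name[key_index])
                        check_length = len(inline[i].text)
                        if not key_name.startswith(inline[i].text[start_index:]):
                            # no match so must be false positive
                            continue
                        if key_index == 0:
                            started = True
                        chars_found = check_length - start_index
                        key_index += chars_found
                        found_runs.append((i, start_index, chars_found))
                        if key_index != len(key_name):
                            continue
                        else:
                            # found all chars in key_name
                            found_all = True
                            break
                    # case 2: search for partial text, find subsequent run
                    if key_name[key_index] in inline[i].text and started and not found_all:
                        # check sequence
                        chars_found = 0
                        check_length = len(inline[i].text)
                        for text_index in range(0, check_length):
                            # the bounds check stops at the end of key_name when the run has more text
                            if key_index < len(key_name) and inline[i].text[text_index] == key_name[key_index]:
                                key_index += 1
                                chars_found += 1
                            else:
                                break
                        # no match so must be end
                        found_runs.append((i, 0, chars_found))
                        if key_index == len(key_name):
                            found_all = True
                            break
                if found_all and not replace_done:
                    # As described above: put the replacement text into the first
                    # matched run and blank the matched characters in the others
                    for n, (index, start, length) in enumerate(found_runs):
                        text = inline[index].text
                        new_text = str(val) if n == 0 else ''
                        inline[index].text = text[:start] + new_text + text[start + length:]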

# The workaround described above: replace within each run so the run keeps its style
def replace_string(filename):
    doc = Document(filename)
    for p in doc.paragraphs:
        if 'old text' in p.text:
            # Loop added to work with runs (strings with same style)
            for run in p.runs:
                if 'old text' in run.text:
                    run.text = run.text.replace('old text', 'new text')
            print(p.text)

    doc.save('dest1.docx')
    return 1

from docx import Document

document = Document('old.docx')

# Maps the exact text of a run to its replacement
dic = {'name': 'ahmed', 'me': 'zain'}
for p in document.paragraphs:
    for run in p.runs:
        # Only runs whose whole text equals a key are replaced
        if run.text in dic:
            run.text = dic[run.text]

document.save('new.docx')

'''
-*- coding: utf-8 -*-
@Time    : 2021/4/19 13:13
@Author  : ZCG
@Site    : 
@File    : Batch DOCX document keyword replacement.py
@Software: PyCharm
'''

from docx import Document
import os
import tqdm

def get_docx_list(dir_path):
    '''
    :param dir_path: directory to search, including subdirectories
    :return: List of docx files found under dir_path
    '''
    file_list = []
    for path, dirs, files in os.walk(dir_path):
        for file in files:
            # Locate the docx documents and exclude Word's temporary "~" files
            if file.endswith(".docx") and not file.startswith("~"):
                file_list.append(os.path.join(path, file))
    print("The directory found a total of {0} related files!".format(len(file_list)))
    return file_list

class ParagraphsKeyWordsReplace:
    '''
        self:paragraph
    '''
    def paragraph_keywords_replace(self,x,key,value):
        '''
        :param x:  paragraph index
        :param key: Key words to be replaced
        :param value: Replace the key words
        :return:
        '''
        keywords_list = [s for s in range(len(self.text)) if self.text.find(key, s) == s] # Retrieve the number of occurrences of the Key in this paragraph and record the starting position in the List
        # there if use: while self.text.find(key) >= 0,When {"ab":" ABC "} is encountered, it will enter an infinite loop
        while len(keywords_list)>0:             #If this paragraph contains more than one key, you need to iterate
            index_list = [] #Gets the index value for all characters in this paragraph
            for y, run in enumerate(self.runs):  # Read the index of run
                for z, char in enumerate(list(run.text)):  # Read the index of the chars in the run
                    position = {"run": y, "char": z}  # Give each character a dictionary index
                    index_list.append(position)
            # print(index_list)
            start_i = keywords_list.pop()   # Fetch the starting position containing the key from the back to the front of the list
            end_i = start_i + len(key)      # Determine where the key word ends in the paragraph
            keywords_index_list = index_list[start_i:end_i]  # Intercept the section of a list that contains keywords in a paragraph
            # print(keywords_index_list)
            # return keywords_index_list    #Returns a list of coordinates for the chars associated with keywords
            ParagraphsKeyWordsReplace.character_replace(self, keywords_index_list, value)
            # print(f"Successful replacement:{key}===>{value}")

    def character_replace(self,keywords_index_list,value):
        '''
        :param keywords_index_list: A list of indexed dictionaries containing keywords
        :param value: The new word after the replacement
        :return:
        Receive parameters and delete the characters in keywords_index_list from back to front, keeping the position of the first character to hold value
        Note: The list must be processed in reverse order, otherwise the changing string length will cause a string index out of range error
        '''
        while len(keywords_index_list) > 0:
            position = keywords_index_list.pop()    # Removes the last element and returns it
            y = position["run"]
            z = position["char"]
            run = self.runs[y]
            # Slice by index: str.replace(char, ...) would change every occurrence of that character in the run
            if len(keywords_index_list) > 0:
                run.text = run.text[:z] + run.text[z + 1:]          # Delete the character at index z only
            else:
                run.text = run.text[:z] + value + run.text[z + 1:]  # Put value where the first character was

class DocxKeyWordsReplace:
    '''
        self:docx
    '''
    def content(self,replace_dict):
        print("Please wait for a moment, the body content is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for x,paragraph in enumerate(self.paragraphs):
                ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph,x,key,value)

    def tables(self,replace_dict):
        print("Please wait for a moment, the body tables is processed...")
        for key,value in tqdm.tqdm(replace_dict.items()):
            for i,table in enumerate(self.tables):
                for j,row in enumerate(table.rows):
                    for cell in row.cells:
                        for x,paragraph in enumerate(cell.paragraphs):
                            ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph,x,key,value)

    def header_content(self,replace_dict):
        print("Please wait for a moment, the header body content is processed...")
        for key,value in tqdm.tqdm(replace_dict.items()):
            for i,sections in enumerate(self.sections):
                for x,paragraph in enumerate(self.sections[i].header.paragraphs):
                    ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def header_tables(self,replace_dict):
        print("Please wait for a moment, the header body tables is processed...")
        for key,value in tqdm.tqdm(replace_dict.items()):
            for section in self.sections:
                for table in section.header.tables:
                    for row in table.rows:
                        for cell in row.cells:
                            for x, paragraph in enumerate(cell.paragraphs):
                                ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def footer_content(self, replace_dict):
        print("Please wait for a moment, the footer body content is processed...")
        for key,value in tqdm.tqdm(replace_dict.items()):
            for i, sections in enumerate(self.sections):
                for x, paragraph in enumerate(self.sections[i].footer.paragraphs):
                    ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)


    def footer_tables(self, replace_dict):
        print("Please wait for a moment, the footer body tables is processed...")
        for key,value in tqdm.tqdm(replace_dict.items()):
            for section in self.sections:
                for table in section.footer.tables:
                    for row in table.rows:
                        for cell in row.cells:
                            for x, paragraph in enumerate(cell.paragraphs):
                                ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

def main():
    '''
    How to use it: Modify the values in replace_dict and file_dir
    Replace_dict: The following dictionary corresponds to the format, with key as the content to be replaced and value as the new content
    File_dir: The directory where the docx file resides. Supports subdirectories
    '''
    # Input part
    replace_dict = {
        "MG life technology (shenzhen) co., LTD":"Shenzhen YW medical technology co., LTD",
        "MG-":"YW-",
        "2017-":"2020-",
        "Z18":"Z20",

        }
    file_dir = r"D:\Working Files\SVN"  # a raw string literal cannot end with a backslash
    # Call processing part
    for i,file in enumerate(get_docx_list(file_dir),start=1):
        print(f"{i}、Files in progress:{file}")
        docx = Document(file)
        DocxKeyWordsReplace.content(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.tables(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.header_content(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.header_tables(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.footer_content(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.footer_tables(docx, replace_dict=replace_dict)
        docx.save(file)
        print("This document has been processed!\n")

if __name__ == "__main__":
    main()
    print("All complete processing!")