python-docx: replace a string in a paragraph while keeping the style


Tags: python, python-2.7, python-docx, python3, python-docx-template


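I need help replacing a string in a Word document while keeping the formatting of the entire document.

I am using python-docx. From what I read in the documentation, it works on entire paragraphs, so I lose formatting such as bold or italic text. This includes the text to be replaced: it is in bold, and I would like to keep it that way. I am using the following code: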

from docx import Document
def replace_string2(filename):
    doc = Document(filename)
    for p in doc.paragraphs:
        if 'Text to find and replace' in p.text:
            print('SEARCH FOUND!!')
            text = p.text.replace('Text to find and replace', 'new text')
            style = p.style
            p.text = text
            p.style = style
    # doc.save(filename)
    doc.save('test.docx')
    return 1

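So if I run it on a document like this:

This is paragraph 1, and this is text in bold.
This is paragraph 2, and I will replace old text.

the current result is:

This is paragraph 1, and this is text in bold.
This is paragraph 2, and I will replace new text.

The paragraph containing the replaced string loses its formatting.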
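I posted this question (even though I have seen a few similar ones here) because, as far as I know, none of them solved this problem. One used the oodocx library, which I tried without success. So I found a workaround.
The code is very similar, but the logic is: when I find a paragraph that contains the string I want to replace, I add another loop over its runs. (This only works if the string I want to replace has the same formatting throughout, so that it sits inside a single run.)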
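The run-level loop can be sketched as follows. This is a minimal illustration, not the library's API: `replace_in_runs` is a hypothetical helper name, and it accepts any objects that expose a `.text` attribute, as python-docx `Run` objects do. The second example makes the stated limitation concrete: when the target string is split across two runs, no single run contains it and nothing is replaced.

```python
from types import SimpleNamespace


def replace_in_runs(runs, old, new):
    """Replace `old` with `new` inside each run separately.

    Returns the number of runs that were changed. A match that is
    split across two runs is not detected by this approach.
    """
    changed = 0
    for run in runs:
        if old in run.text:
            run.text = run.text.replace(old, new)
            changed += 1
    return changed


# Stand-ins for python-docx Run objects (only .text is used here).
single = [SimpleNamespace(text="I will replace "), SimpleNamespace(text="old text")]
split = [SimpleNamespace(text="I will replace old"), SimpleNamespace(text=" text")]

changed_single = replace_in_runs(single, "old text", "new text")
changed_split = replace_in_runs(split, "old text", "new text")
```

With a real document, the same function could be called with `paragraph.runs` for each paragraph of the document.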

This is what I do to keep the text style while replacing text.
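Based on Alo's answer and the fact that the search text can be split across several runs, here is what worked for me to replace placeholder text in a template docx file. It checks all document paragraphs and any table cell contents for the placeholders.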
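Collecting the paragraphs to check can be sketched with a small generator. This is an illustration under stated assumptions: `iter_paragraphs` is a hypothetical name, and it relies only on the container exposing `.paragraphs` and `.tables`, which python-docx documents and table cells both do. It also descends into nested tables. Merged cells can appear more than once in `row.cells`, so the same paragraph may be yielded repeatedly.

```python
from types import SimpleNamespace


def iter_paragraphs(container):
    """Yield the paragraphs of `container`, then those in its tables, recursively."""
    for paragraph in container.paragraphs:
        yield paragraph
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                # A cell has its own .paragraphs and .tables.
                yield from iter_paragraphs(cell)


# Stand-in objects shaped like a document with one table of one cell.
cell = SimpleNamespace(paragraphs=["cell paragraph"], tables=[])
table = SimpleNamespace(rows=[SimpleNamespace(cells=[cell])])
doc = SimpleNamespace(paragraphs=["body paragraph"], tables=[table])

found = list(iter_paragraphs(doc))
```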
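Once the search text is found in a paragraph, it loops through the paragraph's runs to identify which runs contain part of the search text. It then inserts the replacement text into the first of those runs and blanks out the remaining search-text characters in the other runs.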
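The same idea can be sketched more compactly by mapping character offsets in the joined paragraph text back to individual runs. This is not the answer's code but an alternative sketch of the technique: `replace_across_runs` is a hypothetical helper, it handles only the first occurrence, and it assumes the paragraph text equals the concatenation of the run texts.

```python
from types import SimpleNamespace


def replace_across_runs(runs, old, new):
    """Replace the first occurrence of `old`, even if it spans several runs.

    The replacement goes into the run where the match starts, so it takes
    that run's formatting; matched characters in later runs are removed.
    Returns True if a replacement was made.
    """
    if not old:
        return False
    full_text = "".join(run.text for run in runs)
    start = full_text.find(old)
    if start < 0:
        return False
    end = start + len(old)

    offset = 0        # position of the current run within full_text
    inserted = False  # becomes True once `new` has been written
    for run in runs:
        run_start, run_end = offset, offset + len(run.text)
        offset = run_end
        # Overlap between this run and the matched span [start, end).
        lo, hi = max(start, run_start), min(end, run_end)
        if lo >= hi:
            continue
        head = run.text[:lo - run_start]
        tail = run.text[hi - run_start:]
        run.text = head + ("" if inserted else new) + tail
        inserted = True
    return True


# The placeholder ${name} is split across three stand-in runs.
runs = [SimpleNamespace(text="Hello ${na"),
        SimpleNamespace(text="me"),
        SimpleNamespace(text="}, welcome")]
done = replace_across_runs(runs, "${name}", "Ann")
```

Because only the `.text` of each run is rewritten, the run objects themselves, and with them their bold or italic settings, are left in place.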
I hope this helps someone. Improvements are welcome.
Edit: I have since discovered python-docx-template, which allows jinja2-style templating within a docx template.
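For orientation, a minimal sketch of the python-docx-template workflow follows. The `DocxTemplate` class with its `render` and `save` methods comes from the `docxtpl` package; the function names `render_template` and `to_jinja_placeholders` are hypothetical, and the second one only illustrates how `${name}` placeholders relate to jinja2-style `{{ name }}` tags.

```python
import re


def to_jinja_placeholders(text):
    """Rewrite ${name} placeholders as jinja2-style {{ name }} tags."""
    return re.sub(r"\$\{(\w+)\}", r"{{ \1 }}", text)


def render_template(template_path, output_path, context):
    """Render a .docx template with docxtpl and save the result."""
    # Imported here so the string helper above works without docxtpl installed.
    from docxtpl import DocxTemplate  # third-party: pip install docxtpl

    template = DocxTemplate(template_path)
    template.render(context)  # context is a dict such as {"name": "Ann"}
    template.save(output_path)


converted = to_jinja_placeholders("Dear ${name}, your order ${order_id} has shipped.")
```

A call such as `render_template("template.docx", "out.docx", {"name": "Ann"})` would fill a template that contains `{{ name }}`; the file names here are placeholders.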

Code for the placeholder replacement that handles text split across runs (docx_replace):

def docx_replace(doc, data):
    paragraphs = list(doc.paragraphs)
    for t in doc.tables:
        for row in t.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    paragraphs.append(paragraph)
    for p in paragraphs:
        for key, val in data.items():
            key_name = '${{{}}}'.format(key)  # I'm using placeholders in the form ${PlaceholderName}
            if key_name in p.text:
                inline = p.runs
                # Replace strings and retain the same style.
                # The text to be replaced can be split over several runs, so
                # search through, identify which runs need to have text replaced,
                # then replace the text in those identified.
                started = False
                key_index = 0
                # found_runs is a list of (inline index, index of match, length of match)
                found_runs = list()
                found_all = False
                replace_done = False
                for i in range(len(inline)):

                    # Case 1: found in a single run, so short-circuit the replace
                    if key_name in inline[i].text and not started:
                        found_runs.append((i, inline[i].text.find(key_name), len(key_name)))
                        text = inline[i].text.replace(key_name, str(val))
                        inline[i].text = text
                        replace_done = True
                        found_all = True
                        break

                    if key_name[key_index] not in inline[i].text and not started:
                        # keep looking ...
                        continue

                    # Case 2: search for partial text, find the first run
                    if key_name[key_index] in inline[i].text and inline[i].text[-1] in key_name and not started:
                        # check sequence: the rest of this run must match the start of key_name
                        start_index = inline[i].text.find(key_name[key_index])
                        check_length = len(inline[i].text)
                        if not key_name.startswith(inline[i].text[start_index:]):
                            # no match, so must be a false positive
                            continue
                        if key_index == 0:
                            started = True
                        chars_found = check_length - start_index
                        key_index += chars_found
                        found_runs.append((i, start_index, chars_found))
                        if key_index != len(key_name):
                            continue
                        else:
                            # found all chars in key_name
                            found_all = True
                            break

                    # Case 2: search for partial text, find subsequent runs
                    if key_name[key_index] in inline[i].text and started and not found_all:
                        # check sequence
                        chars_found = 0
                        check_length = len(inline[i].text)
                        for text_index in range(0, check_length):
                            if key_index < len(key_name) and inline[i].text[text_index] == key_name[key_index]:
                                key_index += 1
                                chars_found += 1
                            else:
                                break
                        # no match, so must be the end
                        found_runs.append((i, 0, chars_found))
                        if key_index == len(key_name):
                            found_all = True
                            break

                # The source listing breaks off at this point. The final step below is
                # reconstructed from the description: the replacement text goes into the
                # first matching run, and the matched characters are removed from the rest.
                if found_all and not replace_done:
                    for n, (run_index, start, length) in enumerate(found_runs):
                        run_text = inline[run_index].text
                        replacement = str(val) if n == 0 else ''
                        inline[run_index].text = run_text[:start] + replacement + run_text[start + length:]

Code for the run-based workaround (an extra loop over the runs of each matching paragraph):

from docx import Document

def replace_string(filename):
    doc = Document(filename)
    for p in doc.paragraphs:
        if 'old text' in p.text:
            inline = p.runs
            # Loop added to work with runs (strings with same style)
            for i in range(len(inline)):
                if 'old text' in inline[i].text:
                    text = inline[i].text.replace('old text', 'new text')
                    inline[i].text = text
            print(p.text)
    doc.save('dest1.docx')
    return 1

Code that keeps the text style by replacing runs whose whole text matches a dictionary key:

from docx import Document

document = Document('old.docx')
dic = {'name': 'ahmed', 'me': 'zain'}
for p in document.paragraphs:
    inline = p.runs
    for i in range(len(inline)):
        text = inline[i].text
        # Only a run whose entire text equals a key is replaced
        if text in dic:
            inline[i].text = dic[text]
document.save('new.docx')

A further answer: batch keyword replacement for every docx file in a directory, covering body text, tables, headers and footers:

'''
-*- coding: utf-8 -*-
@Time    : 2021/4/19 13:13
@Author  : ZCG
@Site    :
@File    : Batch DOCX document keyword replacement.py
@Software: PyCharm
'''
from docx import Document
import os
import tqdm


def get_docx_list(dir_path):
    '''
    :param dir_path:
    :return: List of docx files in the directory and its subdirectories
    '''
    file_list = []
    for path, dirs, files in os.walk(dir_path):
        for file in files:
            # Locate the docx documents and exclude temporary files
            if file.endswith(".docx") and not file.startswith("~"):
                file_list.append(os.path.join(path, file))
    print("The directory found a total of {0} related files!".format(len(file_list)))
    return file_list


class ParagraphsKeyWordsReplace:
    '''
    self: paragraph
    '''

    def paragraph_keywords_replace(self, x, key, value):
        '''
        :param x: paragraph index
        :param key: keyword to be replaced
        :param value: replacement text
        :return:
        '''
        # Record the starting position of every occurrence of key in this paragraph.
        # Using `while self.text.find(key) >= 0` here would loop forever for a
        # mapping such as {"ab": "abc"}.
        keywords_list = [s for s in range(len(self.text)) if self.text.find(key, s) == s]
        while len(keywords_list) > 0:  # the paragraph may contain the key more than once
            index_list = []  # (run, char) position of every character in this paragraph
            for y, run in enumerate(self.runs):  # index of the run
                for z, char in enumerate(list(run.text)):  # index of the char within the run
                    position = {"run": y, "char": z}
                    index_list.append(position)
            start_i = keywords_list.pop()  # take the start positions from back to front
            end_i = start_i + len(key)  # where the keyword ends in the paragraph
            keywords_index_list = index_list[start_i:end_i]  # positions of the keyword's characters
            ParagraphsKeyWordsReplace.character_replace(self, keywords_index_list, value)
            # print(f"Successful replacement:{key}===>{value}")

    def character_replace(self, keywords_index_list, value):
        '''
        :param keywords_index_list: list of {"run", "char"} positions covering the keyword
        :param value: the new text
        :return: deletes the keyword's characters from back to front and replaces
                 its first character with value
        Note: the characters are processed from the end of the keyword backwards so
              that the recorded indices of the earlier characters stay valid
        '''
        while len(keywords_index_list) > 0:
            position = keywords_index_list.pop()  # remove and return the last element
            y = position["run"]
            z = position["char"]
            run = self.runs[y]
            if len(keywords_index_list) > 0:
                # Delete characters [1:] of the keyword, by position rather than by value
                run.text = run.text[:z] + run.text[z + 1:]
            else:
                # Replace the keyword's first character with the new text
                run.text = run.text[:z] + value + run.text[z + 1:]


class DocxKeyWordsReplace:
    '''
    self: docx
    '''

    def content(self, replace_dict):
        print("Please wait for a moment, the body content is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for x, paragraph in enumerate(self.paragraphs):
                ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def tables(self, replace_dict):
        print("Please wait for a moment, the body tables is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for table in self.tables:
                for row in table.rows:
                    for cell in row.cells:
                        for x, paragraph in enumerate(cell.paragraphs):
                            ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def header_content(self, replace_dict):
        print("Please wait for a moment, the header body content is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for section in self.sections:
                for x, paragraph in enumerate(section.header.paragraphs):
                    ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def header_tables(self, replace_dict):
        print("Please wait for a moment, the header body tables is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for section in self.sections:
                for table in section.header.tables:
                    for row in table.rows:
                        for cell in row.cells:
                            for x, paragraph in enumerate(cell.paragraphs):
                                ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def footer_content(self, replace_dict):
        print("Please wait for a moment, the footer body content is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for section in self.sections:
                for x, paragraph in enumerate(section.footer.paragraphs):
                    ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def footer_tables(self, replace_dict):
        print("Please wait for a moment, the footer body tables is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for section in self.sections:
                for table in section.footer.tables:
                    for row in table.rows:
                        for cell in row.cells:
                            for x, paragraph in enumerate(cell.paragraphs):
                                ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)


def main():
    '''
    How to use it: modify the values of replace_dict and file_dir
    replace_dict: keys are the text to be replaced, values are the new text
    file_dir: the directory that contains the docx files; subdirectories are supported
    '''
    # Input part
    replace_dict = {
        "MG life technology (shenzhen) co., LTD": "Shenzhen YW medical technology co., LTD",
        "MG-": "YW-",
        "2017-": "2020-",
        "Z18": "Z20",
    }
    file_dir = r"D:\Working Files\SVN"  # a raw string literal cannot end with a backslash
    # Call processing part
    for i, file in enumerate(get_docx_list(file_dir), start=1):
        print(f"{i}. File in progress: {file}")
        docx = Document(file)
        DocxKeyWordsReplace.content(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.tables(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.header_content(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.header_tables(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.footer_content(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.footer_tables(docx, replace_dict=replace_dict)
        docx.save(file)
        print("This document has been processed!\n")


if __name__ == "__main__":
    main()
    print("All complete processing!")