Python docx: replace a string in a paragraph while keeping the style
I need help replacing a string in a Word document while keeping the formatting of the entire document.
I'm using python-docx. From reading the documentation, it works with entire paragraphs, so I lose formatting such as bold or italic words. If the text to be replaced is in bold, I would like it to stay that way. I am using this code:
from docx import Document

def replace_string2(filename):
    doc = Document(filename)
    for p in doc.paragraphs:
        if 'Text to find and replace' in p.text:
            print('SEARCH FOUND!!')
            text = p.text.replace('Text to find and replace', 'new text')
            style = p.style
            # Assigning p.text replaces all runs with a single run,
            # which is where the bold/italic formatting is lost.
            p.text = text
            p.style = style
    # doc.save(filename)
    doc.save('test.docx')
    return 1
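So if I implement it and want something like this (the paragraph containing the string to be replaced loses its formatting):
This is paragraph 1, and this is text in bold.
This is paragraph 2, and I will replace old text
The current result is:
This is paragraph 1, and this is text in bold.
This is paragraph 2, and I will replace new text
I posted this question (even though I saw a few similar ones here) because, as far as I know, none of them solved this issue. One answer used the oodocx library, which I tried without success. So I found a workaround.
The code is very similar, but the logic is: when I find a paragraph containing the string I want to replace, I add another loop over its runs.
(This only works if the string I want to replace has the same formatting throughout, that is, it sits inside a single run.)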
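To illustrate the limitation mentioned in the parenthesis, here is a minimal sketch that models a paragraph's runs as plain Python strings. That modelling is an assumption made so the example runs without python-docx, and `replace_per_run` is a hypothetical helper, not part of the library.

```python
def replace_per_run(run_texts, old, new):
    """Replace `old` with `new` inside each run text independently."""
    return [text.replace(old, new) for text in run_texts]


# Case 1: the whole search string sits inside one run, so it is replaced.
single_run = ["This is paragraph 2, and I will replace ", "old text"]
print(replace_per_run(single_run, "old text", "new text"))

# Case 2: the string is split across two runs (for example because only
# part of it is bold), so no individual run contains it.
split_runs = ["I will replace ol", "d text"]
print("old text" in "".join(split_runs))   # the paragraph text contains it
print(replace_per_run(split_runs, "old text", "new text"))  # nothing changes
```

In the second case the paragraph-level check succeeds but the per-run loop finds nothing to replace, which is why a split string needs separate handling.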
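This is what works for me to retain the text style when replacing text.
Based on Alo's answer and the fact that the search text can be split across several runs, here is what worked for me to replace placeholder text in a template docx file. It checks all the document's paragraphs and any table cell contents for the placeholders.
Once the search text is found in a paragraph, it loops through the paragraph's runs to identify which runs contain part of the search text. It then inserts the replacement text into the first of those runs and blanks out the remaining search-text characters in the other runs.
I hope this helps someone. Here is the gist, if anyone wants to improve it.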
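The idea described above can be sketched on plain strings. This is not the gist itself but a simplified stand-in: runs are modelled as a list of strings, and `replace_across_runs` is a hypothetical name chosen for illustration.

```python
def replace_across_runs(run_texts, placeholder, replacement):
    """Replace the first occurrence of `placeholder` in the joined run texts.

    The replacement is written into the run where the match starts; the
    matched characters in later runs are removed. Returns a new list.
    """
    joined = "".join(run_texts)
    start = joined.find(placeholder)
    if start == -1:
        return list(run_texts)
    end = start + len(placeholder)

    result = []
    offset = 0          # position of the current run within the joined text
    inserted = False
    for text in run_texts:
        run_start, run_end = offset, offset + len(text)
        offset = run_end
        # Part of this run that overlaps the match, in run-local coordinates.
        lo = max(start, run_start) - run_start
        hi = min(end, run_end) - run_start
        if lo >= hi:                      # no overlap with the match
            result.append(text)
        elif not inserted:                # first run touched by the match
            result.append(text[:lo] + replacement + text[hi:])
            inserted = True
        else:                             # later runs: drop matched characters
            result.append(text[:lo] + text[hi:])
    return result


runs = ["Dear ${na", "me", "}, welcome"]
print(replace_across_runs(runs, "${name}", "Ahmed"))
# ['Dear Ahmed', '', ', welcome']
```

With python-docx, one could presumably build the list as `[r.text for r in p.runs]` and assign each result back to the matching `run.text`, so that every run keeps its own formatting.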
Edit:
I have subsequently discovered python-docx-template, which allows jinja2-style templating within a docx template.
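For reference, a minimal sketch of how python-docx-template (installed as the `docxtpl` package) is typically used. The file names and the `name` variable are assumptions for illustration, and `render_docx_template` is a hypothetical wrapper.

```python
def render_docx_template(template_path, output_path, context):
    """Fill a .docx template that contains jinja2 tags such as {{ name }}."""
    # Imported lazily so this file can be loaded without docxtpl installed.
    from docxtpl import DocxTemplate

    doc = DocxTemplate(template_path)
    doc.render(context)      # substitutes the tags with values from context
    doc.save(output_path)


# Hypothetical usage; "template.docx" is assumed to contain "{{ name }}".
# render_docx_template("template.docx", "generated.docx", {"name": "Ahmed"})
```

Because the tags are written directly in Word, the replacement text takes on the formatting of the tag, which avoids the run-splitting bookkeeping altogether.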
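def docx_replace(doc, data):
    paragraphs = list(doc.paragraphs)
    for t in doc.tables:
        for row in t.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    paragraphs.append(paragraph)
    for p in paragraphs:
        for key, val in data.items():
            key_name = '${{{}}}'.format(key)  # I'm using placeholders in the form ${PlaceholderName}
            if key_name in p.text:
                inline = p.runs
                # Replace strings and retain the same style.
                # The text to be replaced can be split over several runs, so
                # search through, identify which runs need to have text replaced,
                # then replace the text in those identified.
                started = False
                key_index = 0
                # found_runs is a list of (inline index, index of match, length of match)
                found_runs = list()
                found_all = False
                replace_done = False
                for i in range(len(inline)):
                    # case 1: found in a single run, so short-circuit the replace
                    if key_name in inline[i].text and not started:
                        found_runs.append((i, inline[i].text.find(key_name), len(key_name)))
                        text = inline[i].text.replace(key_name, str(val))
                        inline[i].text = text
                        replace_done = True
                        found_all = True
                        break
                    if key_name[key_index] not in inline[i].text and not started:
                        # keep looking ...
                        continue
                    # case 2: search for partial text, find first run
                    if key_name[key_index] in inline[i].text and inline[i].text[-1] in key_name and not started:
                        # check sequence
                        start_index = inline[i].text.find(key_name[key_index])
                        check_length = len(inline[i].text)
                        for text_index in range(start_index, check_length):
                            if inline[i].text[text_index] != key_name[key_index]:
                                # no match so must be false positive
                                break
                        if key_index == 0:
                            started = True
                        chars_found = check_length - start_index
                        key_index += chars_found
                        found_runs.append((i, start_index, chars_found))
                        if key_index != len(key_name):
                            continue
                        else:
                            # found all chars in key_name
                            found_all = True
                            break
                    # case 2: search for partial text, find subsequent run
                    if key_name[key_index] in inline[i].text and started and not found_all:
                        # check sequence
                        chars_found = 0
                        check_length = len(inline[i].text)
                        for text_index in range(0, check_length):
                            # the length guard avoids an IndexError once the whole key has been matched
                            if key_index < len(key_name) and inline[i].text[text_index] == key_name[key_index]:
                                key_index += 1
                                chars_found += 1
                            else:
                                break
                        # no match so must be end
                        found_runs.append((i, 0, chars_found))
                        if key_index == len(key_name):
                            found_all = True
                            break
                # As described above: put the replacement into the first matched run
                # and blank out the placeholder characters in the remaining runs.
                if found_all and not replace_done:
                    for i, item in enumerate(found_runs):
                        index, start, length = item
                        matched = inline[index].text[start:start + length]
                        if i == 0:
                            text = inline[index].text.replace(matched, str(val))
                        else:
                            text = inline[index].text.replace(matched, '')
                        inline[index].text = text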
from docx import Document

def replace_string(filename):
    doc = Document(filename)
    for p in doc.paragraphs:
        if 'old text' in p.text:
            inline = p.runs
            # Loop added to work with runs (strings with same style)
            for i in range(len(inline)):
                if 'old text' in inline[i].text:
                    text = inline[i].text.replace('old text', 'new text')
                    inline[i].text = text
            print(p.text)
    doc.save('dest1.docx')
    return 1
from docx import Document

document = Document('old.docx')
dic = {'name': 'ahmed', 'me': 'zain'}
for p in document.paragraphs:
    inline = p.runs
    for i in range(len(inline)):
        text = inline[i].text
        # Only runs whose entire text equals a key are replaced
        if text in dic.keys():
            text = text.replace(text, dic[text])
            inline[i].text = text
document.save('new.docx')
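The snippet above replaces a run only when its entire text equals a key. A possible variant that also handles keys appearing inside a longer run is sketched below. `replace_in_runs` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for python-docx runs so the sketch runs without a .docx file.

```python
from types import SimpleNamespace


def replace_in_runs(runs, mapping):
    """Replace every key of `mapping` found inside a run's text.

    `runs` is any iterable of objects with a writable `.text` attribute,
    such as the items of `paragraph.runs` in python-docx.
    """
    for run in runs:
        text = run.text
        for old, new in mapping.items():
            if old in text:
                text = text.replace(old, new)
        if text != run.text:
            run.text = text  # only touch runs that actually changed


# Stand-in objects so the sketch runs without a .docx file.
runs = [SimpleNamespace(text="Signed by name"),
        SimpleNamespace(text=" on behalf of the team")]
replace_in_runs(runs, {"name": "ahmed"})
print([r.text for r in runs])
# ['Signed by ahmed', ' on behalf of the team']
```

One caveat of this variant: keys are applied one after another, so a replacement value that contains a later key (for example 'ahmed' contains 'me') would be replaced again.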
'''
-*- coding: utf-8 -*-
@Time : 2021/4/19 13:13
@Author : ZCG
@Site :
@File : Batch DOCX document keyword replacement.py
@Software: PyCharm
'''
from docx import Document
import os
import tqdm

def get_docx_list(dir_path):
    '''
    :param dir_path: directory to search, including subdirectories
    :return: list of docx files found under dir_path
    '''
    file_list = []
    for path, dirs, files in os.walk(dir_path):
        for file in files:
            # Locate docx documents and exclude Word's temporary "~" files
            if file.endswith(".docx") and not file.startswith("~"):
                file_root = os.path.join(path, file)
                file_list.append(file_root)
    print("The directory found a total of {0} related files!".format(len(file_list)))
    return file_list
class ParagraphsKeyWordsReplace:
    '''
    self: paragraph
    '''
    def paragraph_keywords_replace(self, x, key, value):
        '''
        :param x: paragraph index
        :param key: keyword to be replaced
        :param value: replacement text
        :return:
        '''
        # Record the start position of every occurrence of key in this paragraph.
        # Using `while self.text.find(key) >= 0` instead would loop forever when the
        # value contains the key, e.g. {"ab": "abc"}.
        keywords_list = [s for s in range(len(self.text)) if self.text.find(key, s) == s]
        while len(keywords_list) > 0:  # the paragraph may contain the key more than once
            index_list = []  # (run, char) coordinates of every character in the paragraph
            for y, run in enumerate(self.runs):  # y: index of the run
                for z, char in enumerate(run.text):  # z: index of the character within the run
                    position = {"run": y, "char": z}
                    index_list.append(position)
            start_i = keywords_list.pop()  # take occurrences from the back of the paragraph to the front
            end_i = start_i + len(key)  # where this occurrence ends in the paragraph
            keywords_index_list = index_list[start_i:end_i]  # coordinates of the characters in this occurrence
            ParagraphsKeyWordsReplace.character_replace(self, keywords_index_list, value)
            # print(f"Successful replacement:{key}===>{value}")

    def character_replace(self, keywords_index_list, value):
        '''
        :param keywords_index_list: list of coordinate dictionaries for the keyword's characters
        :param value: the new text after the replacement
        :return:
        Delete the characters in keywords_index_list from back to front, keeping the
        first character's position to hold value.
        Note: deleting from front to back would shift the remaining indexes and
        cause a string index out of range error.
        '''
        while len(keywords_index_list) > 0:
            position = keywords_index_list.pop()  # remove the last element and return it
            y = position["run"]
            z = position["char"]
            run = self.runs[y]
            if len(keywords_index_list) > 0:
                # Delete only the character at index z (str.replace would hit every equal character)
                run.text = run.text[:z] + run.text[z + 1:]
            else:
                # The first character of the keyword is swapped for value
                run.text = run.text[:z] + value + run.text[z + 1:]
class DocxKeyWordsReplace:
    '''
    self: docx
    '''
    def content(self, replace_dict):
        print("Please wait for a moment, the body content is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for x, paragraph in enumerate(self.paragraphs):
                ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def tables(self, replace_dict):
        print("Please wait for a moment, the body tables is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for table in self.tables:
                for row in table.rows:
                    for cell in row.cells:
                        for x, paragraph in enumerate(cell.paragraphs):
                            ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def header_content(self, replace_dict):
        print("Please wait for a moment, the header body content is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for section in self.sections:
                for x, paragraph in enumerate(section.header.paragraphs):
                    ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def header_tables(self, replace_dict):
        print("Please wait for a moment, the header body tables is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for section in self.sections:
                for table in section.header.tables:
                    for row in table.rows:
                        for cell in row.cells:
                            for x, paragraph in enumerate(cell.paragraphs):
                                ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def footer_content(self, replace_dict):
        print("Please wait for a moment, the footer body content is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for section in self.sections:
                for x, paragraph in enumerate(section.footer.paragraphs):
                    ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)

    def footer_tables(self, replace_dict):
        print("Please wait for a moment, the footer body tables is processed...")
        for key, value in tqdm.tqdm(replace_dict.items()):
            for section in self.sections:
                for table in section.footer.tables:
                    for row in table.rows:
                        for cell in row.cells:
                            for x, paragraph in enumerate(cell.paragraphs):
                                ParagraphsKeyWordsReplace.paragraph_keywords_replace(paragraph, x, key, value)
def main():
    '''
    How to use it: modify the values of replace_dict and file_dir.
    replace_dict: maps each text to be replaced (key) to its new content (value)
    file_dir: the directory containing the docx files; subdirectories are included
    '''
    # Input part
    replace_dict = {
        "MG life technology (shenzhen) co., LTD": "Shenzhen YW medical technology co., LTD",
        "MG-": "YW-",
        "2017-": "2020-",
        "Z18": "Z20",
    }
    file_dir = r"D:\Working Files\SVN"  # a raw string literal cannot end with a single backslash
    # Call processing part
    for i, file in enumerate(get_docx_list(file_dir), start=1):
        print(f"{i}. Files in progress: {file}")
        docx = Document(file)
        DocxKeyWordsReplace.content(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.tables(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.header_content(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.header_tables(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.footer_content(docx, replace_dict=replace_dict)
        DocxKeyWordsReplace.footer_tables(docx, replace_dict=replace_dict)
        docx.save(file)
        print("This document has been processed!\n")

if __name__ == "__main__":
    main()
    print("All complete processing!")
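The script's comment about avoiding an endless loop when a replacement contains its own key can be shown on a plain string. The sketch below is a simplified illustration with hypothetical helper names (`find_all`, `replace_back_to_front`), not part of the script above: all start positions are recorded once, then replaced from the last to the first so earlier positions stay valid.

```python
def find_all(text, key):
    """Return the start index of every occurrence of `key` in `text`."""
    return [i for i in range(len(text)) if text.startswith(key, i)]


def replace_back_to_front(text, key, value):
    """Replace each recorded occurrence, starting from the last one."""
    for start in reversed(find_all(text, key)):
        text = text[:start] + value + text[start + len(key):]
    return text


print(find_all("ab-ab-ab", "ab"))
# [0, 3, 6]
print(replace_back_to_front("ab-ab-ab", "ab", "abc"))
# abc-abc-abc
```

A loop of the form `while key in text` would never finish here, because every replacement "abc" still contains "ab". Working from a fixed list of positions sidesteps that.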