Replacing quotes and apostrophes in python-docx
Hi everyone. I'm having trouble replacing quotes and apostrophes in a .docx file with the python-docx module on Python 3. I wrote the script below for this, but it doesn't recognize the punctuation marks, only words. For example, I have the following test text:
...(beneficiaries and their subordinate actors).
"Manipulation" is imagi'ned as a non-violent, concealed, non-linear, and dynamic process.
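I need to replace the straight quotes and apostrophes with curly ones. I don't quite understand what exactly is going wrong: whether the code doesn't recognize the Unicode characters, whether the docx runs are split at the punctuation characters, or something else entirely. How would I check the adjacent runs to get the information my regex needs?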
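To check how Word has split a paragraph, it can help to print each run's text with `repr()` and, for every straight quote, look at its neighbours in the joined paragraph text rather than in its own run. A minimal sketch: `quote_contexts` and `FakeRun` are hypothetical names, and the helper only assumes objects with a `.text` attribute, such as the `Run` objects in python-docx's `paragraph.runs`.

```python
from dataclasses import dataclass


def quote_contexts(runs):
    """For each straight quote in a paragraph, return
    (run_index, quote_char, char_before, char_after), where the neighbours
    come from the whole paragraph text, so characters that live in an
    adjacent run are still visible."""
    texts = [run.text for run in runs]
    full = "".join(texts)
    results = []
    pos = 0  # offset of the current run's first character within `full`
    for run_index, text in enumerate(texts):
        for offset, ch in enumerate(text):
            if ch in "\"'":
                i = pos + offset
                before = full[i - 1] if i > 0 else ""
                after = full[i + 1] if i + 1 < len(full) else ""
                results.append((run_index, ch, before, after))
        pos += len(text)
    return results


@dataclass
class FakeRun:
    """Hypothetical stand-in for a python-docx Run: only .text is used."""
    text: str


if __name__ == "__main__":
    # Simulated split of the test sentence; real splits depend on the file.
    runs = [FakeRun('"'), FakeRun("Manipulation"), FakeRun('" is imagi'),
            FakeRun("'"), FakeRun("ned")]
    for run_index, run in enumerate(runs):
        print(run_index, repr(run.text))
    for context in quote_contexts(runs):
        print(context)
```

With a real document this would be something like `for p in Document("test.docx").paragraphs: print(quote_contexts(p.runs))`, where `test.docx` is a placeholder file name.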
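```python
import os
import re

from docx import Document

# \u0022 and \u0027 are fine in the *patterns*: re understands \uXXXX there.
# The replacements are ordinary (non-raw) literals, so Python itself turns
# \u201C etc. into curly characters; re.sub rejects \u in a replacement template.
REGEX1 = re.compile(r'(\u0022)(\w)')          # opening double quote
REPLACE1 = '\u201C\\2'
REGEX2 = re.compile(r'([a-zA-Z0-9,.])(")')    # closing double quote
REPLACE2 = '\\1\u201D'
REGEX3 = re.compile(r"(\w)(\u0027)(\w)")      # apostrophe inside a word
REPLACE3 = '\\1\u2019\\3'


def docx_replace_regex(doc_obj, regex, replace):
    """Apply regex.sub run by run (runs are strings with the same style)."""
    for p in doc_obj.paragraphs:
        if regex.search(p.text):
            for run in p.runs:
                if regex.search(run.text):
                    run.text = regex.sub(replace, run.text)
    # Table cells hold their own paragraphs (and possibly nested tables).
    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace_regex(cell, regex, replace)


for root, dirs, files in os.walk(os.getcwd()):
    for name in files:
        # python-docx only reads .docx (not legacy .doc); skip Word lock
        # files and this script's own output.
        if not name.endswith('.docx') or name.startswith(('~$', 'result_')):
            continue
        path = os.path.join(root, name)
        print(path)
        doc = Document(path)
        docx_replace_regex(doc, REGEX1, REPLACE1)
        docx_replace_regex(doc, REGEX2, REPLACE2)
        docx_replace_regex(doc, REGEX3, REPLACE3)
        # Save each result next to its source instead of overwriting one file.
        doc.save(os.path.join(root, 'result_' + name))
```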
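A raw-string replacement such as `r'\u201C\2'` does not do what it looks like. `re` understands `\uXXXX` escapes in a pattern, but a replacement template only supports a few escapes (`\n`, `\t`, group references and the like), and since Python 3.7 an unknown letter escape such as `\u` raises `re.error`. A minimal sketch of one fix that keeps the question's patterns: write each replacement as an ordinary string literal so Python produces the curly character, and double the backslash of each group reference. `curl` and `raw_template_fails` are illustrative names.

```python
import re

SAMPLE = '"Manipulation" is imagi\'ned as a non-violent process.'

# The question's patterns, unchanged: \u0022 and \u0027 are valid in a pattern.
REGEX1 = re.compile(r'(\u0022)(\w)')
REGEX2 = re.compile(r'([a-zA-Z0-9,.])(")')
REGEX3 = re.compile(r"(\w)(\u0027)(\w)")


def curl(text):
    # Ordinary (non-raw) literals: Python converts \u201C, \u201D and \u2019
    # into the curly characters, and '\\2' reaches re as the group reference \2.
    text = REGEX1.sub('\u201C\\2', text)
    text = REGEX2.sub('\\1\u201D', text)
    text = REGEX3.sub('\\1\u2019\\3', text)
    return text


def raw_template_fails():
    # Try the raw-string template from the question; re parses it and
    # raises re.error on the unknown \u escape (Python 3.7 and later).
    try:
        REGEX1.sub(r'\u201C\2', SAMPLE)
    except re.error:
        return True
    return False


if __name__ == "__main__":
    print(raw_template_fails())
    print(curl(SAMPLE))
```

This only fixes the replacement strings; each call still sees one run at a time, so a quote whose neighbouring letter sits in another run is still missed.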
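Runs are split at arbitrary points, even in the middle of a word. If you print `run.text` for each run, you will see that you can't count on the word boundary or whitespace you want to match being in the same run as the single-quote character.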
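Because turning a straight quote into a curly one replaces one character with exactly one other, one workaround is to decide every replacement on the joined paragraph text and then write the characters back into the runs at the same offsets. The run lengths don't change, so each run keeps its own formatting. A minimal sketch under that assumption: `curl_runs` and `FakeRun` are hypothetical names, and the three rules mirror the question's regexes, rewritten with lookarounds so each match is exactly the quote character.

```python
import re
from dataclasses import dataclass

# The question's three rules; lookarounds make each match one character long.
RULES = [
    (re.compile(r'"(?=\w)'), "\u201C"),             # opening double quote
    (re.compile(r'(?<=[A-Za-z0-9,.])"'), "\u201D"),  # closing double quote
    (re.compile(r"(?<=\w)'(?=\w)"), "\u2019"),       # apostrophe inside a word
]


def curl_runs(runs):
    """Curl straight quotes in one paragraph whose text is split across runs.

    `runs` is any sequence of objects with a writable `.text` attribute,
    for example python-docx `paragraph.runs`.
    """
    texts = [run.text for run in runs]
    chars = list("".join(texts))
    for pattern, curly in RULES:
        # Re-join before each rule so later rules see earlier replacements.
        for match in pattern.finditer("".join(chars)):
            chars[match.start()] = curly
    # Every replacement is one character for one character, so slicing by the
    # original run lengths puts each piece back into the run it came from.
    pos = 0
    for run, old in zip(runs, texts):
        new = "".join(chars[pos:pos + len(old)])
        if new != old:  # only touch runs that actually changed
            run.text = new
        pos += len(old)


@dataclass
class FakeRun:
    """Hypothetical stand-in for a python-docx Run: only .text is used."""
    text: str


if __name__ == "__main__":
    runs = [FakeRun('"'), FakeRun("Manipulation"), FakeRun('" is imagi'),
            FakeRun("'"), FakeRun("ned")]
    curl_runs(runs)
    print([run.text for run in runs])
```

With python-docx this would be called once per paragraph, e.g. `for p in doc.paragraphs: curl_runs(p.runs)`, plus the same loop over `cell.paragraphs` for tables. Rewritten runs go through python-docx's `run.text` setter, which, for example, turns `\t` back into a tab element; runs without a replaced quote are left untouched.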