Replacing quotes and apostrophes in python-docx


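Hi all. I'm having trouble replacing quotation marks and apostrophes in docx files with the python-docx module in Python 3. I used the script below. However, it doesn't recognize the punctuation, only words. For example, I have the following test text: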

...(beneficiaries and their subordinate actors). 
"Manipulation" is imagi'ned as a non-violent, concealed, non-linear, and dynamic process.

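I need to replace the straight quotes and apostrophes with curly ones. I can't work out what exactly is going wrong: whether the code fails to recognize the Unicode characters, whether the docx runs are split at the punctuation characters, or something else entirely. How would I check adjacent runs so that my regular expressions get the context they need?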

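import os
import re

from docx import Document


def docx_replace_regex(doc_obj, regex, replace):
    """Apply regex.sub to every run in doc_obj, including runs inside tables."""
    for p in doc_obj.paragraphs:
        if regex.search(p.text):
            # Work run by run (a run is a stretch of text with the same style).
            # A match is only found when it lies entirely inside one run.
            for run in p.runs:
                if regex.search(run.text):
                    run.text = regex.sub(replace, run.text)

    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace_regex(cell, regex, replace)


# The replacements are ordinary (non-raw) strings: re.sub does not interpret
# \uXXXX inside a replacement template, so the curly characters are written
# as normal string escapes and the group references as \\1, \\2, \\3.
regex1 = re.compile(r'(\u0022)(\w)')              # straight " before a word character
replace1 = '\u201C\\2'
regex2 = re.compile(r'([a-zA-Z0-9,.])(\u0022)')   # straight " after a letter, digit, comma or period
replace2 = '\\1\u201D'
regex3 = re.compile(r'(\w)(\u0027)(\w)')          # straight ' inside a word
replace3 = '\\1\u2019\\3'

for root, dirs, files in os.walk(os.getcwd()):
    for name in files:
        # python-docx reads .docx only, not the legacy .doc format.
        # Skip files written by an earlier run of this script.
        if name.lower().endswith('.docx') and not name.startswith('result_'):
            path = os.path.join(root, name)
            doc = Document(path)
            print(path)
            docx_replace_regex(doc, regex1, replace1)
            docx_replace_regex(doc, regex2, replace2)
            docx_replace_regex(doc, regex3, replace3)
            # One output file per input, so results do not overwrite each other
            doc.save(os.path.join(root, 'result_' + name))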

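Runs are split at arbitrary points. If you print run.text for each run, you will see that you can't count on the word boundary or whitespace you want to match being in the same run as the quote character.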
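One way around this, as a rough sketch rather than a definitive fix: run the regular expressions on the joined text of all the runs in a paragraph, then slice the result back onto the original run boundaries. This only works because each substitution here swaps exactly one character for one character, so the text length never changes. The names `CURLY_RULES`, `curl_quotes` and `curl_quotes_in_paragraph` are made up for this example, and the `SimpleNamespace` objects are just stand-ins for a python-docx paragraph so the snippet runs without a .docx file. The split into runs shown in `demo` is an assumption about what Word might produce, not something taken from your document.

```python
import re
from types import SimpleNamespace

# Each pattern matches exactly one character (the context is checked with
# lookarounds), so every substitution keeps the text length unchanged.
CURLY_RULES = [
    (re.compile(r'"(?=\w)'), '\u201C'),               # opening double quote
    (re.compile(r'(?<=[a-zA-Z0-9,.])"'), '\u201D'),   # closing double quote
    (re.compile(r"(?<=\w)'(?=\w)"), '\u2019'),        # apostrophe inside a word
]


def curl_quotes(text):
    """Apply the three rules, in order, to a plain string."""
    for pattern, replacement in CURLY_RULES:
        text = pattern.sub(replacement, text)
    return text


def curl_quotes_in_paragraph(paragraph):
    """Curl quotes using the context of the whole paragraph.

    `paragraph` is anything with a `.runs` list whose items have a `.text`
    attribute, such as a python-docx Paragraph.
    """
    old_texts = [run.text for run in paragraph.runs]
    new_text = curl_quotes(''.join(old_texts))
    start = 0
    for run, old in zip(paragraph.runs, old_texts):
        # Same length as before, so the old run boundaries still apply.
        new = new_text[start:start + len(old)]
        start += len(old)
        if new != old:  # leave unchanged runs alone
            run.text = new


# Stand-in for a paragraph that was split into runs at the punctuation marks.
demo = SimpleNamespace(runs=[
    SimpleNamespace(text=t)
    for t in ['"Manipulation', '"', ' is imagi', "'", 'ned']
])
print([run.text for run in demo.runs])  # inspect how the text is split
curl_quotes_in_paragraph(demo)
print(''.join(run.text for run in demo.runs))
```

With python-docx you would call `curl_quotes_in_paragraph(p)` for each `p` in `doc.paragraphs`, and again for the paragraphs of every table cell. The function only assigns to runs whose text actually changed, so the formatting of every other run is left untouched.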