How to use python-docx to replace text in a Word document and save


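The oodocx module mentioned on the same page refers the user to an /examples folder that does not seem to exist.
I have read the python-docx 0.7.2 documentation, plus everything I could find on Stack Overflow on the subject, so please believe that I have done my "homework".

Python is the only language I know (beginner+, maybe intermediate), so please do not assume any knowledge of C, Unix, XML, etc.

Task: open an MS Word 2007+ document that contains a single line of text (to keep things simple) and replace any "key" word from a dictionary that occurs in that line with its dictionary value. Then close the document, keeping everything else the same.

Line of text (for example): "We shall linger in the chambers of the sea."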

from docx import Document

document = Document('/Users/umityalcin/Desktop/Test.docx')

Dictionary = {'sea': 'ocean'}

sections = document.sections
for section in sections:
    print(section.start_type)

#Now, I would like to navigate, focus on, get to, whatever to the section that has my
#single line of text and execute a find/replace using the dictionary above.
#then save the document in the usual way.

document.save('/Users/umityalcin/Desktop/Test.docx')

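I see nothing in the documentation that lets me do this. Maybe it is there, but I don't get it, since not everything is spelled out at my level.

I have followed other suggestions on this site and tried to use an earlier version of the module that is supposed to have "methods like replace, advReplace", as follows: I open the source code in the Python interpreter and add the following at the end (this is to avoid clashes with the already installed version 0.7.2):

Running this produces the following error message:

NameError: name 'coreprops' is not defined

Perhaps I am trying to do something that cannot be done, but I would appreciate your help if I am missing something simple.

If it matters, I am using the 64-bit version of Enthought's Canopy on OS X 10.9.3.

The problem with your second attempt is that you haven't defined the parameters that savedocx needs. You need to do something like this before you save: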

relationships = docx.relationshiplist()
title = "Document Title"
subject = "Document Subject"
creator = "Document Creator"
keywords = []

coreprops = docx.coreproperties(title=title, subject=subject, creator=creator,
                       keywords=keywords)
app = docx.appproperties()
content = docx.contenttypes()
web = docx.websettings()
word = docx.wordrelationships(relationships)
output = r"path\to\where\you\want\to\save"

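The current version of python-docx does not have a search() function or a replace() function. These are requested fairly frequently, but an implementation for the general case is quite tricky, and it hasn't risen to the top of the backlog yet.

Several folks have had success, though, getting done what they need with the facilities already present. Here's an example. By the way, it has nothing to do with sections :)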
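A minimal sketch of that kind of approach, assuming the keyword sits in ordinary body paragraphs. The helper names `replace_in_paragraphs` and `replace_in_file` are made up for illustration; only `Document`, `paragraphs`, `paragraph.text`, and `save` come from python-docx.

```python
def replace_in_paragraphs(document, replacements):
    """Replace each key of `replacements` with its value in every body paragraph.

    `document` is expected to be a python-docx Document; anything that exposes
    `.paragraphs` whose items have a read/write `.text` works as well.
    Returns the number of paragraphs that were changed.
    """
    changed = 0
    for paragraph in document.paragraphs:
        new_text = paragraph.text
        for old, new in replacements.items():
            new_text = new_text.replace(old, new)
        if new_text != paragraph.text:
            # Assigning to .text rewrites the whole paragraph as a single run.
            paragraph.text = new_text
            changed += 1
    return changed


def replace_in_file(src_path, dst_path, replacements):
    """Hypothetical wrapper: open a .docx, apply the replacements, save the result."""
    from docx import Document  # imported lazily; only needed for real files

    document = Document(src_path)
    replace_in_paragraphs(document, replacements)
    document.save(dst_path)
```

For the question's case this would be called as `replace_in_file('Test.docx', 'Test.docx', {'sea': 'ocean'})`. It only looks at top-level paragraphs and replaces plain substrings, so "seaside" would also be affected.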

To search in tables as well, you would need to use something like:

for table in document.tables:
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                if 'sea' in paragraph.text:
                    paragraph.text = paragraph.text.replace("sea", "ocean")

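If you pursue this path, you'll probably discover pretty quickly what the complexities are. If you replace the entire text of a paragraph, that removes any character-level formatting, such as a word or phrase in bold or italic.

By the way, the code in @wnnmaw's answer is for the legacy version of python-docx and won't work at all with versions after 0.3.0.

I needed something to replace regular expressions in docx. I took scanny's answer. To handle styles I followed another answer, and I added a recursive call to handle nested tables. I came up with something like this: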

import re
from docx import Document

def docx_replace_regex(doc_obj, regex , replace):

    for p in doc_obj.paragraphs:
        if regex.search(p.text):
            inline = p.runs
            # Loop added to work with runs (strings with same style)
            for i in range(len(inline)):
                if regex.search(inline[i].text):
                    text = regex.sub(replace, inline[i].text)
                    inline[i].text = text

    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace_regex(cell, regex , replace)



regex1 = re.compile(r"your regex")
replace1 = r"your replace string"
filename = "test.docx"
doc = Document(filename)
docx_replace_regex(doc, regex1 , replace1)
doc.save('result1.docx')

To iterate over a dictionary:

for word, replacement in dictionary.items():
    word_re=re.compile(word)
    docx_replace_regex(doc, word_re , replacement)
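One caveat with the dictionary loop above is that `re.compile(word)` treats every key as a regular expression, so a key such as `${NAME}` would not match literally. A possible variation, sketched here with a hypothetical helper `build_literal_pattern`, escapes the keys and combines them into one pattern so the document is walked only once:

```python
import re


def build_literal_pattern(replacements):
    """Return (pattern, repl) that replace every key of `replacements` literally.

    Keys are escaped with re.escape and sorted longest first, so that a key
    which is a prefix of another key does not win over the longer one.
    """
    if not replacements:
        raise ValueError("replacements must not be empty")
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))

    def repl(match):
        # Look up the replacement for whichever key matched.
        return replacements[match.group(0)]

    return pattern, repl


pattern, repl = build_literal_pattern({"${NAME}": "Ada", "sea": "ocean"})
print(pattern.sub(repl, "Dear ${NAME}, welcome to the sea."))
# prints: Dear Ada, welcome to the ocean.
```

Because `re.sub` accepts a function as the replacement, the pair can be passed straight to the function above as `docx_replace_regex(doc, pattern, repl)`.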

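Note that this solution replaces a regex match only if the whole match has the same style in the document.

Also, if text is edited after the document was saved, text with the same style may end up in separate runs. For example, if you open a document that contains the string "testabcd", change it to "test1abcd" and save, there will be three separate runs, "test", "1", and "abcd", even though the style is the same. In that case replacing test1 won't work.

This is used for tracking changes in the document. To get the text into a single run, in Word go to "Options", "Trust Center", and under "Privacy Options" uncheck "Store random numbers to improve combine accuracy", then save the document.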
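If changing the Word setting is not an option, the split-run case can also be handled in code. The following is only a sketch under stated assumptions: `replace_across_runs` is a hypothetical helper, it relies on nothing but `paragraph.runs` and the read/write `run.text`, and it gives the replacement the formatting of the run in which the match starts.

```python
def replace_across_runs(paragraph, old, new):
    """Replace `old` with `new` even when `old` is split over several runs.

    The replacement text is placed in the run where the match starts; the
    matched characters are removed from any following runs.
    Returns the number of replacements made.
    """
    if not old:
        return 0
    count = 0
    search_from = 0
    while True:
        runs = paragraph.runs
        full_text = "".join(run.text for run in runs)
        start = full_text.find(old, search_from)
        if start == -1:
            return count
        end = start + len(old)
        offset = 0  # position of the current run inside full_text
        inserted = False
        for run in runs:
            run_start, run_end = offset, offset + len(run.text)
            offset = run_end
            if run_end <= start or run_start >= end:
                continue  # this run does not overlap the match
            head = run.text[:max(start - run_start, 0)]  # text before the match
            tail = run.text[end - run_start:]            # text after the match
            if inserted:
                run.text = head + tail
            else:
                run.text = head + new + tail
                inserted = True
        count += 1
        # Continue after the inserted text so `new` is never re-scanned.
        search_from = start + len(new)
```

With the runs "test", "1", "abcd" from the example above, `replace_across_runs(paragraph, "test1", "X")` leaves the runs as "X", "", "abcd". Emptied runs stay in the paragraph; removing them is left out to keep the sketch short.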

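There is a post on the Office Dev Center by a developer (MIT licensed at this time) describing a couple of algorithms that appear to offer a solution to this problem (albeit in C#, and in need of porting).

The API in docx py changed yet again.

For the sanity of everyone coming here: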

import datetime
import os
from decimal import Decimal
from typing import NamedTuple

from docx import Document
from docx.document import Document as nDocument


class DocxInvoiceArg(NamedTuple):
  invoice_to: str
  date_from: str
  date_to: str
  project_name: str
  quantity: float
  hourly: int
  currency: str
  bank_details: str


class DocxService():
  tokens = [
    '@INVOICE_TO@',
    '@IDATE_FROM@',
    '@IDATE_TO@',
    '@INVOICE_NR@',
    '@PROJECTNAME@',
    '@QUANTITY@',
    '@HOURLY@',
    '@CURRENCY@',
    '@TOTAL@',
    '@BANK_DETAILS@',
  ]

  def __init__(self, replace_vals: DocxInvoiceArg):
    total = replace_vals.quantity * replace_vals.hourly
    invoice_nr = replace_vals.project_name + datetime.datetime.strptime(replace_vals.date_to, '%Y-%m-%d').strftime('%Y%m%d')
    self.replace_vals = [
      {'search': self.tokens[0], 'replace': replace_vals.invoice_to },
      {'search': self.tokens[1], 'replace': replace_vals.date_from },
      {'search': self.tokens[2], 'replace': replace_vals.date_to },
      {'search': self.tokens[3], 'replace': invoice_nr },
      {'search': self.tokens[4], 'replace': replace_vals.project_name },
      {'search': self.tokens[5], 'replace': replace_vals.quantity },
      {'search': self.tokens[6], 'replace': replace_vals.hourly },
      {'search': self.tokens[7], 'replace': replace_vals.currency },
      {'search': self.tokens[8], 'replace': total },
      {'search': self.tokens[9], 'replace': replace_vals.bank_details },
    ]
    self.doc_path_template = os.path.dirname(os.path.realpath(__file__))+'/docs/'
    self.doc_path_output = self.doc_path_template + 'output/'
    self.document: nDocument = Document(self.doc_path_template + 'invoice_placeholder.docx')


  def save(self):
    for p in self.document.paragraphs:
      self._docx_replace_text(p)
    tables = self.document.tables
    self._loop_tables(tables)
    self.document.save(self.doc_path_output + 'testiboi3.docx')

  def _loop_tables(self, tables):
    for table in tables:
      for index, row in enumerate(table.rows):
        for cell in table.row_cells(index):
          if cell.tables:
            self._loop_tables(cell.tables)
          for p in cell.paragraphs:
            self._docx_replace_text(p)

        # for cells in column.
        # for cell in table.columns:

  def _docx_replace_text(self, p):
    print(p.text)
    for el in self.replace_vals:
      if (el['search'] in p.text):
        inline = p.runs
        # Loop added to work with runs (strings with same style)
        for i in range(len(inline)):
          print(inline[i].text)
          if el['search'] in inline[i].text:
            text = inline[i].text.replace(el['search'], str(el['replace']))
            inline[i].text = text
        print(p.text)

Test case:

from django.test import SimpleTestCase
from docx.table import Table, _Rows

from toggleapi.services.DocxService import DocxService, DocxInvoiceArg


class TestDocxService(SimpleTestCase):

  def test_document_read(self):
    ds = DocxService(DocxInvoiceArg(invoice_to="""
    WAW test1
    Multi myfriend
    """,date_from="2019-08-01", date_to="2019-08-30", project_name='WAW', quantity=10.5, hourly=40, currency='USD',bank_details="""
    Paypal to:
    bippo@bippsi.com"""))

    ds.save()

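Have the folders docs and docs/output/ in the same folder where you have DocxService.py.

Be sure to parameterize and replace stuff.

For the table case, I had to modify @scanny's answer to: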

for table in doc.tables:
    for col in table.columns:
        for cell in col.cells:
            for p in cell.paragraphs:

to make it work. Indeed, this does not seem to work with the current state of the API:

for table in document.tables:
    for cell in table.cells:

The code from here has the same problem:

I got a lot of help from the earlier answers, but for me the code below works the way the simple find-and-replace function in Word does. Hope this helps.

#!pip install python-docx
#start from here if python-docx is installed
from docx import Document
#open the document
doc=Document('./test.docx')
Dictionary = {"sea": "ocean", "find_this_text":"new_text"}
for i in Dictionary:
    for p in doc.paragraphs:
        if p.text.find(i)>=0:
            p.text=p.text.replace(i,Dictionary[i])
#save changed document
doc.save('./test.docx')

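The solution above has limitations: 1) a paragraph containing "find_this_text" becomes plain text without any formatting, 2) content controls in the same paragraph as "find_this_text" are deleted, and 3) "find_this_text" inside content controls or tables is not changed.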
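Limitations 1 and part of 3 can be eased by replacing text run by run and by walking into tables. This is a sketch, not a complete fix: `iter_paragraphs` and `replace_in_runs` are hypothetical helpers that rely only on the `paragraphs`, `tables`, `rows`, `cells`, and `runs` attributes used elsewhere on this page. As far as I know, text inside content controls is not reachable this way, and a keyword split across runs is still missed.

```python
def iter_paragraphs(container):
    """Yield every paragraph in `container`, descending into (nested) tables.

    `container` can be a Document or a table cell, since both expose
    `.paragraphs` and `.tables`.
    """
    for paragraph in container.paragraphs:
        yield paragraph
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from iter_paragraphs(cell)


def replace_in_runs(container, replacements):
    """Replace keys with values inside each run, keeping the run's formatting.

    Returns the number of runs whose text changed. Merged cells may be
    visited more than once, so a replacement value should not contain its
    own key.
    """
    changed = 0
    for paragraph in iter_paragraphs(container):
        for run in paragraph.runs:
            new_text = run.text
            for old, new in replacements.items():
                new_text = new_text.replace(old, new)
            if new_text != run.text:
                run.text = new_text
                changed += 1
    return changed
```

With a real file this would be used as `doc = Document('./test.docx')`, then `replace_in_runs(doc, Dictionary)`, then `doc.save('./test.docx')`.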

Sharing a small script I wrote. It helps me generate legal .docx contracts with variables while preserving the original style.

pip install python-docx

Example:

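from docx import Document
import os


def replace_text_in_paragraph(paragraph, key, value):
    # The source listing is cut off before this helper appears; this is an
    # assumed run-level replacement, which keeps each run's formatting.
    if key in paragraph.text:
        for run in paragraph.runs:
            if key in run.text:
                run.text = run.text.replace(key, value)


def main():
    template_file_path = 'employment_agreement_template.docx'
    output_file_path = 'result.docx'
    variables = {
        "${EMPLOEE_NAME}": "Example Name",
        "${EMPLOEE_TITLE}": "Software Engineer",
        "${EMPLOEE_ID}": "302929393",
        "${EMPLOEE_ADDRESS}": "Employee address",
        "${EMPLOEE_PHONE}": "+972-50560000",
        "${EMPLOEE_EMAIL}": "example@example.com",
        "${START_DATE}": "January 3, 2021",
        "${SALARY}": "10000",
        "${SALARY_30}": "3000",
        "${SALARY_70}": "7000",
    }
    template_document = Document(template_file_path)
    for variable_key, variable_value in variables.items():
        for paragraph in template_document.paragraphs:
            replace_text_in_paragraph(paragraph, variable_key, variable_value)
        for table in template_document.tables:
            for col in table.columns:
                # From here on the listing is reconstructed from the table
                # loop shown earlier on this page.
                for cell in col.cells:
                    for paragraph in cell.paragraphs:
                        replace_text_in_paragraph(paragraph, variable_key, variable_value)
    template_document.save(output_file_path)


if __name__ == '__main__':
    main()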