Python: multiple regex substitutions

python python-3.x regex

I originally used a working script like this to go through the csv files in a folder and replace a substring:

import fileinput
import os
import glob

#### Directory and file mask
this = r"C:\work\PythonScripts\Replacer\*.csv"
output_folder = "C:\\work\\PythonScripts\\Replacer\\"

#### Get files
files = glob.glob(this)

#### Section to replace
text_to_search = 'z'
replacement_text = 'ZZ_Top'

#### Loop through files and lines:
for f in files:
    head, tail = os.path.split(f)
    targetFileName = os.path.join(head, output_folder, tail)

    with fileinput.FileInput(targetFileName, inplace=True, backup='.bak') as file:
        for line in file:
            print(line.replace(text_to_search, replacement_text), end='')

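I now need to replace several typographic quotation marks and en dashes as well, so I wanted to use something like this inside the loop above: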

s = '’ ‘ ’ ‘ ’ – “ ” “ – ’'
print(s)
print(s.replace('’', '\'').replace('‘', '\'').replace('–','-').replace('“','"').replace('”','"'))

Output:

’ ‘ ’ ‘ ’ – “ ” “ – ’
' ' ' ' ' - " " " - '
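For single-character replacements like these, one alternative to chaining replace calls is a translation table. This is only a sketch of a different technique (str.translate), not something the original post uses, and the name table is chosen here for illustration:

```python
# Map each typographic character to its ASCII counterpart.
# `table` is an illustrative name; str.maketrans and str.translate are built in.
table = str.maketrans({'’': "'", '‘': "'", '–': '-', '“': '"', '”': '"'})

s = '’ ‘ ’ ‘ ’ – “ ” “ – ’'
result = s.translate(table)  # every mapped character is replaced in one pass
print(result)
```

This only works when each search string is a single character; longer search strings still need replace or a regex.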

But then I came across a comment recommending the regex sub function, so I tried it, and on its own it works fine:

import re

def multisub(subs, subject):
    """Simultaneously perform all substitutions on the subject string."""
    pattern = '|'.join('(%s)' % re.escape(p) for p, s in subs)
    substs = [s for p, s in subs]
    replace = lambda m: substs[m.lastindex - 1]
    return re.sub(pattern, replace, subject)

print(multisub([('’', '\''), ('‘', '\''), ('–','-'), ('“','"'), ('”','"')], '1’ 2‘ 1’ 2‘ 1’ 3– 4“ 5” 4“ 3– 2’'))

Output:

1' 2' 1' 2' 1' 3- 4" 5" 4" 3- 2'
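To see why a single simultaneous pass can matter, here is a small sketch comparing it with chained replace calls. The swap pairs and the names subs, chained and simultaneous are made up for illustration; multisub is the function from the post, with comments added:

```python
import re

def multisub(subs, subject):
    """Simultaneously perform all substitutions on the subject string."""
    # One capture group per search string, so group N belongs to subs[N - 1].
    pattern = '|'.join('(%s)' % re.escape(p) for p, s in subs)
    substs = [s for p, s in subs]
    # m.lastindex is the number of the group that matched.
    return re.sub(pattern, lambda m: substs[m.lastindex - 1], subject)

# Hypothetical pairs that swap 'a' and 'b'.
subs = [('a', 'b'), ('b', 'a')]

# Chained replace: the second call also rewrites the output of the first.
chained = 'ab'.replace('a', 'b').replace('b', 'a')
# Single regex pass: each character is looked at exactly once.
simultaneous = multisub(subs, 'ab')

print(chained)       # aa
print(simultaneous)  # ba
```

For the quote and dash replacements in the question no replacement text contains another search string, so both approaches should give the same result there.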

But as soon as I paste it into the original script, it runs but does not modify the files:

import fileinput
import os
import glob
import re

#### Directory and file mask
this = r"C:\work\PythonScripts\Replacer\*.csv"
output_folder = "C:\\work\\PythonScripts\\Replacer\\"

#### RegEx substitution func
def multisub(subs, subject):
    """Simultaneously perform all substitutions on the subject string."""
    pattern = '|'.join('(%s)' % re.escape(p) for p, s in subs)
    substs = [s for p, s in subs]
    replace = lambda m: substs[m.lastindex - 1]
    return re.sub(pattern, replace, subject)

#### Get files
files = glob.glob(this)

#### Loop through files and lines:
for f in files:
    head, tail = os.path.split(f)
    targetFileName = os.path.join(head, output_folder, tail)

    with fileinput.FileInput(targetFileName, inplace=True, backup='.bak') as file:
        for line in file:
            print(multisub([('’', '\''), ('‘', '\''), ('–','-'), ('“','"'), ('”','"')], line), end='')

What could be wrong here?

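Your code actually worked for me when I tested it, but it does a lot of unnecessary processing that could introduce errors. The biggest advantage of fileinput over a regular open is that it can loop over the lines of multiple files without needing another loop to open each file individually. So try this and see if it works: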

#### Get files
files = glob.glob(this)

#### Loop through files and lines:
for line in fileinput.input(files, inplace=True, backup='.bak'):
    print(multisub([('’', '\''), ('‘', '\''), ('–','-'), ('“','"'), ('”','"')], line), end='')
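The same pattern can be wrapped in a function so it can be tried on throwaway files. This is a minimal sketch; replace_in_files and its parameters are illustrative names, not part of the answer above:

```python
import fileinput

def replace_in_files(paths, old, new):
    """Replace `old` with `new` in every file in `paths`, editing in place.

    Returns the number of occurrences that were replaced.
    """
    paths = list(paths)
    if not paths:
        # With an empty list, fileinput would fall back to reading sys.stdin.
        return 0
    replaced = 0
    # One loop covers the lines of all files; each original is kept as *.bak.
    with fileinput.input(paths, inplace=True, backup='.bak') as stream:
        for line in stream:
            replaced += line.count(old)
            # While inplace is active, print() writes into the current file.
            print(line.replace(old, new), end='')
    return replaced
```

The guard for an empty list is worth keeping when the paths come from glob.glob, since a mask that matches nothing would otherwise leave the script waiting on standard input.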

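It turns out the code itself was working. What I had missed is that it runs on Windows, so I had to add the PYTHONUTF8 system variable with a value of 1 to the environment variables. After that, the original code works fine.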
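A likely explanation is that, without UTF-8 mode, Python on Windows decodes text files with the system code page, so the typographic characters in a UTF-8 file are read as different characters and the patterns never match. One way to avoid depending on the environment variable is to pass the encoding explicitly. The sketch below assumes the csv files are UTF-8 and uses plain open instead of fileinput; rewrite_file, transform and backup_suffix are illustrative names:

```python
import shutil

def rewrite_file(path, transform, encoding="utf-8", backup_suffix=".bak"):
    """Apply transform(text) to a file, decoding and encoding explicitly.

    The UTF-8 default is an assumption about the input files.
    """
    # Keep a copy of the untouched file, similar to fileinput's backup option.
    shutil.copy2(path, path + backup_suffix)
    # newline="" leaves the line endings exactly as they are on disk.
    with open(path, encoding=encoding, newline="") as src:
        text = src.read()
    with open(path, "w", encoding=encoding, newline="") as dst:
        dst.write(transform(text))
```

With the multisub function from the question, a call could look like rewrite_file(name, lambda text: multisub(subs, text)) for each file name returned by glob.glob.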

For some reason, when the folder contains a csv file with these characters, I end up with an empty file using this code...
