Python: many regexes in a row?

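I have some text files that are the output of another piece of software. I have a Perl script, held together with duct tape, that cleans them up with nearly 100 regexes in a row.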

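I'm new to Python, and I'm wondering whether there is a more idiomatic way to approach this than one big block of the equivalent of the Perl construct:
string =~ s/blah/blah/i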

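# flags must be passed by keyword: the fourth positional argument of re.sub is count
string = re.sub(r'  +', " ", string, flags=re.I)
# raw replacement string so that \1 is a backreference, not chr(1)
string = re.sub(r'(\w)- ', r"\1, ", string, flags=re.I)
string = re.sub(r'u-s', "U.S.", string, flags=re.I)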

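For example, some kind of dict of regexes and their replacements? I'm also wondering how calling a module-level function this many times in a row affects performance.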
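When porting s/.../.../i one-for-one, two details of re.sub are easy to get wrong: its fourth positional parameter is count, so flags have to be passed by keyword, and a backreference in the replacement needs a raw string, because "\1" in an ordinary string literal is the control character chr(1). A minimal sketch of both effects, using a made-up sample string:

```python
import re

sample = "U-S  policy- as  stated"  # hypothetical input, for illustration only

# re.I passed positionally lands in count (its integer value is 2), so the
# match stays case-sensitive. Newer Python versions also warn about this form.
wrong = re.sub(r'u-s', "U.S.", sample, re.I)

# Passed by keyword, the flag gives the case-insensitive match of Perl's /i.
right = re.sub(r'u-s', "U.S.", sample, flags=re.I)

# "\1" in a non-raw string is chr(1); r"\1" refers to capture group 1.
broken_ref = re.sub(r'(\w)- ', "\1, ", sample)
good_ref = re.sub(r'(\w)- ', r"\1, ", sample)

print(wrong)             # unchanged: the uppercase "U-S" was never matched
print(right)
print(repr(broken_ref))  # contains '\x01' where the letter should be
print(good_ref)
```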

If you put the regexes in a tuple, iterating over the tuple and performing the substitutions is straightforward.

Regexes:

import re
regexs = (
    (r'  +', " ", re.I),
    (r'(\w)- ', r"\1, ", re.I),  # raw string so that \1 is a backreference
    (r'u-s', "U.S.", re.I),
)
# Compile each pattern once and keep its replacement alongside it.
compiled_regexs = [(re.compile(rx[0], rx[2]), rx[1]) for rx in regexs]

Code:

for line in lines:
    # Apply every substitution in order; each one sees the previous result.
    for regex, replace in compiled_regexs:
        line = regex.sub(replace, line)
    print(line)

Test data:

lines = (
    'Quick  Brown  Fox',
    'u-s lazy  dog',
)

Results:

Quick Brown Fox
U.S. lazy dog
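
On the performance part of the question: the re module keeps an internal cache of recently compiled patterns, so repeated calls to re.sub do not recompile the pattern every time. Precompiling mainly saves the cache lookup on each call. A rough way to measure the difference on your own data is sketched below; the sample line, the rule list, and the function names are assumptions made for the timing, and the numbers will vary by machine and Python version.

```python
import re
import timeit

# Hypothetical sample line and rule set, used only for timing.
line = "Quick  Brown  Fox- said the u-s  dog " * 20
rules = [
    (r'  +', " "),
    (r'(\w)- ', r"\1, "),
    (r'u-s', "U.S."),
]
compiled = [(re.compile(pattern, re.I), repl) for pattern, repl in rules]

def with_module_functions(text):
    # Module-level re.sub: the pattern is looked up in re's cache on each call.
    for pattern, repl in rules:
        text = re.sub(pattern, repl, text, flags=re.I)
    return text

def with_compiled(text):
    # Precompiled pattern objects: no lookup, just the substitution.
    for regex, repl in compiled:
        text = regex.sub(repl, text)
    return text

# Both variants must produce the same cleaned text.
assert with_module_functions(line) == with_compiled(line)

for fn in (with_module_functions, with_compiled):
    seconds = timeit.timeit(lambda: fn(line), number=2000)
    print(f"{fn.__name__}: {seconds:.4f}s")
```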

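No need for a dict, but a list of 2-tuples would make sense.

How do you run that many regexes without them overlapping?

The source file is a teleprompter script. Many of the teleprompter software's own formatting codes are being stripped out, along with common problems like the one in the example: hyphens used to mark a pause instead of a comma. The file is being converted to plain text for people to read.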
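
The two comments above (a list of 2-tuples, and the worry about overlapping rules) can be sketched together. Because each rule runs on the output of the previous one, the list order is part of the behavior, and anchoring a pattern with \b keeps it from matching inside longer words. The helper names compile_rules and clean, and the sample text, are hypothetical and only illustrate the idea.

```python
import re

def compile_rules(rules):
    """Compile (pattern, replacement) 2-tuples once, case-insensitively."""
    return [(re.compile(pattern, re.I), repl) for pattern, repl in rules]

def clean(text, compiled_rules):
    """Apply every rule in list order; each rule sees the previous rule's output."""
    for regex, repl in compiled_rules:
        text = regex.sub(repl, text)
    return text

pause_rule = (r'(\w)- ', r'\1, ')   # hyphen used as a pause becomes a comma
us_rule = (r'\bu-s\b', 'U.S.')      # \b keeps "menu-style" untouched

pause_first = compile_rules([pause_rule, us_rule])
us_first = compile_rules([us_rule, pause_rule])

text = "a menu-style list from the u-s- then more"
print(clean(text, pause_first))  # a menu-style list from the U.S., then more
print(clean(text, us_first))     # a menu-style list from the U.S.- then more
```

In the second ordering the pause rule no longer fires, because after the first substitution the hyphen follows a period rather than a word character. With close to 100 rules, keeping them in one ordered list makes such interactions easier to spot and test.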