Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/python/308.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181

Warning: file_get_contents(/data/phpspider/zhask/data//catemap/4/regex/17.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181

Warning: file_get_contents(/data/phpspider/zhask/data//catemap/8/variables/2.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Python 几个类似的正则表达式。更快的方法?_Python_Regex_Python 3.x_Performance - Fatal编程技术网

Python 几个类似的正则表达式。更快的方法?

Python 几个类似的正则表达式。更快的方法?,python,regex,python-3.x,performance,Python,Regex,Python 3.x,Performance,我有一套相当简单的要求。我有一个对象列表(长度为200万),每个对象都有两个需要重新执行的属性(其他属性不变) 零一二的值。。。需要将10更改为其数值:1 2。。。十, 示例: ONE MAIN STREET -> 1 MAIN STREET BONE ROAD -> BONE ROAD BUILDING TWO, THREE MAIN ROAD -> BUILDING 2, 3 MAIN ROAD ELEVEN MAIN ST -> ELEVEN MAIN STREET

我有一套相当简单的要求。我有一个对象列表(长度为200万),每个对象都有两个需要重新执行的属性(其他属性不变)

零一二的值。。。需要将10更改为其数值:1 2。。。十,

示例:

ONE MAIN STREET -> 1 MAIN STREET
BONE ROAD -> BONE ROAD
BUILDING TWO, THREE MAIN ROAD -> BUILDING 2, 3 MAIN ROAD
ELEVEN MAIN ST -> ELEVEN MAIN STREET
ONE HUNDRED FUNTOWN -> 1 HUNDRED FUNTOWN
很明显,有些数字没有改变,有些收费很奇怪这完全是意料中事

我可以用下面的内容来完成所有的工作。我的问题是,有没有一个聪明的方法让这一切运行得更快?我曾想过制作一个
字典列表
,其中键是单词数字,值是数字,但我认为这对性能没有帮助。或者
重新编译每个正则表达式并将它们传递到此函数中?有什么聪明的办法可以让它跑得更快吗

def update_word_to_numeric(entrylist):
    updated_entrylist = []
    for theentry in entrylist:
        theentry.addr_ln_1 = re.sub(r"\bZERO\b", "0", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bONE\b", "1", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bTWO\b", "2", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bTHREE\b", "3", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bFOUR\b", "4", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bFIVE\b", "5", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bSIX\b", "6", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bSEVEN\b", "7", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bEIGHT\b", "8", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bNINE\b", "9", theentry.addr_ln_1)
        theentry.addr_ln_1 = re.sub(r"\bTEN\b", "10", theentry.addr_ln_1)

        theentry.addr_ln_2 = re.sub(r"\bZERO\b", "0", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bONE\b", "1", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bTWO\b", "2", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bTHREE\b", "3", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bFOUR\b", "4", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bFIVE\b", "5", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bSIX\b", "6", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bSEVEN\b", "7", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bEIGHT\b", "8", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bNINE\b", "9", theentry.addr_ln_2)
        theentry.addr_ln_2 = re.sub(r"\bTEN\b", "10", theentry.addr_ln_2)
        updated_entrylist.append(theentry)
    return updated_entrylist

也许这只是一个很好的方法。“这就足够了”的评论对我也很好:)

使用一个正则表达式比使用十个要快得多(我注意到速度提高了3倍):

我使用鲜为人知的功能将
re.sub
函数作为第二个参数:它将获取一个匹配对象并返回替换字符串。这样我们就可以查找替换字符串


我还使用了
re.compile
来预编译正则表达式,这也缩短了时间,但没有大的变化那么大。

这里有一种使用字典的方法:

numbers = ["\bZERO\b", "\bONE\b", "\bTWO\b", "\bTHREE\b", "\bFOUR\b", "\bFIVE\b", "\bSIX\b", "\bSEVEN\b", "\bEIGHT\b", "\bNINE\b", "\bTEN\b"]
for theentry in entrylist:
    for i, number in enumerate(numbers):
        theentry.addr_ln_1 = re.sub(r"{}".format(number), "{}".format(i), theentry.addr_ln_1)
        theentry.addr_ln_2 = re.sub(r"{}".format(number), "{}".format(i), theentry.addr_ln_2)
s = '''
ONE MAIN STREET
BONE ROAD
BUILDING TWO, THREE MAIN ROAD
ELEVEN MAIN ST
ONE HUNDRED FUNTOWN
'''

d = {'ZERO':'0', 'ONE':'1', 'TWO':'2', 'THREE':'3', 'FOUR':'4', 
     'FIVE':'5', 'SIX':'6', 'SEVEN':'7', 'EIGHT':'8', 'NINE':'9', 
     'TEN':'10', 'ELEVEN':'11', 'TWELVE':'12'}

p = re.compile(r'\b(' + '|'.join(d.keys()) + r')\b')
r = p.sub(lambda x: d[x.group()], s)

print(r)

在字典中添加或删除您认为合适的条目。

有时我喜欢@wwii的想法,遗憾的是,这在这里不起作用,因为我们将匹配对象作为输入,而不是字符串。我喜欢它的可读性。我是否应该将
def replace
重命名为类似于
def replace\u number
?只是想一想在Python中如何使用
replace
。或者它必须被替换?这很好,这个过程是45秒,我的老方法,下降到3秒。太好了。@狙击手听了太好了!它不必是
replace
,当然,你可以随意命名。它不会与str.replace冲突。
s = '''
ONE MAIN STREET
BONE ROAD
BUILDING TWO, THREE MAIN ROAD
ELEVEN MAIN ST
ONE HUNDRED FUNTOWN
'''

d = {'ZERO':'0', 'ONE':'1', 'TWO':'2', 'THREE':'3', 'FOUR':'4', 
     'FIVE':'5', 'SIX':'6', 'SEVEN':'7', 'EIGHT':'8', 'NINE':'9', 
     'TEN':'10', 'ELEVEN':'11', 'TWELVE':'12'}

p = re.compile(r'\b(' + '|'.join(d.keys()) + r')\b')
r = p.sub(lambda x: d[x.group()], s)

print(r)