Python: replacing substrings in a string

python regex optimization replace

My function finds hex notation (hexadecimal CSS colors) in a string and replaces it with the short notation.
For example, #000000 can be written as #000.

import re

def to_short_hex (string):
    match = re.findall(r'#[\w\d]{6}\b', string)

    for i in match:
        if not re.findall(r'#' + i[1] + '{6}', i):
            match.pop(match.index(i))

    for i in match:
        string = string.replace(i, i[:-3])

    return string;

to_short_hex('text #FFFFFF text #000000 #08088')

Out:

text #FFF text #000 #08088

Is there a way to optimize my code, for example with a list comprehension?

How about this? You could inline is6hexdigit into to_short_hex to speed it up, but I kept it separate to make the code more readable.

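hexdigits = "0123456789abcdef"

def is6hexdigit(sub):
    # True when sub is six copies of the same hex digit, e.g. "ffffff".
    # The length check also guards against empty pieces (no IndexError).
    l = sub.lower()
    return len(l) == 6 and l[0] in hexdigits and l.count(l[0]) == 6

def to_short_hex(may_have_hexes):
    # Only the pieces after the first one were preceded by a '#'.
    head, *rest = may_have_hexes.split('#')
    replaced = (sub[3:] if is6hexdigit(sub[:6]) else sub for sub in rest)
    return '#'.join([head, *replaced])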

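Using pop on a list while iterating over it is always a bad idea. So this is not an optimization but a bug fix. I also edited the regex so that strings like "#34j342" are no longer accepted:
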
>>> import re
>>> def to_short_hex(s):
...     matches = re.findall(r'#[\dabcdefABCDEF]{6}\b', s)
...     filtered = [m for m in matches if re.findall(r'#' + m[1] + '{6}', m)]
...     for m in filtered:
...         s = s.replace(m, m[:-3])
...     return s
... 
>>> to_short_hex('text #FFFFFF text #000000 #08088')
'text #FFF text #000 #08088'
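
To see why popping during iteration counts as a bug rather than a style issue, here is a minimal sketch that isolates the filtering loop from the question. The helper names `uniform_only_buggy` and `uniform_only_fixed` and the sample list are made up for illustration; they do not appear in the original code.

```python
import re

def uniform_only_buggy(matches):
    # Same pattern as the question: remove items from the list being iterated.
    matches = list(matches)  # copy so the caller's list is left alone
    for i in matches:
        if not re.findall(r'#' + i[1] + '{6}', i):
            # After this pop, the next element slides into the current
            # position and the loop never looks at it.
            matches.pop(matches.index(i))
    return matches

def uniform_only_fixed(matches):
    # Build a new list instead of mutating the one being iterated.
    return [m for m in matches if re.search(r'#' + m[1] + '{6}', m)]

sample = ['#123456', '#abcdef', '#FFFFFF']
print(uniform_only_buggy(sample))  # ['#abcdef', '#FFFFFF']
print(uniform_only_fixed(sample))  # ['#FFFFFF']
```

In the buggy version '#abcdef' survives because it is skipped right after '#123456' is popped, so a later replace step would turn it into '#abc'.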

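Also, I think re.search would be enough for the second regex.

This is what re.sub is for! Using a regex to find something and then running a series of search-and-replace operations to change it is not a good idea. For one thing, it is easy to accidentally replace something you did not mean to replace; for another, it does a lot of redundant work.
Also, you may want to shorten "#aaccee" to "#ace". This example handles that as well:
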
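import re

def to_short_hex(s):
    def shorten_match(match):
        hex_string = match.group(0)
        # The two strided slices agree only when every digit pair is doubled.
        if hex_string[1::2] == hex_string[2::2]:
            return '#' + hex_string[1::2]
        return hex_string
    # re.sub calls shorten_match once for each six-digit hex color it finds.
    return re.sub(r"#[\da-fA-F]{6}\b", shorten_match, s)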
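
The remark above about accidental replacements can be made concrete with a small sketch. The two helpers below (`shorten_with_replace` and `shorten_with_sub`, both hypothetical names) only handle colors made of one repeated digit, and the input string is an invented edge case, not something from the question.

```python
import re

HEX6 = re.compile(r"#[\da-fA-F]{6}\b")

def shorten_with_replace(s):
    # Find first, then str.replace: the replace step knows nothing about
    # the word boundary that the regex enforced.
    for m in HEX6.findall(s):
        if m[1] * 6 == m[1:]:
            s = s.replace(m, m[:4])
    return s

def shorten_with_sub(s):
    # re.sub rewrites only the exact spans that the regex matched.
    def shorten(match):
        h = match.group(0)
        return h[:4] if h[1] * 6 == h[1:] else h
    return HEX6.sub(shorten, s)

text = "#000000 #0000001"
print(shorten_with_replace(text))  # '#000 #0001'
print(shorten_with_sub(text))      # '#000 #0000001'
```

The seven-digit token is never matched by the regex, yet the replace-based version still rewrites it because it contains '#000000' as a substring.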

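Explanation
re.sub can apply a function to each match. The function receives the match object and returns the string to substitute at that point.
Slice notation lets you apply a stride. hex_string[1::2] takes every second character of the string, starting at index 1 and running to the end of the string. hex_string[2::2] takes every second character starting at index 2 and running to the end. So for the string "#aaccee" we get "ace" and "ace", which match. For the string "#123456" we get "135" and "246", which do not match.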
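
The two mechanisms described above can be tried in isolation. This is only an illustrative sketch: `halves` and `upper_match` are hypothetical helpers, and uppercasing the match is just a stand-in for any replacement function.

```python
import re

def halves(color):
    # Characters at indices 1, 3, 5 and at indices 2, 4, 6.
    return color[1::2], color[2::2]

print(halves("#aaccee"))  # ('ace', 'ace')
print(halves("#123456"))  # ('135', '246')

def upper_match(match):
    # Receives the match object and returns the replacement text.
    return match.group(0).upper()

print(re.sub(r"#[\da-fA-F]{6}\b", upper_match, "a #aaccee b"))  # a #AACCEE b
```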

Comments:
There is a recipe on ActiveState that uses a slightly longer regex.
@JohnP, thx for the link.
@JohnP I think you really should post that as an answer.
@GoingTham, it is a module name, so it should probably be replaced.
@Ricardo Cárdenes, thx. I may be wrong, but to me it is less readable :)
Yes, it is early in the morning here. I used string only because the OP did, and the string module is not used in the function anyway, but yes, I agree it is not ideal.
@AlexanderGuinness: whenever you start using comprehensions, things tend to go the way of readability ;), but try replacing the whole is6hexdigit with (... if ... else ...) and you will see what I mean :D
You are right. I forgot about this notation: "#aaccee" to "#ace".