Python regex sub results in ASCII characters appearing
I'm trying to use a regular expression to fix some problems in a piece of text. The string looks like this:
a = "Here is a shortString with various issuesWith spacing"
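My regex currently looks like this:
new_string = re.sub("[a-z][A-Z]", "\1 \2", a)
This is meant to find the places where a space is missing (a lowercase letter is always followed directly by a capital letter) and add a space there.
Unfortunately, the output looks like this: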
Here is a shor\x01 \x02tring with various issue\x01 \x02ith spacing
I want it to look like this:
b = "Here is a short String with various issues With spacing"
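It seems the regex correctly matches the instances I want to change, but something is wrong with my replacement. I thought \1 \2 meant: replace with the first part of the regex match, add a space, and then add the second match. But for some reason I get something else.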
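A minimal sketch that reproduces the problem, assuming the question's string and pattern (the names replacement and broken are illustrative only). In an ordinary, non-raw string literal, \1 and \2 are octal escapes, so re.sub receives the control characters \x01 and \x02 instead of group references:

```python
import re

a = "Here is a shortString with various issuesWith spacing"

# In a normal string literal, "\1" is an octal escape for chr(1), not a backreference
replacement = "\1 \2"
print(replacement == "\x01 \x02")  # True

# Each "tS" / "sW" match is replaced by those literal control characters
broken = re.sub("[a-z][A-Z]", replacement, a)
print(repr(broken))  # 'Here is a shor\x01 \x02tring with various issue\x01 \x02ith spacing'
```

The matched letters themselves are also consumed, because the pattern has no groups that could put them back.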
>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub("([a-z])([A-Z])", r"\1 \2", a)
'Here is a short String with various issues With spacing'
The capture groups and the backslash escaping were missing.
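On the backslash part: a raw string is one way to get the two characters backslash and 1 into the replacement; doubling the backslash in an ordinary literal is the other. A small sketch (the names escaped and result are just for illustration):

```python
import re

a = "Here is a shortString with various issuesWith spacing"

# In an ordinary literal, "\\1" is a backslash followed by "1",
# the same two characters that the raw literal r"\1" contains
escaped = "\\1 \\2"
print(escaped == r"\1 \2")  # True

# Either spelling gives re.sub real group references
result = re.sub("([a-z])([A-Z])", escaped, a)
print(result)  # Here is a short String with various issues With spacing
```

Raw strings are usually preferred because they read exactly like the regex syntax.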
You can go a step further:
>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub('([a-z])([A-Z])', r'\1 \2', a).lower().capitalize()
'Here is a short string with various issues with spacing'
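Note that .lower().capitalize() lowercases every letter in the string, so proper nouns or acronyms elsewhere would be lowered too. If only the capital letter after the inserted space should change, re.sub also accepts a function as the replacement and calls it with each match object. A sketch, where split_and_lower is a hypothetical helper name:

```python
import re

a = "Here is a shortString with various issuesWith spacing"

def split_and_lower(match: re.Match) -> str:
    # Keep the lowercase letter, insert a space, lowercase only the capital that followed it
    return match.group(1) + " " + match.group(2).lower()

result = re.sub(r"([a-z])([A-Z])", split_and_lower, a)
print(result)  # Here is a short string with various issues with spacing
```

The leading "Here" keeps its capital because it is never part of a match.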
You need to define capture groups and use raw string literals:
import re
a = "Here is a shortString with various issuesWith spacing"
new_string = re.sub(r"([a-z])([A-Z])", r"\1 \2", a)
print(new_string)
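A variation on the code above, assuming you prefer named groups and a precompiled pattern (the group names lower and upper are arbitrary choices). The replacement refers to them with \g<name>; the same syntax also accepts numbers, e.g. \g<1>:

```python
import re

a = "Here is a shortString with various issuesWith spacing"

# Named groups make the replacement template self-documenting
pattern = re.compile(r"(?P<lower>[a-z])(?P<upper>[A-Z])")
new_string = pattern.sub(r"\g<lower> \g<upper>", a)
print(new_string)  # Here is a short String with various issues With spacing
```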
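Note that without the r prefix, Python interprets \1 and \2 as octal escape sequences (the control characters \x01 and \x02) rather than as backreferences, because the \ is parsed as part of a string escape sequence. In a raw string literal, \ is kept as a literal backslash, so re.sub receives \1 and \2 and treats them as group references.
You can try it like this: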
>>> import re
>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub(r"(?<=[a-z])(?=[A-Z])", " ", a)
'Here is a short String with various issues With spacing'
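Why the lookaround version needs no capture groups: the lookbehind and lookahead together match an empty string between a lowercase and an uppercase letter, so nothing is consumed and re.sub only inserts " " at each match position. A small sketch listing those zero-width matches (positions and pairs are illustrative names):

```python
import re

a = "Here is a shortString with various issuesWith spacing"
pattern = r"(?<=[a-z])(?=[A-Z])"

# Each match is empty: start == end, sitting between the two letters
positions = [m.start() for m in re.finditer(pattern, a)]
print(positions)  # [15, 41]
pairs = [a[p - 1 : p + 1] for p in positions]
print(pairs)  # ['tS', 'sW']

# Substituting at those empty matches inserts the space without touching any letters
spaced = re.sub(pattern, " ", a)
print(spaced)  # Here is a short String with various issues With spacing
```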
You need raw strings. Add r in front of both string literals: r"[a-z][A-Z]", r"\1 \2".
You also haven't set up any capture groups; use r"([a-z])([A-Z])".
Then use r'\1 \2' as the replacement.
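Both fixes are needed. A sketch showing that raw strings alone still fail, because \1 and \2 then refer to groups the pattern does not define and re.sub raises re.error (the flag name raised is only for illustration):

```python
import re

a = "Here is a shortString with various issuesWith spacing"

# Raw strings alone are not enough: the pattern has no group 1 or 2 to refer to
try:
    re.sub(r"[a-z][A-Z]", r"\1 \2", a)
    raised = False
except re.error as exc:
    raised = True
    print("re.error:", exc)

# With raw strings and capture groups, the substitution works
fixed = re.sub(r"([a-z])([A-Z])", r"\1 \2", a)
print(fixed)  # Here is a short String with various issues With spacing
```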