Regular expression substitution in Python causes ASCII control characters to appear


I am trying to use a regular expression to fix some spacing problems in a piece of text.

The string looks like this:

a = "Here is a shortString with various issuesWith spacing"

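My regex currently looks like this:
new_string = re.sub("[a-z][A-Z]", "\1 \2", a)

This is meant to find each place where a space is missing (always a lowercase letter immediately followed by an uppercase letter) and insert a space there.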

Unfortunately, the output looks like this:

Here is a shor\x01 \x02tring with various issue\x01 \x02ith spacing

I would like it to look like this:

b = "Here is a short String with various issues With spacing"

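The regex seems to match exactly the places I want to change, but something is wrong with my replacement. I thought \1 \2 meant: insert the first part of the match, add a space, then insert the second part. For some reason I get something else.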

>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub("([a-z])([A-Z])", r"\1 \2", a)
'Here is a short String with various issues With spacing'
The capturing groups and the backslash escaping were missing.
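To illustrate why the capturing groups matter on their own, here is a minimal sketch. It assumes Python 3 and reuses the string from the question; the variable names are only illustrative. With a raw-string replacement but no parentheses in the pattern, the backreference has nothing to refer to and re.sub raises an error instead of silently producing wrong output.

```python
import re

a = "Here is a shortString with various issuesWith spacing"

# Raw replacement string, but the pattern defines no capturing groups:
# \1 refers to a group that does not exist, so re.sub raises re.error.
try:
    re.sub(r"[a-z][A-Z]", r"\1 \2", a)
    raised = False
except re.error:
    raised = True

# With both groups defined, the backreferences resolve as intended.
fixed = re.sub(r"([a-z])([A-Z])", r"\1 \2", a)

print(raised)
print(fixed)
```

So both fixes are needed together: the parentheses create groups 1 and 2, and the raw string keeps the backslashes intact for the regex engine.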

You can go one step further and also normalize the capitalization:

>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub('([a-z])([A-Z])', r'\1 \2', a).lower().capitalize()
'Here is a short string with various issues with spacing'
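As a small side note on the snippet above, str.capitalize() already lowercases everything after the first character, so the explicit .lower() call should be redundant. The following sketch, with illustrative variable names, compares the two forms:

```python
import re

a = "Here is a shortString with various issuesWith spacing"
spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", a)

# capitalize() uppercases the first character and lowercases the rest,
# so calling lower() first does not change the result.
with_lower = spaced.lower().capitalize()
without_lower = spaced.capitalize()

print(with_lower)
print(without_lower)
```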

You need to define capturing groups and use raw string literals:

import re
a = "Here is a shortString with various issuesWith spacing"
new_string = re.sub(r"([a-z])([A-Z])", r"\1 \2", a)
print(new_string)


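Note that without the r prefix, Python interprets \1 and \2 as characters rather than backreferences, because the backslash is parsed as part of an escape sequence: \1 and \2 are octal escapes for the characters with code points 1 and 2, which is where \x01 and \x02 in the output come from. In a raw string literal, the backslash is kept as a literal backslash, so the regex engine receives \1 and \2 and treats them as backreferences.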
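The difference between the two literals can be inspected directly. This is a minimal sketch using the string from the question; the names plain, raw, broken and fixed are only illustrative.

```python
import re

a = "Here is a shortString with various issuesWith spacing"

# Normal literal: "\1" and "\2" are octal escapes, i.e. single control characters.
plain = "\1 \2"
# Raw literal: the backslashes survive, so the regex engine sees backreferences.
raw = r"\1 \2"

print(len(plain), [ord(c) for c in plain])  # three characters: code points 1, 32, 2
print(len(raw), list(raw))                  # five characters, including two backslashes

broken = re.sub(r"([a-z])([A-Z])", plain, a)
fixed = re.sub(r"([a-z])([A-Z])", raw, a)

print(repr(broken))
print(repr(fixed))
```

Printing with repr() makes the control characters visible as \x01 and \x02, which matches the output shown in the question.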

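You can also try it like this:

>>> import re
>>> a = "Here is a shortString with various issuesWith spacing"
>>> re.sub(r"(?<=[a-z])(?=[A-Z])", " ", a)
'Here is a short String with various issues With spacing'

The lookbehind and lookahead match the empty position between a lowercase and an uppercase letter, so no backreferences are needed in the replacement.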
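For comparison, the lookaround approach can be wrapped in a small helper and checked against the capturing-group version. This is only a sketch: the names BOUNDARY and split_camel_case are hypothetical and not part of the original answers.

```python
import re

# Zero-width pattern: matches the position between a lowercase
# and an uppercase letter without consuming either character.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def split_camel_case(text):
    """Insert a space at every lowercase-to-uppercase boundary (hypothetical helper)."""
    return BOUNDARY.sub(" ", text)


a = "Here is a shortString with various issuesWith spacing"

with_lookaround = split_camel_case(a)
with_groups = re.sub(r"([a-z])([A-Z])", r"\1 \2", a)

print(with_lookaround)
print(with_lookaround == with_groups)
```

For this input the two approaches give the same result; the lookaround version simply avoids having to re-insert the matched letters.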

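You need raw strings. Add r in front of both string literals: r"[a-z][A-Z]", r"\1 \2". You also have not set up any capturing groups, so use r"([a-z])([A-Z])" as the pattern, and then r'\1 \2' as the replacement.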
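If raw strings are not an option, there are a few equivalent ways to write the replacement. This is a sketch of the alternatives I am aware of in the standard re module, using the string from the question; the variable names are illustrative.

```python
import re

a = "Here is a shortString with various issuesWith spacing"
pattern = r"([a-z])([A-Z])"

# Doubling the backslashes in a normal literal yields the same text as r"\1 \2".
escaped = re.sub(pattern, "\\1 \\2", a)

# The \g<n> syntax is an explicit way to reference a numbered group.
explicit = re.sub(pattern, r"\g<1> \g<2>", a)

# A replacement function receives the match object and avoids escapes entirely.
via_function = re.sub(pattern, lambda m: m.group(1) + " " + m.group(2), a)

print(escaped)
print(explicit)
print(via_function)
```

All three calls should produce the same string as the raw-string version; which one to use is mostly a matter of readability.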