Python regex to replace specific numbers in a sentence

python regex string

I have a sentence like the one below:

test_str = r'Mr.X has 23 apples and 59 oranges, his business partner from Colorado staying staying in hotel with phone number +188991234 and his wife and kids are staying away from him'

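I want to replace every digit in the sentence above with 0, except in the phone number, where only the leading +1 should be kept and the remaining digits masked: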

result = r'Mr.X has 00 apples and 00 oranges, his business partner from Colorado staying staying in hotel with phone number +1******** and his wife and kids are staying away from him'

I use the regex below to mask the phone number, which always has the same number of digits:

result = re.sub(r'(.*)?(\+1)(\d{8})', r'\1\2********', test_str)
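The plus sign in the phone pattern has to be escaped: a bare + is a quantifier, so a group that begins with it has nothing to repeat and the pattern does not compile, while \+ matches a literal plus sign. A minimal check, using a shortened test string as an assumption:

```python
import re

text = "phone number +188991234"  # shortened sample, assumed for illustration

# A bare "+" at the start of a group has nothing to repeat, so compiling fails.
try:
    re.compile(r'(+1)(\d{8})')
    compiled = True
except re.error:
    compiled = False

# Escaping the plus sign makes it a literal character.
masked = re.sub(r'(\+1)(\d{8})', r'\1********', text)

print(compiled)  # False
print(masked)    # phone number +1********
```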

Can I also replace the other numbers, but not the phone number, with 0 in a single regex?

We can use re.sub with a function as the replacement.

To mask the phone number, use the regex below. All digits that follow +1 are replaced with the same number of * characters.

result = re.sub(r'(?<!\w)(\+1)(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)

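To replace the other numbers with 0, use the regex below. Every run of digits that is not preceded by + or by another digit is replaced with the same number of 0 characters.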

result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), test_str)

Example

>>> import re
>>> test_str = r'Mr.X has 23 apples and 59 oranges, his phone number +188991234'
>>> result = re.sub(r'(?<!\w)(\+1)(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)
>>> result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), result)
>>> result
'Mr.X has 00 apples and 00 oranges, his phone number +1********'

For the follow-up question in the comments: to keep 3 digits, we only need to change the +1 part of the first regex; the second regex stays the same.

>>> test_str = r'Mr.X has 23 apples and 59 oranges, his phone number +188991234'
>>> result = re.sub(r'(?<!\w)(\+\d{3})(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)
>>> result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), result)
>>> result
'Mr.X has 00 apples and 00 oranges, his phone number +188******'
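The two passes above can be folded into one helper with the number of kept digits as a parameter. This is only a sketch: mask_numbers and keep are hypothetical names, and the phone pattern here accepts any digits after the +, not just a literal 1.

```python
import re

def mask_numbers(text, keep=1):
    """Mask phone digits after the first `keep` digits; zero out other numbers.

    mask_numbers and keep are hypothetical names, not from the original answer.
    """
    # Pass 1: keep "+" and `keep` digits, replace the rest of the number with "*".
    phone = re.compile(r'(?<!\w)(\+\d{%d})(\d+)' % keep)
    text = phone.sub(lambda m: m.group(1) + '*' * len(m.group(2)), text)
    # Pass 2: zero out digit runs that are not preceded by "+" or another digit.
    return re.sub(r'(?<![+\d])\d+', lambda m: '0' * len(m.group()), text)

s = 'Mr.X has 23 apples and 59 oranges, his phone number +188991234'
print(mask_numbers(s))          # phone part becomes +1********
print(mask_numbers(s, keep=3))  # phone part becomes +188******
```

The order of the passes matters: the phone number is masked first, and the second pass then skips whatever digits remain directly after the + sign.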

If you want to keep the first 3 digits of the phone number and allow an optional +1, using a single pattern:

(?<!\S)((?:\+1)?)(\d{3})(\d{5})(?!\S)|\d+

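I believe you need two replacements? First replace with 00, then with *****.

This does not work if the string contains another number with other symbols in it, such as 1-22. Do you want those as 00 or ***?

All digits except those in the phone number should be replaced with 0.

How do you identify the phone number in the string?

This works very well. But if I need to keep the first three digits of the phone number, whether the country code is +1 or +000, the regex I mentioned in the question still works, whereas your answer masks all of the remaining digits of the phone number.

To keep more digits of the phone number, you can change the first regex to r'[...]

The first regex is not a problem. But if the phone number has no +, the second regex also replaces the kept digits of the phone number with 0. For example, take a plain 8-digit number that is actually a phone number: 12345678. With the first regex I can mask it as 1234****, but the second regex then replaces the kept digits as well, giving 0000****.

More precisely, the second regex should not depend on the + sign; it should look for * instead.

The second regex should be independent of the + sign. This does not work if the phone number has no +.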
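One way to make the zero-fill pass independent of the + sign, along the lines the comments suggest, is to also skip digit runs that are directly followed by *. The sketch below assumes the phone numbers have already been masked by an earlier pass; the sample string and the name zero_others are made up for illustration.

```python
import re

# Text after a first masking pass (assumed): one phone number with "+", one without.
masked = 'has 23 apples, call +188****** or 1234**** before 59'

# Zero out a digit run only if it is not preceded by "+" or a digit
# and not followed by a digit or "*", i.e. it is not the kept part of a masked number.
zero_others = re.compile(r'(?<![+\d])\d+(?![\d*])')
result = zero_others.sub(lambda m: '0' * len(m.group()), masked)

print(result)  # has 00 apples, call +188****** or 1234**** before 00
```

The \d inside the lookahead stops the engine from backtracking to a shorter run such as 123 and zeroing only part of the kept digits.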

(?<!     Negative lookbehind
  \S     Match any char except a whitespace char
)        Close group
(        Capture group 1
  (?:    Non capture group
    \+   Match + char
    1    Match 1 char
  )?     Close group and repeat 0 or 1 times
)        Close group
(        Capture group 2
  \d{3}  Match a digit, repeated 3 times
)        Close group
(        Capture group 3
  \d{5}  Match a digit, repeated 5 times
)        Close group
(?!      Negative lookahead
  \S     Match any char except a whitespace char
)        Close group
|        Or
\d+      Match a digit and repeat 1 or more times
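The same breakdown can be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is a sketch of that layout; the helper name mask and the sample string are assumptions, not part of the original answer.

```python
import re

# Same pattern as above, laid out with re.VERBOSE so each part can carry a comment.
pattern = re.compile(r"""
    (?<!\S)        # not preceded by a non-whitespace char
    ((?:\+1)?)     # group 1: optional literal +1
    (\d{3})        # group 2: the three digits to keep
    (\d{5})        # group 3: the five digits to mask
    (?!\S)         # not followed by a non-whitespace char
    |              # otherwise
    \d+            # any other run of digits
""", re.VERBOSE)

def mask(m):  # hypothetical helper name
    if m.group(2):  # the phone-number branch matched
        return m.group(1) + m.group(2) + "*" * len(m.group(3))
    return "0" * len(m.group())  # plain number: replace every digit with 0

sample = "23 apples, tel +188991234 or 12345678 but not 1234567"
masked_sample = pattern.sub(mask, sample)
print(masked_sample)
# 00 apples, tel +1889***** or 123***** but not 0000000
```

Because of the whitespace boundaries, a phone number directly followed by punctuation (for example a comma) does not match the first branch and is zeroed by the \d+ branch instead.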

import re

pattern = r"(?<!\S)((?:\+1)?)(\d{3})(\d{5})(?!\S)|\d+"

s = ("Mr.X has 23 apples and 59 oranges, his business partner from Colorado staying staying in hotel with phone number +188991234 and his wife and kids are staying away from him\n\n"
            "This is a tel 12345678 and this is 1234567 123456789")

result = re.sub(
    pattern,
    lambda x: x.group(1) + x.group(2) + "*" * len(x.group(3)) if x.group(2) else "0" * len(x.group()),
    s)
print(result)

Mr.X has 00 apples and 00 oranges, his business partner from Colorado staying staying in hotel with phone number +1889***** and his wife and kids are staying away from him

This is a tel 123***** and this is 0000000 000000000