Python regex to replace specific numbers in a sentence
I have a sentence like the one below:
test_str = r'Mr.X has 23 apples and 59 oranges, his business partner from Colorado staying staying in hotel with phone number +188991234 and his wife and kids are staying away from him'
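I want to replace every digit of every number in the sentence above with "0", except for the phone number, which should keep only its leading +1 and have its remaining digits masked: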
result = r'Mr.X has 00 apples and 00 oranges, his business partner from Colorado staying staying in hotel with phone number +1******** and his wife and kids are staying away from him'
I use the regex below to mask the phone number, which always has the same number of digits:
result = re.sub(r'(.*)?(\+1)(\d{8})', r'\1\2********', test_str)
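As a side note on the pattern in the question, here is a small sketch of how it behaves. It assumes the plus sign is escaped as `\+` (an unescaped `+` directly after `(` raises `re.error`), and the two sample strings are made up for illustration. The greedy `(.*)?` prefix swallows everything up to the last phone number on a line, so earlier phone numbers, like the plain numbers, are left untouched.

```python
import re

# Pattern from the question, with the plus sign escaped (assumption).
QUESTION_PATTERN = r'(.*)?(\+1)(\d{8})'

# Hypothetical sample inputs, not taken from the question.
one_phone = 'has 23 apples, phone +188991234'
two_phones = 'call +112345678 or +187654321'

masked_one = re.sub(QUESTION_PATTERN, r'\1\2********', one_phone)
masked_two = re.sub(QUESTION_PATTERN, r'\1\2********', two_phones)

print(masked_one)  # has 23 apples, phone +1********
print(masked_two)  # call +112345678 or +1********
```

Dropping the `(.*)?` group and matching only `(\+1)(\d{8})` would avoid that, since `re.sub` already scans the whole string.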
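Can I also replace the numbers other than the phone number with 0 in a single regex?
We can use re.sub with a function as the replacement.
To replace the phone number, you can use the regex below.
All digits that follow +1 are replaced by an equal number of * characters.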
result = re.sub(r'(?<!\w)(\+1)(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)
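To replace the other numbers with 0, you can use the regex below. Every run of digits that is not preceded by + or by another digit is replaced by an equal number of 0 characters.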
result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), test_str)
Example
>>> import re
>>> test_str = r'Mr.X has 23 apples and 59 oranges, his phone number +188991234'
>>> result = re.sub(r'(?<!\w)(\+1)(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)
>>> result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), result)
>>> result
'Mr.X has 00 apples and 00 oranges, his phone number +1********'
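The question asks whether this can be done with one regex. A minimal sketch, assuming the same +1 phone format as above: the two patterns can be joined with an alternation, and the replacement function checks which branch matched. The name `mask_single_pass` is hypothetical.

```python
import re

def mask_single_pass(text):
    """Mask +1 phone numbers with '*' and every other number with '0' in one pass."""
    def repl(m):
        if m.group(1):
            # Phone branch matched: keep '+1', star out the remaining digits.
            return m.group(1) + '*' * len(m.group(2))
        # Plain-number branch matched: one '0' per digit.
        return '0' * len(m.group())
    return re.sub(r'(?<!\w)(\+1)(\d+)|\d+', repl, text)

test_str = r'Mr.X has 23 apples and 59 oranges, his phone number +188991234'
single_pass_result = mask_single_pass(test_str)
print(single_pass_result)
# Mr.X has 00 apples and 00 oranges, his phone number +1********
```

Because the regex engine tries the phone branch first at each position, the digits after +1 are consumed before the plain `\d+` branch can see them.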
For the follow-up question in the comments: to keep 3 digits, we only need to change the +1 part of the first regex; the second regex stays the same.
>>> test_str = r'Mr.X has 23 apples and 59 oranges, his phone number +188991234'
>>> result = re.sub(r'(?<!\w)(\+\d{3})(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)
>>> result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), result)
>>> result
'Mr.X has 00 apples and 00 oranges, his phone number +188******'
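The two steps above could be wrapped in one helper with the number of kept digits as a parameter. This is only a sketch: `mask_numbers` and its `keep` argument are hypothetical names, and it assumes that every phone number starts with + and that any digits may follow it (not only the country code 1).

```python
import re

def mask_numbers(text, keep=1):
    """Keep '+' and the first `keep` digits of a phone number, star the rest,
    then replace every other number with zeros."""
    # Step 1: phone numbers. The lookbehind rejects a '+' glued to a word character.
    phone = re.compile(r'(?<!\w)(\+\d{%d})(\d+)' % keep)
    text = phone.sub(lambda m: m.group(1) + '*' * len(m.group(2)), text)
    # Step 2: any digit run not preceded by '+' or another digit.
    return re.sub(r'(?<![+\d])\d+', lambda m: '0' * len(m.group()), text)

test_str = r'Mr.X has 23 apples and 59 oranges, his phone number +188991234'
print(mask_numbers(test_str, keep=3))
# Mr.X has 00 apples and 00 oranges, his phone number +188******
```

With the default `keep=1` the same input yields `+1********`, matching the first example.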
If you want to keep the first 3 digits of the phone number and allow an optional +1 prefix, using a single pattern:
(?<!\S)((?:\+1)?)(\d{3})(\d{5})(?!\S)|\d+
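I believe you need two replacements? First replace with 00, then with *****.
If the string contains another number with other symbols in it, such as 1-22, this won't work. So do you want those to be 00 or ***?
All numbers except the phone number should be replaced with "0".
How do you identify a phone number in the string?
This works great. But if I need to keep the first three digits of the phone number, whether the country code is +1 or +000, the regex I mentioned in the question still works, whereas your answer masks all of the remaining digits of the phone number.
To keep more of the phone number's digits, you can modify the first regex.
The first regex is no problem. But if the phone number has no +, the second regex also replaces the digits kept from the phone number with 0. For example, take a plain 8-digit number that is actually a phone number: 12345678. With the first regex I can mask it as 1234****, but the second regex then replaces the kept digits, giving 0000****.
More precisely, the second regex should be independent of the + sign and should look for * instead.
This doesn't work if the phone number has no "+".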
Explanation of the pattern:
(?<!\S)        Negative lookbehind: the previous character must not be a non-whitespace character
(              Capture group 1
  (?:\+1)?     Non-capturing group matching a literal +1, repeated 0 or 1 times
)              Close group 1
(\d{3})        Capture group 2: exactly 3 digits
(\d{5})        Capture group 3: exactly 5 digits
(?!\S)         Negative lookahead: the next character must not be a non-whitespace character
|              Or
\d+            Match 1 or more digits
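The same breakdown can live inside the pattern itself by using `re.VERBOSE`, which ignores whitespace and `#` comments in the pattern. This is a sketch of that idea; the constant name `PHONE_OR_NUMBER` and the short sample string are illustrative assumptions.

```python
import re

PHONE_OR_NUMBER = re.compile(r"""
    (?<!\S)        # left boundary: start of string or whitespace
    ((?:\+1)?)     # group 1: optional +1 prefix
    (\d{3})        # group 2: first three digits, kept
    (\d{5})        # group 3: last five digits, masked
    (?!\S)         # right boundary: end of string or whitespace
    |              # or
    \d+            # any other run of digits
""", re.VERBOSE)

def repl(m):
    # Group 2 is only set when the phone-number branch matched.
    if m.group(2):
        return m.group(1) + m.group(2) + '*' * len(m.group(3))
    return '0' * len(m.group())

verbose_result = PHONE_OR_NUMBER.sub(repl, 'tel 12345678 and 1234567')
print(verbose_result)  # tel 123***** and 0000000
```

When the second branch matches, all three groups are `None`, which is why the replacement function tests a group before concatenating.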
import re
pattern = r"(?<!\S)((?:\+1)?)(\d{3})(\d{5})(?!\S)|\d+"
s = ("Mr.X has 23 apples and 59 oranges, his business partner from Colorado staying staying in hotel with phone number +188991234 and his wife and kids are staying away from him\n\n"
"This is a tel 12345678 and this is 1234567 123456789")
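result = re.sub(
    pattern,
    # Group 2 is set only when the phone-number alternative matched;
    # otherwise the match is a plain number and becomes zeros.
    lambda x: (x.group(1) + x.group(2) + "*" * len(x.group(3))) if x.group(2) else "0" * len(x.group()),
    s,
)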
print(result)
Output
Mr.X has 00 apples and 00 oranges, his business partner from Colorado staying staying in hotel with phone number +1889***** and his wife and kids are staying away from him

This is a tel 123***** and this is 0000000 000000000