Python 使用re.compile和re.sub
我想从我的文本中删除URL:Python 使用re.compile和re.sub,python,regex,Python,Regex,我想从我的文本中删除URL: #Django url validator https://github.com/django/django/blob/master/django/core/validators.py regex = re.compile( r'^(?:http|ftp)s?://' # http:// or https:// r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z
#Django url validator https://github.com/django/django/blob/master/django/core/validators.py
regex = re.compile(
r'^(?:http|ftp)s?://' # http:// or https://
r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' # domain...
r'localhost|' # localhost...
r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|' # ...or ipv4
r'\[?[A-F0-9]*:[A-F0-9:]+\]?)' # ...or ipv6
r'(?::\d+)?' # optional port
r'(?:/?|[/?]\S+)$', re.IGNORECASE)
text = "http://test.com word1 word2 https://test.de word3"
text = re.sub(regex, '', text)
print text
输出仍然是:
http://test.com word1 word2 https://test.de word3
我的代码出了什么问题?您的正则表达式用
^
和$
字符锚定到字符串的开头和结尾。因此,只需删除它们:
regex = re.compile(
r'(?:http|ftp)s?://' # http:// or https://
r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' # domain...
r'localhost|' # localhost...
r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|' # ...or ipv4
r'\[?[A-F0-9]*:[A-F0-9:]+\]?)' # ...or ipv6
r'(?::\d+)?' # optional port
r'(?:/?|[/?]\S+)', re.IGNORECASE)