Extracting URLs from text in Python 3 when there is no space before the URL

Tags: python, python-3.x, regex

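I have a problem with a Python regular expression. I want to extract every URL in a text, excluding email addresses. My current regex pattern still fails to extract a URL when there is no space before it. This is my regex pattern: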

\b((?:(?:https|ftp|http)?:(?:/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)/)(?:[^\s()<>{}]+|\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\))+(?:\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\)|[^\s`!()\[\]{};:\'\".,<>?«»“”‘’])|(?:(?<!@)[a-z0-9]+(?:[.\-][a-z0-9]+)*[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)\b/?(?!@)))

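You can check the pattern in this regex editor. My pattern still fails to recognize a URL when there is no space before it. Any hints or solutions are welcome.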
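The symptom can be reproduced with a much shorter pattern. The sketch below is not the pattern above: `simple` is a hypothetical stand-in that keeps only the leading `\b` and a scheme prefix, to show that `\b` cannot match between two word characters.

```python
import re

# Hypothetical, shortened stand-in for the long pattern: leading \b plus a scheme.
simple = re.compile(r"\bhttps?://\S+", re.IGNORECASE)

# A space before the URL gives \b a word boundary to match at.
with_space = simple.findall("see http://example.com")
print(with_space)  # ['http://example.com']

# Here "e" and "h" are both word characters, so \b does not match before "http".
without_space = simple.findall("seehttp://example.com")
print(without_space)  # []
```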

If it is preceded by a character other than whitespace, it is no longer a URL :)

From:

In general, URLs are written as follows:

  <scheme>:<scheme-specific-part>
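As a small illustration of that general form, the sketch below splits a string at the first colon with `str.partition` and compares the result with `urllib.parse.urlsplit` from the standard library. The sample URL and email address are made up for the example.

```python
from urllib.parse import urlsplit

url = "https://example.com/path?q=1"  # made-up sample URL

# Split at the first colon: <scheme>:<scheme-specific-part>
scheme, _, rest = url.partition(":")
print(scheme)  # https
print(rest)    # //example.com/path?q=1

# The standard library reports the same scheme.
print(urlsplit(url).scheme)  # https

# A bare email address has no colon, so nothing follows the separator.
email = "user@example.com"  # made-up sample address
print(email.partition(":"))  # ('user@example.com', '', '')
```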
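A modified pattern may be what you want: its first group matches the URL even when the URL is preceded by digits, letters, or underscores.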

Read more about:
[\s\w]*?
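A minimal sketch of how such a prefix could be used, assuming it replaces the leading `\b` and that the URL sits in the first capture group. The pattern here is a hypothetical, heavily shortened one (scheme-based URLs only), not the full pattern from the question, and the sample text is made up.

```python
import re

# Hypothetical shortened pattern: the lazy prefix [\s\w]*? consumes any
# whitespace or word characters in front, and group 1 captures the URL.
pattern = re.compile(r"[\s\w]*?(https?://[^\s()<>{}]+)", re.IGNORECASE)

text = "seehttp://example.com/a and_http://example.org/b"

# With exactly one group in the pattern, findall returns the contents of that group.
urls = pattern.findall(text)
print(urls)  # ['http://example.com/a', 'http://example.org/b']

# An email address has no scheme, so this shortened pattern ignores it.
print(pattern.findall("mail user@example.com"))  # []
```

Because the prefix is lazy, it gives up characters as soon as the group can match, so the URL itself is never swallowed by the prefix.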