Python: regexp on a specific URL

python regex

I have a list of URLs like the following:

http://www.toto.com/bags/handbags/test1/
http://www.toto.com/bags/handbags/smt1/
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
http://www.toto.com/bags/handbags/smt1/smt2/testing/
http://www.toto.com/bags/handbags/smt1/smt2/testing.html

What I want is to accept only URLs of this form:

http://www.toto.com/something/else/again/more

Only that: if the URL has more after it, it should not be accepted.

Can you help me? :)

A suitable regular expression is:

^http://www\.toto\.com/(\w+/){4}$

Filtering example (lines holds the URLs from the question):

>>> import re
>>> for line in lines:
...     if re.match(r'^http://www\.toto\.com/(\w+/){4}$', line):
...         print(line)
... 
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
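
For anyone who wants to run the filter outside the interactive prompt, here is a minimal self-contained sketch in Python 3. The names PATTERN, urls, filter_urls and matching are mine, not from the answer, and the dots in the host name are escaped so that they only match a literal dot. It assumes, as the answer does, that a valid URL has exactly four path segments and ends with a slash.

```python
import re

# Pattern from the answer, with the host dots escaped (a small tightening).
# (\w+/){4} means: exactly four "word characters followed by a slash" groups.
PATTERN = re.compile(r'^http://www\.toto\.com/(\w+/){4}$')

# The URL list from the question.
urls = [
    "http://www.toto.com/bags/handbags/test1/",
    "http://www.toto.com/bags/handbags/smt1/",
    "http://www.toto.com/bags/handbags/test1/test2/",
    "http://www.toto.com/bags/handbags/blabla1/blabla2/",
    "http://www.toto.com/bags/handbags/smt1/smt2/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing.html",
]


def filter_urls(candidates):
    """Return only the URLs that have exactly four path segments and a trailing slash."""
    return [url for url in candidates if PATTERN.match(url)]


matching = filter_urls(urls)
for url in matching:
    print(url)
```

Note that a URL without the trailing slash, such as the http://www.toto.com/something/else/again/more example in the question, is rejected by this pattern as written.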

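You could do it like this:

http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}

but add $ at the end, so:

http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}$

Comment: It is not clear what the pattern is. Which URLs in the list are valid? Edit: my bad! The third, fourth and fifth!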
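
It may be worth checking this second pattern against the URLs from the question before relying on it. The sketch below (the names BASE, unanchored, anchored, prefix_hits and full_hits are mine) runs it with and without the trailing $. As far as I can tell, its character classes contain no digits and it allows at most two path segments. So the unanchored form matches a prefix of every URL, and the anchored form matches none of them.

```python
import re

# Second pattern from the thread, copied as written.
BASE = r'http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}'
unanchored = re.compile(BASE)
anchored = re.compile(BASE + '$')

# The URL list from the question.
urls = [
    "http://www.toto.com/bags/handbags/test1/",
    "http://www.toto.com/bags/handbags/smt1/",
    "http://www.toto.com/bags/handbags/test1/test2/",
    "http://www.toto.com/bags/handbags/blabla1/blabla2/",
    "http://www.toto.com/bags/handbags/smt1/smt2/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing.html",
]

# re.match only anchors at the start, so without $ a matching prefix is enough.
prefix_hits = [url for url in urls if unanchored.match(url)]

# With $ the whole URL has to fit the pattern.
full_hits = [url for url in urls if anchored.match(url)]

print(len(prefix_hits), "URLs match without $")
print(len(full_hits), "URLs match with $")
```

If the goal is the third, fourth and fifth URLs, the (\w+/){4} pattern from the other answer looks like the closer fit.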