Python: regexp on specific URLs
I have a list of URLs like the following:
http://www.toto.com/bags/handbags/test1/
http://www.toto.com/bags/handbags/smt1/
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
http://www.toto.com/bags/handbags/smt1/smt2/testing/
http://www.toto.com/bags/handbags/smt1/smt2/testing.html
What I want here is to accept only URLs like this one:
http://www.toto.com/something/else/again/more
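Only that: four path segments, and if there is anything more, the URL should not be accepted.
Can you help me? :)
An appropriate regular expression is: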
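^http://www\.toto\.com/(\w+/){4}$
(The dots are escaped so that they match only a literal "." rather than any character.)
Filtering example:
>>> import re
>>> for line in lines:
...     if re.match(r'^http://www\.toto\.com/(\w+/){4}$', line):
...         print(line)
...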
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
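The same filter can be written as a small standalone Python 3 script. This is only a sketch: the names `URLS`, `STRICT`, `OPTIONAL_SLASH`, and `filter_urls` are illustrative and not part of the original answer. It assumes "four path segments" means four word-like segments. `OPTIONAL_SLASH` is an assumed variant that also accepts a URL without the trailing slash, like the example given in the question.

```python
import re

# URLs copied from the question.
URLS = [
    "http://www.toto.com/bags/handbags/test1/",
    "http://www.toto.com/bags/handbags/smt1/",
    "http://www.toto.com/bags/handbags/test1/test2/",
    "http://www.toto.com/bags/handbags/blabla1/blabla2/",
    "http://www.toto.com/bags/handbags/smt1/smt2/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing.html",
]

# Exactly four segments, each followed by a slash (same idea as the answer).
STRICT = re.compile(r"http://www\.toto\.com/(?:\w+/){4}")

# Assumed variant: the slash after the fourth segment is optional.
OPTIONAL_SLASH = re.compile(r"http://www\.toto\.com/(?:\w+/){3}\w+/?")


def filter_urls(urls, pattern=STRICT):
    """Return the URLs that the pattern matches in full."""
    return [url for url in urls if pattern.fullmatch(url.strip())]


if __name__ == "__main__":
    for url in filter_urls(URLS):
        print(url)
```

`fullmatch` anchors the pattern at both ends, so the explicit `^` and `$` are not needed here. On the list above both patterns select the third, fourth, and fifth URLs.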
It is not clear what the pattern is. Which URLs in the list are valid? Edit: my bad! The third, fourth, and fifth!
You can do it like this:
http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}
but add $ at the end, so:
http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}$
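As a quick check of this anchored pattern, here is a sketch that runs it against the URLs from the question. The names `URLS`, `SECOND_PATTERN`, and `matching` are illustrative only. The character classes contain no digits and the pattern allows at most two path segments, so it should be tested against the real data before being relied on.

```python
import re

# URLs copied from the question.
URLS = [
    "http://www.toto.com/bags/handbags/test1/",
    "http://www.toto.com/bags/handbags/smt1/",
    "http://www.toto.com/bags/handbags/test1/test2/",
    "http://www.toto.com/bags/handbags/blabla1/blabla2/",
    "http://www.toto.com/bags/handbags/smt1/smt2/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing.html",
]

# The pattern from this answer, with the trailing $ added.
SECOND_PATTERN = re.compile(
    r"http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}$"
)


def matching(urls):
    """Return the URLs matched by SECOND_PATTERN from the start of the string."""
    return [url for url in urls if SECOND_PATTERN.match(url)]


if __name__ == "__main__":
    # None of the question's URLs match: each has a third segment with digits.
    print(matching(URLS))
    # A shorter, digit-free URL (hypothetical) does match.
    print(bool(SECOND_PATTERN.match("http://www.toto.com/bags/handbags")))
```

With the URLs above, the anchored pattern selects none of them, because every URL continues past the second path segment with a segment that contains a digit.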