
Splitting a URL with Python


I have this URL:

/drive/rayon.productlist.seomenulevel/fh_refpath$003dfacet_1$0026fh_refview$003dlister$0026fh_view_size$003d100$0026fh_reffacet$003dcategories$0026auchan_page_type$003dcatalogue$0026fh_location$003d$00252f$00252f52$00252ffr_FR$00252fdrive_id$00253d993$00252fcategories$00253c$00257b52_3686967$00257d$00252fcategories$00253c$00257b52_3686967_3686326$00257d$00252fcategories$00253c$00257b52_3686967_3686326_3700610$00257d$00252fcategories$00253c$00257b52_3686967_3686326_3700610_3700620$00257d/Capsules$0020$002843$0029/3700620?t:ac=3686967/3700610
I want the last 3 numbers: item[0] = 3700620, item[1] = 3686967 and item[2] = 3700610

I tried this:

one =   url.split('/')[-1]
two =   url.split('/')[-2]
The result of the first one is "3700610"


A non-regex approach would involve using urlparse and a bit of splitting:

>>> import urlparse
>>> parsed_url = urlparse.urlparse(url) 
>>> number1 = parsed_url.path.split("/")[-1]
>>> number2, number3 = urlparse.parse_qs(parsed_url.query)["t:ac"][0].split("/")
>>> number1, number2, number3
('3700620', '3686967', '3700610')
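The session above uses the Python 2 `urlparse` module, matching the question's python-2.7 tag. On Python 3 the same functions live in `urllib.parse`. A minimal sketch, assuming a shortened stand-in for the question's URL (same tail, middle cut by hand for readability):

```python
from urllib.parse import urlsplit, parse_qs

# Shortened stand-in for the URL in the question; only the tail matters here.
url = "/drive/rayon.productlist.seomenulevel/Capsules$0020$002843$0029/3700620?t:ac=3686967/3700610"

parts = urlsplit(url)

# Last path segment, i.e. everything after the final "/" before the "?".
number1 = parts.path.rsplit("/", 1)[-1]

# parse_qs maps each query key to a list of values; "t:ac" holds "3686967/3700610".
number2, number3 = parse_qs(parts.query)["t:ac"][0].split("/")

print(number1, number2, number3)
```

This relies on the query always containing exactly one `t:ac` parameter with two slash-separated values; a URL without it would raise a `KeyError`.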
Regex approach:

>>> import re
>>> re.search(r"/(\d+)\?t:ac=(\d+)/(\d+)$", url).groups()
('3700620', '3686967', '3700610')
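where (\d+) matches one or more digits, \? matches a literal question mark (it has to be escaped because ? has a special meaning in a regex), and $ matches the end of the string.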

You can also name the groups and produce a dictionary:

>>> re.search(r"/(?P<number1>\d+)\?t:ac=(?P<number2>\d+)/(?P<number3>\d+)", url).groupdict()
{'number2': '3686967', 'number3': '3700610', 'number1': '3700620'}
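Note that `re.search` returns `None` when nothing matches, so calling `.groupdict()` directly fails on any URL that lacks the expected tail. A small sketch that guards against this; the helper name `extract_ids` and the shortened URL are illustrative assumptions, not part of the original answer:

```python
import re

# Same named-group pattern as above, anchored to the end of the string.
ID_PATTERN = re.compile(
    r"/(?P<number1>\d+)\?t:ac=(?P<number2>\d+)/(?P<number3>\d+)$"
)

def extract_ids(url):
    """Return a dict of the three IDs, or None if the URL has a different shape."""
    match = ID_PATTERN.search(url)
    return match.groupdict() if match else None

# Shortened stand-in for the URL in the question (hypothetical abbreviation).
url = "/drive/rayon.productlist.seomenulevel/Capsules$0020$002843$0029/3700620?t:ac=3686967/3700610"

print(extract_ids(url))
print(extract_ids("/drive/no-ids-here"))
```

Compiling the pattern once is optional; it mainly helps readability when the same pattern is applied to many URLs.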


The following two expressions should work:

url.split('/')[-2].split('=')[1]
url.split('/')[-2].split('?')[0]

Try this:

split_list = url.split('/')
third = split_list[-1]
first, second = split_list[-2].split('?t:ac=')

Another solution using regex:

import re
re.findall(r'\d+', url)[-3:]
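Taking the last three digit runs works for this URL because nothing numeric follows the IDs. A quick sketch of that caveat; the function name, the shortened URL, and the appended `&page=2` parameter are all hypothetical:

```python
import re

def last_three_numbers(text):
    """Return the last three runs of digits found anywhere in text."""
    return re.findall(r"\d+", text)[-3:]

# Shortened stand-in for the URL in the question (hypothetical abbreviation).
url = "/drive/rayon.productlist.seomenulevel/Capsules$0020$002843$0029/3700620?t:ac=3686967/3700610"

print(last_three_numbers(url))

# With an extra numeric parameter appended, the window shifts by one.
print(last_three_numbers(url + "&page=2"))
```

If the URLs may carry extra parameters, an anchored pattern on the `?t:ac=` part is likely the safer choice.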


This seems very specific to this exact URL.
That's why I hate regex. It looks messy, although it's easy once you try to understand it.