Match the categories in a URL with a Python regular expression?

python regex

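I want to match the categories in the following URL: `news` and `politics`.

Note that there may be one category or several. A category can be identified as text or digits with a / on either side.

What I tried: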

item.url = 'http://www.example.com/news/politics/this-is-article-name-1993591'

compiled_regex = re.compile('/.+(?!/)/')

match = compiled_regex.search(item.url)

The result is None.

The expected result I would like to get:

match.group(0) = `news`
match.group(1) = `politics`

Going by your definition, something like this:

categories = item.url.split('/')[3:-1]
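
To make the `[3:-1]` slice less mysterious, here is a small sketch of what `split('/')` produces for the URL in the question. The variable `url` stands in for `item.url`, and the sketch assumes the URL always has a scheme, a host, and a final article segment with no trailing slash.

```python
url = 'http://www.example.com/news/politics/this-is-article-name-1993591'

parts = url.split('/')
# parts[0] is 'http:', parts[1] is '' (between the two slashes of '//'),
# parts[2] is the host, and the last element is the article slug.
categories = parts[3:-1]
print(categories)
```

Everything between the host and the slug is kept, so the same slice works for one category or for several.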

I wouldn't use a regular expression; I would use `urllib.parse` to parse the URL instead:

>>> url = 'http://www.example.com/news/politics/this-is-article-name-1993591'
>>> import urllib.parse

>>> urllib.parse.urlparse(url)
ParseResult(scheme='http',
            netloc='www.example.com',
            path='/news/politics/this-is-article-name-1993591',
            params='',
            query='',
            fragment='')

>>> urllib.parse.urlparse(url).path
'/news/politics/this-is-article-name-1993591'

>>> urllib.parse.urlparse(url).path.split('/')[1:-1]
['news', 'politics']
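
The same idea can be wrapped in a small helper that copes with any number of categories. This is only a sketch: the name `url_categories` is made up for illustration, and dropping empty segments (so that a trailing or doubled slash does not produce an empty category) is an assumption that goes slightly beyond the one-liner above.

```python
from urllib.parse import urlparse


def url_categories(url):
    """Return the path segments that come before the final segment."""
    path = urlparse(url).path
    # Ignore empty strings produced by leading, trailing, or doubled slashes.
    segments = [segment for segment in path.split('/') if segment]
    # The last segment is assumed to be the article slug, not a category.
    return segments[:-1]


print(url_categories('http://www.example.com/news/politics/this-is-article-name-1993591'))
```

Because `urlparse` separates the query string and fragment from the path, slashes that appear after a `?` or `#` are not mistaken for category separators.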

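However, if you really do need a regexp, others have given useful answers. Note that this pattern captures exactly two groups, so it requires at least two categories: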

>>> import re
>>> url = 'http://www.example.com/news/politics/this-is-article-name-1993591'
>>> re.match('https?://[^/]+/([^/]+)/([^/]+)/', url).groups()
('news', 'politics')
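
For what it's worth, the pattern in the question, `/.+(?!/)/`, can never match: `(?!/)` asserts that the next character is not a slash, and the pattern then immediately requires a slash. If a regex really is required and the number of categories varies, one possible sketch is to strip the scheme and host first and then collect every segment that is followed by another slash. The name `regex_categories` is hypothetical, and the sketch assumes the URL carries no query string or fragment containing slashes.

```python
import re


def regex_categories(url):
    """Collect every path segment that has a slash on both sides."""
    # Split off the scheme and host; the captured group is the path.
    match = re.match(r'https?://[^/]+(/.*)', url)
    if match is None:
        return []
    path = match.group(1)
    # The lookahead checks for the closing slash without consuming it,
    # so that same slash can open the next category.
    return re.findall(r'/([^/]+)(?=/)', path)


print(regex_categories('http://www.example.com/news/politics/this-is-article-name-1993591'))
```

The final segment is skipped because nothing follows it, so the lookahead fails there.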

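What is the expected output? What pattern are you looking for? Please be clearer.

I tested this. It only works when the URL has 2 categories; it does not work with 1 category or with 3 or more.

Yes, it means there must be at least two categories. URL structure depends on the site, and depending on the site you are trying to parse you may need several different strategies. If you think one regexp will match them all, you are probably approaching the problem the wrong way.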