Python regex: getting parts of the text from URL data

I have a lot of URLs like this:

http://www.example.com/some-text-to-get/jkl/another-text-to-get

I'd like to be able to get this:

["some-text-to-get", "another-text-to-get"]

I tried this:

re.findall(".*([[a-z]*-[a-z]*]*).*", "http://www.example.com/some-text-to-get/jkl/another-text-to-get")

But it doesn't work. Any ideas?

You can use a lookbehind and a lookahead:

import re
s = 'http://www.example.com/some-text-to-get/jkl/another-text-to-get'
# Raw string so the backslashes reach the regex engine unchanged.
final_result = re.findall(r'(?<=\.\w{3}/)[a-z\-]+|[a-z\-]+(?=$)', s)
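
A quick check of what the lookaround pattern returns may help. The sketch below assumes the host ends in a dot plus exactly three word characters (as in .com), which is what the lookbehind requires. The second URL is a made-up example showing where that assumption breaks.

```python
import re

# Lookbehind: the segment right after ".xxx/" (three word characters and a slash).
# Lookahead: the run of lowercase letters and hyphens that ends the string.
pattern = re.compile(r'(?<=\.\w{3}/)[a-z\-]+|[a-z\-]+(?=$)')

s = 'http://www.example.com/some-text-to-get/jkl/another-text-to-get'
final_result = pattern.findall(s)
print(final_result)

# Hypothetical URL with a two-letter TLD: the lookbehind no longer applies,
# so only the last segment is found.
other = 'http://www.example.io/some-text-to-get/jkl/another-text-to-get'
other_result = pattern.findall(other)
print(other_result)
```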

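You can capture the two parts in two capturing groups. The pattern is built from the following pieces, in order:

```
^
```
Asserts the start of the string.
```
https?://
```
Matches http with an optional s, followed by ://.
```
[^/]+/
```
Uses a negated character class to match one or more characters that are not a forward slash, followed by a forward slash.
```
([^/]+)
```
Captures one or more non-slash characters in group 1.
```
.*
```
Matches any character zero or more times.
```
/
```
Matches a literal forward slash. This is the last slash in the string, because the preceding .* is greedy.
```
(.*)$
```
Captures zero or more of any character in group 2 and asserts the end of the line.

Your matches are in the first and second capturing groups.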
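
Concatenating the pieces listed above gives the pattern used in the sketch below. The assembled string is a reconstruction from those pieces, and the variable names are illustrative.

```python
import re

# The pieces described above, concatenated in order.
pattern = re.compile(r'^https?://[^/]+/([^/]+).*/(.*)$')

s = 'http://www.example.com/some-text-to-get/jkl/another-text-to-get'
m = pattern.match(s)
if m:
    first, last = m.group(1), m.group(2)
    print([first, last])
```

A URL with fewer than two path segments does not match at all, which is why the result is checked before the groups are read.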

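Alternatively, you can parse the URL, take the path, split it on / and pick the parts by index:

from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

o = urlparse('http://www.example.com/some-text-to-get/jkl/another-text-to-get')
# A list comprehension (rather than filter) keeps the result indexable in Python 3.
parts = [p for p in o.path.split('/') if p]
print(parts[0])
print(parts[2])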

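Or, if you want the parts that contain a hyphen (-), you can use:

# In Python 3, filter() returns an iterator, so build a list to print it.
parts = [p for p in o.path.split('/') if '-' in p]
print(parts)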
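
The parsing approach can be wrapped in a small helper. This is a sketch for Python 3. The function name hyphenated_segments is made up for illustration.

```python
from urllib.parse import urlparse


def hyphenated_segments(url):
    """Return the path segments of url that contain a hyphen."""
    path = urlparse(url).path
    # Splitting on "/" yields an empty string before the leading slash;
    # the hyphen test drops it along with segments such as "jkl".
    return [segment for segment in path.split('/') if '-' in segment]


result = hyphenated_segments(
    'http://www.example.com/some-text-to-get/jkl/another-text-to-get')
print(result)
```

Because only the path is inspected, a hyphen in the host name does not end up in the result.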

Given:

>>> s
"http://www.example.com/some-text-to-get/jkl/another-text-to-get"

You can use this regex:

>>> re.findall(r"/([a-z-]+)(?:/|$)", s)
['some-text-to-get', 'another-text-to-get']
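
One behaviour of this pattern is worth checking before relying on it: the character class also accepts segments that contain no hyphen. In the URL above, jkl is skipped only because the slash in front of it was already consumed by the previous match. The second URL in this sketch is a made-up example where the segments come in a different order.

```python
import re

pattern = re.compile(r"/([a-z-]+)(?:/|$)")

s = "http://www.example.com/some-text-to-get/jkl/another-text-to-get"
print(pattern.findall(s))

# Hypothetical URL: "jkl" comes first, so it is matched together with its
# trailing slash, and "some-text" is left without a leading slash to match.
reordered = "http://www.example.com/jkl/some-text"
print(pattern.findall(reordered))
```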

Of course, you can also do this with Python string methods and a list comprehension:

>>> [e for e in s.split('/') if '-' in e]
['some-text-to-get', 'another-text-to-get']

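You can capture it with the following regex:

((?:[a-z]+-)+[a-z]+)

```
[a-z]+
```
Matches one or more lowercase letters.
```
(?:[a-z]+-)
```
A non-capturing group: one or more lowercase letters followed by a hyphen.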
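
As a sketch of how this pattern might be applied: it requires at least one hyphen, so plain segments such as jkl are not matched. The example applies it to the path only (via urllib.parse, Python 3), which is an added assumption to keep a hyphenated host name from matching.

```python
import re
from urllib.parse import urlparse

# One or more "letters-" chunks followed by a final run of letters.
pattern = re.compile(r'((?:[a-z]+-)+[a-z]+)')

s = 'http://www.example.com/some-text-to-get/jkl/another-text-to-get'
path = urlparse(s).path
matches = pattern.findall(path)
print(matches)
```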

I only want lowercase words. Is this OK? Can't I use [a-z]?

Python 3: from urllib.parse import urlparse
