Finding a string with BeautifulSoup in Python


I need to extract "/html/path" from a string like the following:

generic/html/path/generic/generic/generic

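More precisely, I only need "path", which always comes right after "html/". Is there a way to search for "html/" and grab the text that follows it, up to the next "/"?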

This is just basic string manipulation:

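s = "generic/html/path/generic/generic/generic"
# Start of the segment: position of "html/" plus its length
i1 = s.index("html/") + len("html/")
# End of the segment: the next "/" after that position
i2 = s.index("/", i1)
print(s[i1:i2])  # path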
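
Note that str.index raises ValueError when "html/" or the closing "/" is missing. A minimal sketch of a more forgiving variant, assuming it is acceptable to return None for strings without the marker (segment_after is a hypothetical helper name, not something from the original answer):

```python
def segment_after(s, marker="html/"):
    """Return the path segment that follows `marker`, or None if absent.

    `segment_after` is a hypothetical helper name, not part of any library.
    """
    # partition splits on the first occurrence: (before, marker, after)
    _, found, rest = s.partition(marker)
    if not found:
        return None
    # Keep everything up to the next "/", or the whole remainder if there is none
    return rest.partition("/")[0]


print(segment_after("generic/html/path/generic/generic/generic"))
print(segment_after("generic/other/path"))
```

str.partition never raises on a missing separator, so the "not found" case can be handled with a plain if instead of a try/except.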

You can use a regular expression:

>>> import re
>>> regex = re.compile(r".+html/(.+?)/")
>>> r = regex.search("generic/html/path/generic/generic/generic")
>>> r.groups()
('path',)

See the Python documentation for the re module.
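
The pattern in this answer needs at least one character before "html/" (because of the leading ".+") and a "/" after the segment. A minimal sketch of a variant without those requirements, assuming the wanted segment never contains "/" itself (SEGMENT_RE is a made-up name for this example):

```python
import re

# Hypothetical pattern name; [^/]+ matches one path segment (no slashes)
SEGMENT_RE = re.compile(r"html/([^/]+)")

for s in ("generic/html/path/generic/generic/generic", "html/path", "no/marker/here"):
    m = SEGMENT_RE.search(s)
    # search() returns None when the pattern is not found
    print(s, "->", m.group(1) if m else None)
```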

Another one to add to the mix:

In [1]: s = 'generic/html/path/generic/generic/generic'

In [2]: s.split('html/')[1].split('/')[0]
Out[2]: 'path'

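That should be ".+html/(.+?)/": "grab the string that follows it, up to the next '/'".

Oh, he only wants "path", not the rest of the path. Thanks, I've updated the post.

I don't know BeautifulSoup very well, but its documentation has a section on regex. The example there only looks up tags, but if you can search in attributes (href, I think), it should work the same way.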
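
The suggestion about searching attributes with a regex can be sketched as follows. This assumes the strings live in the href attributes of <a> tags and that the bs4 package is installed; the HTML snippet and the variable names are made up for illustration:

```python
import re

from bs4 import BeautifulSoup

# Made-up markup for illustration only
html_doc = """
<a href="generic/html/path/generic/generic/generic">first</a>
<a href="generic/other/thing">second</a>
"""

pattern = re.compile(r"html/([^/]+)")
soup = BeautifulSoup(html_doc, "html.parser")

# Passing a compiled regex as an attribute filter keeps only the tags
# whose href matches the pattern
segments = [
    pattern.search(a["href"]).group(1)
    for a in soup.find_all("a", href=pattern)
]
print(segments)
```

Reusing the same compiled pattern for both the filter and the extraction keeps the two steps consistent: every tag that passes the filter is guaranteed to produce a match.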