Python: extracting a variable list of tags with BeautifulSoup
I have the following ResultSet:
<p>Cake</p><a>Cream</a><p>Coffee</p>
How can I use my list of tags to extract the values from the ResultSet? Ideally, I would like to end up with a dictionary:
dic[0]='Cake'
dic[1]='Cream'
dic[2]='Coffee'
Basically, I want to walk through my ResultSet in order, looking for the next tag in the list. I could use find_all, but that would mean doing the mapping by hand.
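from bs4 import BeautifulSoup

data = '''<p>Cake</p><a>Cream</a><p>Coffee</p>'''
tag_names = ['p', 'a', 'p']

soup = BeautifulSoup(data, 'html.parser')

# Each call to the filter consumes the next name, so the n-th tag in the
# document is compared with the n-th entry of tag_names.
names = iter(tag_names)
out = {i: tag.text for i, tag in enumerate(soup.find_all(lambda t: t.name == next(names, None)))}
print(out)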
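The one-liner above pairs tags and names purely by position. A minimal sketch of the same idea written with zip, which may be easier to read; the variable names `names`, `pairs` and `matched` are illustrative and not part of the original answer:

```python
from bs4 import BeautifulSoup

data = '<p>Cake</p><a>Cream</a><p>Coffee</p>'
names = ['p', 'a', 'p']

soup = BeautifulSoup(data, 'html.parser')

# find_all(True) returns every tag in document order; zip pairs the
# n-th tag with the n-th name and stops at the shorter of the two.
pairs = zip(soup.find_all(True), names)

# Keep only the tags whose name equals the name at the same position.
matched = (tag for tag, name in pairs if tag.name == name)

out = {i: tag.text for i, tag in enumerate(matched)}
print(out)
```

Because the pairing is positional, any extra tag in the document shifts the alignment, so this form is only suitable when the document contains nothing but the listed tags.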
Edit: a version for the case where the result set contains tags that are not in the search list:
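from bs4 import BeautifulSoup

data = '''<span>Don't search this</span>
<p>Cake</p>
<span>Don't search this</span>
<a>Cream</a>
<p>Coffee</p>'''
lst = ['p', 'a', 'p']

soup = BeautifulSoup(data, 'html.parser')

def search(lst):
    # Coroutine-style generator: receives tags via send() and yields True
    # when the tag matches the next expected name, False otherwise.
    lst = lst[:]  # work on a copy so the caller's list is untouched
    tag = yield   # priming point; the first send() delivers the first tag
    while lst:
        if lst[0] == tag.name:
            lst.pop(0)
            tag = yield True
            continue
        tag = yield False
    # All names have been matched; reject any remaining tags.
    while True:
        yield False

it = search(lst)
next(it)  # advance to the first yield so send() can be used
out = {i: tag.text for i, tag in enumerate(soup.find_all(lambda t: it.send(t)))}
print(out)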
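For readers who find the send()-driven generator hard to follow, the same sequential matching can be sketched as a plain loop. The helper name `extract_in_order` is hypothetical, and the extra trailing `<p>` in the sample data is added here only to show that tags after the last match are ignored:

```python
from bs4 import BeautifulSoup

def extract_in_order(html, tag_names):
    """Hypothetical helper: collect the text of tags matching tag_names in sequence."""
    soup = BeautifulSoup(html, 'html.parser')
    out = {}
    # find_all(True) returns every tag in document order.
    for tag in soup.find_all(True):
        if len(out) == len(tag_names):
            break  # every requested name has been matched
        # len(out) is the index of the next name still being looked for.
        if tag.name == tag_names[len(out)]:
            out[len(out)] = tag.text
    return out

data = '''<span>Don't search this</span>
<p>Cake</p>
<span>Don't search this</span>
<a>Cream</a>
<p>Coffee</p>
<p>Not requested</p>'''

print(extract_in_order(data, ['p', 'a', 'p']))
```

The loop keeps no state beyond the result dictionary, which makes it straightforward to step through in a debugger.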
Edit 2: using a CSS selector:
from bs4 import BeautifulSoup

data = '''<span>Don't search this</span>
<p>Cake</p>
<span>Don't search this</span>
<a>Cream</a>
<p>Coffee</p>'''
lst = ['p', 'a', 'p']

soup = BeautifulSoup(data, 'html.parser')

# 'p,a,p' selects every <p> and <a> in document order; unlike the generator
# version, it does not enforce the sequence given in lst.
print({i: tag.text for i, tag in enumerate(soup.select(','.join(lst)))})
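One caveat with the selector approach: a comma-separated selector matches the union of the listed tag names, not a sequence. A small sketch, using a list order chosen here purely for illustration, shows the difference:

```python
from bs4 import BeautifulSoup

html = '<p>Cake</p><a>Cream</a><p>Coffee</p>'
soup = BeautifulSoup(html, 'html.parser')

# The list asks for an <a> first, then a <p> ...
lst = ['a', 'p']
selected = [tag.text for tag in soup.select(','.join(lst))]

# ... but 'a,p' matches every <a> and <p> in document order,
# so the leading <p>Cake</p> is included as well.
print(selected)  # ['Cake', 'Cream', 'Coffee']
```

The selector version is therefore a good fit when only the set of tag names matters, while the generator version is the one that follows the list order.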
Output (the same for both the generator version and the CSS selector version):
{0: 'Cake', 1: 'Cream', 2: 'Coffee'}