Python: extracting a variable list of tags with BeautifulSoup


I have the following result set:

<p>Cake</p><a>Cream</a><p>Coffee</p>

How can I use my list of tag names to extract the values from the result set? Ideally, I would like to end up with a dictionary:

dic[0]='Cake'
dic[1]='Cream'
dic[2]='Coffee'

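Basically, I want to walk through my result set sequentially, looking for the next tag in my list each time. I could use find_all, but that means I would have to do the mapping by hand.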

data = '''<p>Cake</p><a>Cream</a><p>Coffee</p>'''

dic = ['p', 'a', 'p']

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

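# Iterator over the expected tag names (named so it does not clash with the loop index below)
names = iter(dic)

# The predicate consumes one expected name per visited tag; None is returned once the list runs out
out = {i: tag.text for i, tag in enumerate(soup.find_all(lambda t: t.name == next(names, None)))}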

print(out)
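The predicate above advances through the list once for every tag it is shown, so it only works when the document contains exactly the listed tags and nothing else. The following is a minimal sketch of that behaviour, assuming the predicate is called once per tag in document order; `simulate` is a hypothetical helper that works on plain tag-name strings rather than real bs4 tags.

```python
def simulate(tag_names, wanted):
    """Mimic a predicate that calls next() once per visited tag."""
    names = iter(wanted)
    # next(names, None) advances on every tag, whether or not it matched
    return [t for t in tag_names if t == next(names, None)]

# Document contains exactly the wanted tags: everything lines up
clean = simulate(["p", "a", "p"], ["p", "a", "p"])

# Extra <span> tags shift the alignment, so nothing matches any more
noisy = simulate(["span", "p", "span", "a", "p"], ["p", "a", "p"])

print(clean)  # ['p', 'a', 'p']
print(noisy)  # []
```

A matcher that should tolerate unlisted tags therefore has to advance through the list only when a tag actually matches.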

Edit: a version for the case where the result set contains tags that are not in the search list:

data = '''<span>Don't search this</span>
            <p>Cake</p>
          <span>Don't search this</span>
            <a>Cream</a>
            <p>Coffee</p>'''

lst = ['p', 'a', 'p']

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

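def search(lst):
    lst = lst[:]  # work on a copy so the caller's list is left untouched
    tag = yield   # primed by next(it); the first tag arrives via send()
    while lst:
        if lst[0] == tag.name:
            lst.pop(0)        # this name is found, move on to the next one
            tag = yield True
            continue
        tag = yield False
    # Every name has been matched: reject remaining tags instead of letting the generator end
    while True:
        tag = yield False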

it = search(lst)
next(it)
out = {i: tag.text for i, tag in enumerate(soup.find_all(lambda t: it.send(t)))}

print(out)
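The same sequential matching can also be written as a plain loop, which some readers may find easier to follow than a generator driven by send(). This is only a sketch: `extract_in_order` and `FakeTag` are hypothetical names, and `FakeTag` is a stand-in used here because the helper reads nothing but `.name` and `.text`. With bs4 the call would presumably be `extract_in_order(soup.find_all(True), lst)`, since `find_all(True)` returns every tag in document order.

```python
from collections import namedtuple

# Hypothetical stand-in for a bs4 Tag: the helper only reads .name and .text
FakeTag = namedtuple("FakeTag", ["name", "text"])


def extract_in_order(tags, names):
    """Return {index: text} for tags whose names match `names` in sequence."""
    out = {}
    pos = 0  # index of the next name we are waiting for
    for tag in tags:
        if pos == len(names):
            break  # every requested name has been found
        if tag.name == names[pos]:
            out[pos] = tag.text
            pos += 1  # advance only on a match, so unlisted tags are skipped
    return out


tags = [
    FakeTag("span", "Don't search this"),
    FakeTag("p", "Cake"),
    FakeTag("span", "Don't search this"),
    FakeTag("a", "Cream"),
    FakeTag("p", "Coffee"),
    FakeTag("p", "Extra"),  # appears after the list is exhausted, so it is ignored
]

result = extract_in_order(tags, ["p", "a", "p"])
print(result)  # {0: 'Cake', 1: 'Cream', 2: 'Coffee'}
```

Because the position only moves forward on a match, tags that are not in the list, and any tags left over once the list is exhausted, are simply skipped.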

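Edit 2: using a CSS selector. Note that select() returns every element matching any name in the list, in document order, so this only gives the sequential result when the document contains no other <p> or <a> tags: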

data = '''<span>Don't search this</span>
            <p>Cake</p>
          <span>Don't search this</span>
            <a>Cream</a>
            <p>Coffee</p>'''

lst = ['p', 'a', 'p']

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

print({i: tag.text for i, tag in enumerate(soup.select(','.join(lst)))})

{0: 'Cake', 1: 'Cream', 2: 'Coffee'}