Python: Using BeautifulSoup to find multiple tags with multiple search parameters

python python-3.x web-scraping

I have some HTML, and the soup looks like this:

<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display">
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display">
<label> Somete text here </label>
<span class="type999 type999-display">
<span class="type1 type1-display">

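But now I also need to find all the spans that contain specific wording, while preserving the order in which the tags appear on the page.

I tried this, but it did not work: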

soup.find_all(['label', 'span'], [{'for': re.compile('.*')}, {'class': 'type1'}], recursive=False)  # here I just used {'class': 'type1'} because I don't know how to pass a list to soup to search for a match

Thanks in advance.

Edit: I also tried combining two find_all searches with +, but then I lost the order.

Edit 2: spelling
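To make the ordering problem concrete, here is a minimal sketch of the concatenation approach described in the edit. It assumes the spans in the sample are closed with `</span>` and that the wanted class names are `type1`, `type2` and `type3`; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the closing </span> tags are an assumption.
html = '''<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display"></span>
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display"></span>
<label> Somete text here </label>
<span class="type999 type999-display"></span>
<span class="type1 type1-display"></span>'''

soup = BeautifulSoup(html, "html.parser")

# Two independent searches: labels that have a "for" attribute,
# and spans with at least one class from the list.
labels = soup.find_all("label", attrs={"for": True})
spans = soup.find_all("span", class_=["type1", "type2", "type3"])

# Concatenation puts every label before every span,
# so the interleaving from the page is gone.
combined = labels + spans
print([tag.name for tag in combined])
```

Each search is in document order on its own, but the combined list groups the results by search rather than by position on the page.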

You can also do this without regular expressions:

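from bs4 import BeautifulSoup

# Sample markup from the question, with the spans closed
data = '''<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display"></span>
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display"></span>
<label> Somete text here </label>
<span class="type999 type999-display"></span>
<span class="type1 type1-display"></span>'''

myList = ['type1', 'type2', 'type3']
soup = BeautifulSoup(data, 'html.parser')

# find_all() with no arguments yields every tag in document order,
# so a single pass keeps labels and spans interleaved as on the page
for item in soup.find_all():
    # labels that carry a "for" attribute
    if item.name == 'label' and 'for' in item.attrs:
        print(item)
    # spans whose first class is in myList; the fallback avoids errors on spans without a class
    if item.name == 'span' and (item.get('class') or [''])[0] in myList:
        print(item)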

Output:

<label class="highlited" for="02">"Some text here"</label>
<span class="type3 type3-display"></span>
<label class="highlited" for="01">"Some text here"</label>
<span class="type1 type1-display"></span>
<span class="type1 type1-display"></span>

When you say the span contains a word from the list, e.g. type1, do you mean that the class attribute contains type1, type2, and so on? Is it enough for the class to contain "type", or must the number be specified as well?

Should the spans have closing tags?

Thank you, this is perfect. I didn't know I could do it the way your solution does. But please also include your first solution; it may be useful to others who come across your answer!
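The single-pass idea from the answer can also be expressed by passing a function to `find_all`, which BeautifulSoup calls once per tag in document order. The following is only a sketch under a few assumptions: the helper name `wanted` and the set `wanted_classes` are hypothetical, the spans are assumed to be closed, and, unlike the loop in the answer, it accepts a span if any of its classes is in the set rather than only the first one.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the closing </span> tags are an assumption.
html = '''<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display"></span>
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display"></span>
<label> Somete text here </label>
<span class="type999 type999-display"></span>
<span class="type1 type1-display"></span>'''

# Hypothetical set of class names to accept on <span> tags.
wanted_classes = {"type1", "type2", "type3"}


def wanted(tag):
    """Return True for <label for=...> tags and for <span> tags with a wanted class."""
    if tag.name == "label":
        return tag.has_attr("for")
    if tag.name == "span":
        # "class" is multi-valued, so BeautifulSoup exposes it as a list of strings.
        return any(c in wanted_classes for c in tag.get("class", []))
    return False


soup = BeautifulSoup(html, "html.parser")

# One search over the whole tree, so labels and spans stay interleaved.
matches = soup.find_all(wanted)
for tag in matches:
    print(tag)
```

Because there is only one search, there is nothing to merge afterwards, and the result keeps the order in which the tags appear on the page.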