Python: Using BeautifulSoup to find multiple tags with multiple search parameters_Python_Python 3.x_Web Scraping_Beautifulsoup - Fatal编程技术网

I have some HTML that looks like this in the soup:

<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display">
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display">
<label> Some text here </label>
<span class="type999 type999-display">
<span class="type1 type1-display">
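But now I also need to find all the spans that contain certain words, while keeping the order in which they appear on the web page.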

I tried this, but it didn't work:

soup.find_all(['label', 'span'], [{'for': re.compile('.*')}, {'class': 'type1'}], recursive=False)  # here I just used {'class': 'type1'} because I don't know how to pass a list to soup to search for a match
Thanks in advance.
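As a minimal sketch of how to pass a list (not part of the original question): in BeautifulSoup, a list given as an attribute value such as `class_` matches tags that carry any of the listed values, and `True` matches any tag that has the attribute at all. The list `wanted` is a hypothetical stand-in for the words to look for, and the spans are closed here, as in the answer below, so that `html.parser` does not nest them inside each other.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, with the <span> tags closed
data = '''<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display"></span>
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display"></span>
<label> Some text here </label>
<span class="type999 type999-display"></span>
<span class="type1 type1-display"></span>'''

soup = BeautifulSoup(data, 'html.parser')

# attrs={'for': True} matches any <label> that has a "for" attribute, whatever its value
# ("for" is a Python keyword, so it has to go through the attrs dict)
labels = soup.find_all('label', attrs={'for': True})

# A list passed as class_ matches spans carrying ANY of the listed classes
wanted = ['type1', 'type2', 'type3']  # hypothetical list of words to search for
spans = soup.find_all('span', class_=wanted)

for tag in labels + spans:
    print(tag)
```

Concatenating `labels + spans` prints all the labels first and then all the spans, which is exactly why combining two `find_all()` results with `+` loses the page order.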

Edit: I also tried combining two find_all searches with +, but then I lose the order.
Edit 2: spelling
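One way to keep the page order in a single search, sketched under the same assumptions: `find_all()` also accepts a function, visits tags in document order, and keeps every tag for which the function returns `True`. The helper `is_wanted` and the set `wanted` are hypothetical names chosen for illustration.

```python
from bs4 import BeautifulSoup

data = '''<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display"></span>
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display"></span>
<label> Some text here </label>
<span class="type999 type999-display"></span>
<span class="type1 type1-display"></span>'''

wanted = {'type1', 'type2', 'type3'}  # hypothetical set of wanted classes


def is_wanted(tag):
    """Return True for <label> tags with a "for" attribute and <span> tags with a wanted class."""
    if tag.name == 'label':
        return tag.has_attr('for')
    if tag.name == 'span':
        # class is multi-valued in bs4, so tag.get('class') is a list (or None if absent)
        return any(cls in wanted for cls in tag.get('class', []))
    return False


soup = BeautifulSoup(data, 'html.parser')

# One pass over the tree, so labels and spans stay interleaved as on the page
matches = soup.find_all(is_wanted)
for tag in matches:
    print(tag)
```

Because the filtering happens inside one traversal, no re-sorting is needed afterwards.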

You can also do this without a regular expression:

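from bs4 import BeautifulSoup

data = '''<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display"></span>
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display"></span>
<label> Some text here </label>
<span class="type999 type999-display"></span>
<span class="type1 type1-display"></span>'''

myList = ['type1', 'type2', 'type3']
soup = BeautifulSoup(data, 'html.parser')

# find_all() with no arguments returns every tag in document order,
# so filtering inside the loop keeps labels and spans in page order
for item in soup.find_all():
    # <label> tags that have a "for" attribute
    if item.name == 'label' and 'for' in item.attrs:
        print(item)
    # <span> tags whose first class is in myList; "or [None]" avoids an error
    # when a span has no class attribute (or an empty one)
    if item.name == 'span' and (item.get('class') or [None])[0] in myList:
        print(item)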
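Comments:
- When you say the spans contain words from the list, e.g. type1, do you mean the class attribute contains type1, type2, etc.? And is it enough to say the class contains "type", or does the number have to be specified as well? Should the spans have closing tags?
- Thank you, this is perfect. I didn't know I could do it this way, as in your solution.
- Still, please also post your first solution; it might be useful to others who come across your answer!

Output of the answer's code above: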
<label class="highlited" for="02">"Some text here"</label>
<span class="type3 type3-display"></span>
<label class="highlited" for="01">"Some text here"</label>
<span class="type1 type1-display"></span>
<span class="type1 type1-display"></span>
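Another option, assuming a BeautifulSoup version whose `select()` is backed by the soupsieve package (4.7 and later): a comma-separated CSS selector group. `select()` returns matching tags in document order rather than grouped by which selector matched, so labels and spans stay interleaved. `wanted` is again a hypothetical list.

```python
from bs4 import BeautifulSoup

data = '''<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display"></span>
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display"></span>
<label> Some text here </label>
<span class="type999 type999-display"></span>
<span class="type1 type1-display"></span>'''

soup = BeautifulSoup(data, 'html.parser')

wanted = ['type1', 'type2', 'type3']  # hypothetical list of wanted classes

# One selector group: labels with a "for" attribute, plus spans with any wanted class
selector = ', '.join(['label[for]'] + [f'span.{cls}' for cls in wanted])

# Matches come back in document order, each tag at most once
matches = soup.select(selector)
for tag in matches:
    print(tag)
```

Here `span.type1` matches the class token `type1` only, so `type1-display` and `type999` are not picked up by accident.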