Python: How to make Beautiful Soup (bs4) match only one CSS class

python web-scraping beautifulsoup


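I am using find_all('div', class_="ad_item") to match all divs that have the CSS class "ad_item".

The problem is that the same page also contains divs whose CSS class is set to both "ad_item" and "ad_ex_item".

When you search for a tag that matches a certain CSS class, you are matching against any of its CSS classes.

So how can I match a div that has only "ad_item" and does not have "ad_ex_item"?

In other words, how do I search for divs whose only CSS class is "ad_item"?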

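Have you tried using select? Unfortunately, the CSS3 :not selector does not seem to be supported. If you really need it, you may have to look at another library that does seem to support it.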
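
For what it is worth, Beautiful Soup 4.7.0 and later delegate select() to the soupsieve package, which does understand :not(). The following is a minimal sketch that assumes such a version is installed; the sample HTML is invented for illustration. Note that it only excludes "ad_ex_item", so a div carrying some other extra class would still match.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<div class="ad_item">plain ad</div>
<div class="ad_item ad_ex_item">extended ad</div>
<div class="other">unrelated</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Select divs that have the class ad_item but not the class ad_ex_item.
# Assumes select() is backed by soupsieve (Beautiful Soup 4.7.0 or later).
matches = soup.select("div.ad_item:not(.ad_ex_item)")
texts = [tag.get_text() for tag in matches]
print(texts)
```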

选择：
不幸的是，CSS3的：not
选择器似乎不受支持。如果你真的需要这个，你可能需要看看。它似乎支持它。请参见
You can always write a function that matches the tag you want and pass that function to find_all().
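
A minimal sketch of that idea, assuming "only ad_item" means the class attribute holds exactly that one class. The function name has_only_ad_item and the sample HTML are hypothetical, made up for this example:

```python
from bs4 import BeautifulSoup

def has_only_ad_item(tag):
    """Return True for a div whose class list is exactly ["ad_item"]."""
    # tag.get("class") is a list of class names, or None if the
    # attribute is missing, so tags without a class never match.
    return tag.name == "div" and tag.get("class") == ["ad_item"]

# Hypothetical sample markup, invented for this illustration.
html = """
<div class="ad_item">plain ad</div>
<div class="ad_item ad_ex_item">extended ad</div>
<div>no class at all</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all calls the function on every tag and keeps those returning True.
only_ad_item = soup.find_all(has_only_ad_item)
print([tag.get_text() for tag in only_ad_item])
```

A named function like this is easier to reuse and test than an inline lambda, at the cost of a few more lines.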
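I found a solution. It is not specific to BS4; it is plain Python:
for item in soup.find_all('div', class_="ad_item"):
    # Skip any div that carries classes other than "ad_item".
    if len(item["class"]) != 1:
        continue
    print(item)  # placeholder: process the matching div here

It basically skips the item if it has more than one CSS class.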
You can pass a lambda function to the find and find_all methods:
soup.find_all(lambda x:
    x.name == 'div' and
    'ad_item' in x.get('class', []) and
    'ad_ex_item' not in x['class']
)

x.get('class', []) avoids a KeyError exception for div tags that have no class attribute.
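If you need to exclude more than one class, replace the last condition with:
    not any(c in x['class'] for c in {'ad_ex_item', 'another_class'})

If you only want to exclude tags that carry all of the listed classes at once, use:
    not all(c in x['class'] for c in {'ad_ex_item', 'another_class'})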
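
The difference between the two conditions is easy to get wrong, so here is a small sketch in plain Python (no Beautiful Soup needed) that applies both to a few class lists. The helper names and sample lists are hypothetical, chosen only for illustration:

```python
EXCLUDED = {"ad_ex_item", "another_class"}

def has_any_excluded(classes):
    """True if at least one excluded class is present."""
    return any(c in classes for c in EXCLUDED)

def has_all_excluded(classes):
    """True only if every excluded class is present together."""
    return all(c in classes for c in EXCLUDED)

# Class lists as they would appear in tag["class"].
samples = [
    ["ad_item"],
    ["ad_item", "ad_ex_item"],
    ["ad_item", "ad_ex_item", "another_class"],
]

# "not any" drops a tag as soon as it has one excluded class.
kept_by_not_any = [s for s in samples if not has_any_excluded(s)]
# "not all" drops a tag only when it has every excluded class.
kept_by_not_all = [s for s in samples if not has_all_excluded(s)]

print(kept_by_not_any)
print(kept_by_not_all)
```

With not any, only the first list survives; with not all, the first two survive because neither carries both excluded classes.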

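You can use a strict condition like this:
soup.select("div[class='ad_item']")

This catches divs whose class attribute is exactly that value: in this case only 'ad_item', with no other space-separated classes attached.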
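
To make the contrast with the ordinary class selector concrete, here is a small sketch run against invented sample HTML (the markup is an assumption for illustration only):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<div class="ad_item">plain ad</div>
<div class="ad_item ad_ex_item">extended ad</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The class selector matches any div that has ad_item among its classes.
loose = soup.select("div.ad_item")

# The attribute selector compares the whole class attribute value,
# so a div with additional classes does not match.
strict = soup.select("div[class='ad_item']")

print(len(loose), [tag.get_text() for tag in strict])
```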
The top answer is correct, but if you want to keep the for loop clean or prefer a one-line solution, use the list comprehension below:
data = [item for item in soup.find_all("div", class_="ad_item") if len(item["class"]) == 1]

I am using Beautiful Soup 4; BS4 has no fallback.