Python: using BeautifulSoup or Selenium to select a div with a specific class name inside other divs, without selecting divs that carry additional classes


python html css web-scraping


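This is a tricky one for me: I'm stuck on the web-scraping part and can't get any further.

I only need the grid cell values, collected in a loop.

I tried using

grid_cell = driver.find_element_by_css_selector("#tags-browser > div:nth-child(2) > div.mt-auto.grid.jc-space-between.fs-caption.fc-black-300 > div:nth-child(1)")

Displaying the element's text now shows "2061748 questions":

grid_cell.text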

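But this only works for a single element.

What should I do if I want to run this in a loop and get the counts for all the tags available on that page?

In this example, as shown in the picture, I iterate a for loop over "javascript" and "java", but find_element_by_css_selector gives the count for either java or javascript specifically, not for both.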

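If I instead use

tag_counts = body.find_all('div', class_='grid_cell')

then I also get the divs with other classes, namely the ones below the grid cell in the attached picture, which should be excluded.

Please suggest a solution. Any help would be appreciated.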

There are two ways to achieve this:

Option 1: Remove the tags you don't want to scrape, then scrape the ones you do want. For example:

tags = body.find_all('div', class_='grid_cell s-anchor') # TODO: add full class name (to remove this tag) 
for tag in tags:
    tag.extract() # Remove tag from body

tags = body.find_all('div', class_='grid_cell') # This will contain all the tags you want.
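
To make Option 1 concrete, here is a minimal sketch run against a small inline HTML sample. The markup, the function name counts_by_removal, and the second count are assumptions made up for illustration (only "2061748 questions" comes from the question); the real page structure may differ. It relies on the fact that a multi-class string passed to class_ matches the exact value of the class attribute, while a single class name matches any div that carries that class.

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on the question; not the real page.
SAMPLE_HTML = """
<div id="tags-browser">
  <div class="mt-auto grid">
    <div class="grid_cell">2061748 questions</div>
    <div class="grid_cell s-anchor">543 asked today</div>
  </div>
  <div class="mt-auto grid">
    <div class="grid_cell">1735421 questions</div>
    <div class="grid_cell s-anchor">410 asked today</div>
  </div>
</div>
"""


def counts_by_removal(html):
    """Drop the unwanted divs first, then collect the remaining grid cells."""
    soup = BeautifulSoup(html, "html.parser")
    # Exact match on the full class attribute value "grid_cell s-anchor".
    for unwanted in soup.find_all("div", class_="grid_cell s-anchor"):
        unwanted.decompose()  # remove the tag and its contents from the tree
    # After removal, only the plain grid_cell divs are left.
    return [div.get_text(strip=True)
            for div in soup.find_all("div", class_="grid_cell")]


if __name__ == "__main__":
    for text in counts_by_removal(SAMPLE_HTML):
        print(text)
```

Note that this approach modifies the parsed tree, so it is best applied to a soup object you do not need to reuse for other lookups.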

Option 2: Loop over the parent HTML tags and use find() to get the first matching tag in each one. For example:

containers = body.find_all('div', class_='mt-auto grid') # Find parent tag 
for container in containers:
    tag = container.find('div', class_='grid_cell') # Get first tag in the container div
    print(tag.text.strip())
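
Option 2 can be sketched the same way against inline HTML. Again, the sample markup, the function name first_cell_per_container, and the second count are illustrative assumptions rather than the actual page. The sketch also guards against containers that have no matching cell, since find() returns None in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on the question; not the real page.
SAMPLE_HTML = """
<div id="tags-browser">
  <div class="mt-auto grid">
    <div class="grid_cell">2061748 questions</div>
    <div class="grid_cell s-anchor">543 asked today</div>
  </div>
  <div class="mt-auto grid">
    <div class="grid_cell">1735421 questions</div>
    <div class="grid_cell s-anchor">410 asked today</div>
  </div>
</div>
"""


def first_cell_per_container(html):
    """Return the text of the first grid_cell div inside each parent container."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Exact match on the parent's full class attribute value "mt-auto grid".
    for container in soup.find_all("div", class_="mt-auto grid"):
        # find() stops at the first matching descendant, in document order.
        cell = container.find("div", class_="grid_cell")
        if cell is not None:  # skip containers without a grid cell
            results.append(cell.get_text(strip=True))
    return results


if __name__ == "__main__":
    for text in first_cell_per_container(SAMPLE_HTML):
        print(text)
```

This variant leaves the tree untouched, but it depends on the wanted div always coming first inside its container.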

Post the website URL and what you want to extract.

Thanks for the prompt reply, it looks good. I'll try it and let you know.

I tried using this:

# Now get the count of all questions for each tag
containers = body.find_all('div', class_='s-card js-tag-cell grid fd-column')  # Find parent tag
for container in containers:
    tag = container.find('div', class_='grid jc-space-between ai-center mb12')
    tag_count = container.find('div', class_='mt-auto grid jc-space-between fs-caption fc-black-300')  # Get first tag in the container div
    print(tag.text.strip())
    print(tag_count.text.strip())

but it gave me wrong data:
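
As a possible alternative to matching long class strings exactly (which is sensitive to class order and to any extra class on the element), CSS selectors via select() and select_one() match on individual classes. The sketch below pairs each tag name with its question count. The sample markup, the post-tag class, the function name tag_counts, and the second count are assumptions for illustration; whether this resolves the wrong data depends on the real page, which is not shown here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; class names are reconstructed guesses, not the real page.
SAMPLE_HTML = """
<div class="s-card js-tag-cell grid fd-column">
  <div class="grid jc-space-between ai-center mb12">
    <a class="post-tag">javascript</a>
  </div>
  <div class="mt-auto grid jc-space-between fs-caption fc-black-300">
    <div class="grid_cell">2061748 questions</div>
    <div class="grid_cell s-anchor">543 asked today</div>
  </div>
</div>
<div class="s-card js-tag-cell grid fd-column">
  <div class="grid jc-space-between ai-center mb12">
    <a class="post-tag">java</a>
  </div>
  <div class="mt-auto grid jc-space-between fs-caption fc-black-300">
    <div class="grid_cell">1735421 questions</div>
    <div class="grid_cell s-anchor">410 asked today</div>
  </div>
</div>
"""


def tag_counts(html):
    """Return (tag name, question count text) pairs, one per tag card."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for card in soup.select("div.js-tag-cell"):
        name = card.select_one("div.mb12")
        # Direct child grid_cell of the count row, excluding s-anchor cells.
        count = card.select_one("div.mt-auto > div.grid_cell:not(.s-anchor)")
        if name is not None and count is not None:
            pairs.append((name.get_text(strip=True), count.get_text(strip=True)))
    return pairs


if __name__ == "__main__":
    for name, count in tag_counts(SAMPLE_HTML):
        print(name, count)
```

With Selenium, the comparable move would be to fetch all matching elements with a plural find-elements call instead of a single-element lookup, then read .text from each one inside the loop.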