Web scraping: how to scrape when the parent elements are similar but not identical


How would you scrape the titles and links from this website when the parent elements have different class names?

For example, as the screenshot shows, the first title and link sit inside div class="slot type-post type-order-1". The second title and link sit inside div class="slot type-post type-order-2", and so on.

The website is

Without a better solution I would end up with very long, repetitive code that does not seem to make sense, like this:

content1 = soup.find_all('div', class_='slot type-post type-order-1')
content2 = soup.find_all('div', class_='slot type-post type-order-2')

for contents in content1:
    title1 = contents.find('h3', class_='post-title entry-title card-title').text
    link1 = contents.h3.a['href']
    print(title1)
    print(link1)

for content in content2:
    title2 = content.find('h3', class_='post-title entry-title card-title').text
    link2 = content.h3.a['href']
    print(title2)
    print(link2)

You can use the select method with a CSS selector:

soup.select('div[class*="slot type-post type-order-"]')
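*= means "contains": the selector matches every div whose class attribute contains the given substring, so type-order-1, type-order-2, and so on are all matched by one expression.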
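As a minimal sketch of how that selector can replace the repeated loops from the question, the example below runs against a small inline HTML snippet instead of the live site. The markup, the example.com URLs, and the extract_posts helper are assumptions made up for illustration; only the class names and the selector come from the question and answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described in the question.
# The third slot is an unrelated type and should not be matched.
html = """
<div class="cards-content">
  <div class="slot type-post type-order-1">
    <h3 class="post-title entry-title card-title">
      <a href="https://example.com/first">First post</a>
    </h3>
  </div>
  <div class="slot type-post type-order-2">
    <h3 class="post-title entry-title card-title">
      <a href="https://example.com/second">Second post</a>
    </h3>
  </div>
  <div class="slot type-ad type-order-3">
    <h3 class="post-title"><a href="https://example.com/ad">Sponsored</a></h3>
  </div>
</div>
"""


def extract_posts(soup):
    """Return (title, link) pairs for every post slot, whatever its order number."""
    posts = []
    # One substring selector covers type-order-1, type-order-2, ...
    for slot in soup.select('div[class*="slot type-post type-order-"]'):
        link = slot.select_one("h3 a")
        if link is not None and link.has_attr("href"):
            posts.append((link.get_text(strip=True), link["href"]))
    return posts


soup = BeautifulSoup(html, "html.parser")
posts = extract_posts(soup)
for title, href in posts:
    print(title)
    print(href)
```

Because *= compares against the class attribute as written, this relies on the classes appearing in that order in the page source; if the order can vary, a selector such as div.slot.type-post may be the safer choice.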

Reference:

Code:

Output:

GAPs can help keep you warm through this winter freeze (45 Photos)
https://thechive.com/2021/02/15/gaps-can-help-keep-you-warm-through-this-winter-freeze/
Texans REALLY do not know how to handle a little snow (20 Photos)
https://thechive.com/2021/02/15/texans-really-do-not-know-how-to-handle-a-little-snow-20-photos/
...

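Thank you very much!! It worked. I'm curious: when scraping, do you always need to use the direct parent to extract the title and link? For example, could we use div class='cards-content', the larger parent that contains 'div[class*="slot type-post type-order-"]'?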
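One way to explore this question, offered only as a sketch: a CSS descendant selector (a space between two parts) matches at any depth, so starting from a larger ancestor is possible in principle. The inline markup, the sidebar element, and the example.com URLs below are assumptions for illustration; only the cards-content and post-title class names come from this thread, and the real page may nest things differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two post slots inside the larger parent,
# plus a sidebar heading outside it that should be ignored.
page = """
<div class="cards-content">
  <div class="slot type-post type-order-1">
    <h3 class="post-title entry-title card-title">
      <a href="https://example.com/first">First post</a>
    </h3>
  </div>
  <div class="slot type-post type-order-2">
    <h3 class="post-title entry-title card-title">
      <a href="https://example.com/second">Second post</a>
    </h3>
  </div>
</div>
<div class="sidebar">
  <h3 class="post-title"><a href="https://example.com/other">Other</a></h3>
</div>
"""

soup = BeautifulSoup(page, "html.parser")

# "div.cards-content h3.post-title a" reads: any <a> inside an
# <h3 class="post-title ..."> anywhere below <div class="cards-content">.
anchors = soup.select("div.cards-content h3.post-title a")
results = [(a.get_text(strip=True), a["href"]) for a in anchors]

for title, href in results:
    print(title)
    print(href)
```

The trade-off is scope: the broader the ancestor, the more likely the selector also picks up headings you did not intend, such as ads or promoted cards that share the same h3 class.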