Web scraping: how to scrape when the parent elements are similar but not identical
If the parent elements have different class names, how would you scrape the titles and links from this website? For example, as the screenshot shows, the first title and link are inside div class="slot type-post type-order-1". The second title and link are inside div class="slot type-post type-order-2", and so on. Without a better solution I would end up with very long, repetitive code that does not seem sensible, like this:
content1 = soup.find_all('div', class_='slot type-post type-order-1')
content2 = soup.find_all('div', class_='slot type-post type-order-2')
for contents in content1:
    title1 = contents.find('h3', class_='post-title entry-title card-title').text
    link1 = contents.h3.a['href']
    print(title1)
    print(link1)
for content in content2:
    title2 = content.find('h3', class_='post-title entry-title card-title').text
    link2 = content.h3.a['href']
    print(title2)
    print(link2)
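You can use the select method with a CSS selector:
soup.select('div[class*="slot type-post type-order-"]')
The *= operator means "contains": it matches any div whose class attribute contains the given substring, so a single selector covers type-order-1, type-order-2, and so on.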
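A minimal sketch of how that selector could replace the repeated find_all calls. The SAMPLE_HTML string is a hypothetical stand-in for the downloaded page, built only from the class names and the two title/link pairs quoted in this thread; the real page markup is assumed, not verified.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: the structure is assumed from the
# class names mentioned in the question, not copied from the live site.
SAMPLE_HTML = """
<div class="cards-content">
  <div class="slot type-post type-order-1">
    <h3 class="post-title entry-title card-title">
      <a href="https://thechive.com/2021/02/15/gaps-can-help-keep-you-warm-through-this-winter-freeze/">GAPs can help keep you warm through this winter freeze (45 Photos)</a>
    </h3>
  </div>
  <div class="slot type-post type-order-2">
    <h3 class="post-title entry-title card-title">
      <a href="https://thechive.com/2021/02/15/texans-really-do-not-know-how-to-handle-a-little-snow-20-photos/">Texans REALLY do not know how to handle a little snow (20 Photos)</a>
    </h3>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# One substring match on the class attribute covers every type-order-N card.
results = []
for card in soup.select('div[class*="slot type-post type-order-"]'):
    link = card.h3.a  # first <a> inside the card's <h3>
    results.append((link.get_text(strip=True), link["href"]))

for title, href in results:
    print(title)
    print(href)
```

Collecting the pairs in a list before printing keeps the scraping step separate from the output step, which makes it easier to save the results later.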
Output:
GAPs can help keep you warm through this winter freeze (45 Photos)
https://thechive.com/2021/02/15/gaps-can-help-keep-you-warm-through-this-winter-freeze/
Texans REALLY do not know how to handle a little snow (20 Photos)
https://thechive.com/2021/02/15/texans-really-do-not-know-how-to-handle-a-little-snow-20-photos/
...
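Thank you so much!! It worked. I'm curious: when you scrape, do you always need to use the direct parent to extract the title and link? For example, could we use div class='cards-content', which is the larger parent that contains 'div[class*="slot type-post type-order-"]'?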
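As a sketch of what this comment asks about: a CSS descendant combinator (a space between selectors) matches elements at any depth, so an outer ancestor can serve as the scope instead of the direct parent. The markup below is hypothetical, including the sidebar block and the example.com URLs, which exist only to show that links outside cards-content are skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the nesting is assumed from the class names in the
# thread, and the "sidebar" block is invented purely for illustration.
SAMPLE_HTML = """
<div class="cards-content">
  <div class="slot type-post type-order-1">
    <h3 class="post-title entry-title card-title">
      <a href="https://example.com/first">First post</a>
    </h3>
  </div>
  <div class="slot type-post type-order-2">
    <h3 class="post-title entry-title card-title">
      <a href="https://example.com/second">Second post</a>
    </h3>
  </div>
</div>
<div class="sidebar">
  <h3 class="post-title"><a href="https://example.com/other">Other link</a></h3>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "A B" matches every B nested anywhere inside A, so the selector starts at
# the outer container and skips the intermediate type-order-N divs entirely.
links = soup.select("div.cards-content h3.post-title a")
pairs = [(a.get_text(strip=True), a["href"]) for a in links]

for title, href in pairs:
    print(title)
    print(href)
```

The trade-off is that a broader ancestor may also contain elements you do not want, so the inner part of the selector (here h3.post-title) has to be specific enough to pick out only the titles.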