Python: unable to get the script to parse the rest of the results that appear after a particular text
Tags: python, python-3.x, web-scraping

I am trying to create a script in Python that scrapes the titles and links of different posts from a webpage when a certain condition is met. I want the script to print the rest of the results that are available after a particular text, in this case "Alternative to Chromedriver". However, my current attempt prints only that text, "Alternative to Chromedriver". How can I get the script to parse the rest of the results that appear after a particular text?

Because you have an if statement: if the title is not "Alternative to Chromedriver", the loop continues to the next title, which is why it prints only when title == 'Alternative to Chromedriver'. To print all the titles, change the code as follows:
for item in soup.select(".summary .question-hyperlink"):
    title = item.get_text(strip=True)
    link = item.get("href")
    if check_title == title:
        print(f'Found it: {title}, {link}')
    else:
        print(title, link)
#Web scrapting of image with Python /questions/61035199/web-scrapting-of-image-with-python
#Imported functions not working in puppeteer /questions/61035043/imported-functions-not-working-in-puppeteer
#Trying to build a webscraper following a tutorial and keep getting attribute error for findall /questions/61034690/trying-to-build-a-webscraper-following-a-tutorial-and-keep-getting-attribute-err
#Python Selenium Web Scraping Hidden Div /questions/61034439/python-selenium-web-scraping-hidden-div
#Found it: Alternative to Chromedriver, /questions/61034224/alternative-to-chromedriver
Looking at the output above, the last title equals "Alternative to Chromedriver", so it prints "Found it"; the others print only the title and the link.

Try:
import requests
from bs4 import BeautifulSoup

URL = "https://stackoverflow.com/questions/tagged/web-scraping?tab=Newest"
check_title = "Alternative to Chromedriver"

res = requests.get(URL)
soup = BeautifulSoup(res.text, 'html.parser')

# Initialise a flag to track where to start printing from
start_printing = False
for item in soup.select(".summary .question-hyperlink"):
    title = item.get_text(strip=True)
    # Keep iterating until the required text is found. Set the flag only once
    if not start_printing and check_title == title:
        start_printing = True
        continue
    if start_printing:
        link = item.get("href")
        print(title, link)
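
As a possible alternative to the flag, the same "skip everything up to and including the marker" idea can be sketched with itertools. This is only an illustration and not part of the answers above: the helper name items_after and the sample list are hypothetical, and the list stands in for the (title, link) pairs that the scraping loop would produce, so the sketch runs without a network request.

```python
from itertools import dropwhile, islice


def items_after(items, marker):
    """Return an iterator over the (title, link) pairs that follow the first
    pair whose title equals marker. Hypothetical helper for illustration."""
    # dropwhile skips pairs until a title equal to the marker is reached
    from_marker = dropwhile(lambda pair: pair[0] != marker, items)
    # islice(..., 1, None) then skips the marker pair itself
    return islice(from_marker, 1, None)


# Hypothetical sample data standing in for the scraped (title, link) pairs
sample = [
    ("Python Selenium Web Scraping Hidden Div",
     "/questions/61034439/python-selenium-web-scraping-hidden-div"),
    ("Alternative to Chromedriver",
     "/questions/61034224/alternative-to-chromedriver"),
    ("Example question A", "/questions/1/example-a"),
    ("Example question B", "/questions/2/example-b"),
]

for title, link in items_after(sample, "Alternative to Chromedriver"):
    print(title, link)
```

With the real page, the pairs could be built from the same soup.select(...) loop as above. If the marker title never appears, the iterator is simply empty and nothing is printed, which matches the behaviour of the flag-based version.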