Python: How can I make BeautifulSoup fetch all the elements on a web page faster?
How can I make this Beautiful Soup scraper faster? This code seems slow. Is there any way to speed it up?
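import time

import requests
from bs4 import BeautifulSoup, SoupStrainer


def getNews():
    tic = time.perf_counter()
    requests_session = requests.Session()  # created, but the request below does not use it
    scrapy = requests.get('https://www.marketwatch.com/markets?mod=top_nav ').content
    # This strainer is never passed to BeautifulSoup (parse_only=...), so it has no effect.
    product = SoupStrainer('div', {'id': 'collection__elements j-scrollElement'})
    soup = BeautifulSoup(scrapy, 'lxml')
    # Walk container -> article block -> headline -> link.
    for container in soup.find_all('div', attrs={'class': 'collection__elements j-scrollElement'}):
        for article in container.find_all('div', attrs={'class': 'article__content'}):
            for headline in article.find_all('h3', attrs={'class': 'article__headline'}):
                for a in headline.find_all('a', href=True):
                    if a.text:
                        print(a.text)
                        print(a['href'])
    toc = time.perf_counter()
    print(toc - tic)  # elapsed seconds, download included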
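The question's code builds a SoupStrainer but never hands it to the parser. As a minimal sketch of how a strainer is normally used, the example below passes one through `parse_only` so that only `<a>` tags are turned into tree nodes. The sample markup, the `example.com` URLs, and the function name `extract_story_links` are assumptions made for illustration. The default parser here is `html.parser` to keep the sketch dependency-light, and `"lxml"` can be passed instead, as in the question. Whether this noticeably shortens the run on the real page has not been measured here.

```python
import re

from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
  <div class="article__content">
    <h3 class="article__headline"><a class="link" href="https://example.com/story/one">One</a></h3>
  </div>
  <p>Unrelated text <a class="link" href="https://example.com/about">About</a></p>
  <a class="link" href="https://example.com/articles/two">Two</a>
</body></html>
"""


def extract_story_links(html, parser="html.parser"):
    """Return hrefs of <a> tags whose URL contains 'articles' or 'story'."""
    # parse_only tells BeautifulSoup to build tree nodes for <a> tags only.
    only_links = SoupStrainer("a")
    soup = BeautifulSoup(html, parser, parse_only=only_links)
    # The compiled pattern is applied with search(), so it may match anywhere in the href.
    return [a["href"] for a in soup.find_all("a", href=re.compile("articles|story"))]


if __name__ == "__main__":
    print(extract_story_links(SAMPLE_HTML))
```

With the real page, the call would look like `extract_story_links(requests.get(url).content, "lxml")`, assuming `requests` and `lxml` are installed.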
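Unless there are more stories to pick up, your code is just as fast as the option below, but mine finds more articles and stories. That may or may not matter to you. I am running this on a modern Windows laptop connected to the internet. What timing are you seeing that makes you think this is not fast, and what do you think it should be? For me it runs in about 1/3 of a second.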
%%timeit
# IPython/Jupyter cell: assumes requests and BeautifulSoup were imported in an earlier cell.
requests_session = requests.Session()
scrapy = requests.get('https://www.marketwatch.com/markets?mod=top_nav ').content
soup = BeautifulSoup(scrapy, 'lxml')
for container in soup.find_all('div', attrs={'class': 'collection__elements j-scrollElement'}):
    for article in container.find_all('div', attrs={'class': 'article__content'}):
        for headline in article.find_all('h3', attrs={'class': 'article__headline'}):
            for a in headline.find_all('a', href=True):
                if a.text:
                    # print(a.text)
                    print(a['href'])
# 318 ms ± 55.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
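%%timeit
# IPython/Jupyter cell: assumes re, requests and BeautifulSoup were imported in an earlier cell.
requests_session = requests.Session()
scrapy = requests.get('https://www.marketwatch.com/markets?mod=top_nav ').content
soup = BeautifulSoup(scrapy, 'lxml')
# One pass over every <a class="link"> whose href contains "articles" or "story".
for link in soup.find_all('a', class_='link', href=re.compile('articles|story')):
    print(link.get('href'))
# 317 ms ± 58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)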
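The two timings above are almost identical (318 ms and 317 ms) even though the traversal code differs. That suggests, though the answer does not measure it, that the way the tree is searched is not the dominant cost. One way to check where the time goes is to time the download and the parse separately. The sketch below does that. The helper names `time_stages` and `headline_links` and the sample markup are assumptions made for illustration, not part of the original answer.

```python
import time

from bs4 import BeautifulSoup


def time_stages(fetch, parse):
    """Run fetch() and then parse(html); return (result, fetch_seconds, parse_seconds)."""
    t0 = time.perf_counter()
    html = fetch()
    t1 = time.perf_counter()
    result = parse(html)
    t2 = time.perf_counter()
    return result, t1 - t0, t2 - t1


def headline_links(html, parser="html.parser"):
    """Collect hrefs of links found inside <h3 class="article__headline"> elements."""
    soup = BeautifulSoup(html, parser)
    return [
        a["href"]
        for h3 in soup.find_all("h3", class_="article__headline")
        for a in h3.find_all("a", href=True)
    ]


if __name__ == "__main__":
    # Hypothetical stand-in for the download. To measure the real page, pass
    # something like: lambda: requests.get(url).content
    sample = (
        b'<div class="article__content">'
        b'<h3 class="article__headline"><a href="/story/1">One</a></h3>'
        b"</div>"
    )
    links, fetch_s, parse_s = time_stages(lambda: sample, headline_links)
    print(links, f"fetch={fetch_s:.4f}s parse={parse_s:.4f}s")
```

If the fetch time turns out to be most of the total, changing the BeautifulSoup calls will not help much, and the parse figure shows how much there is to gain from parser-side changes.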
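"This code seems slow." But is it? Please define "slow".
The execution time is too long.
Define "too long".
OK, thanks. It takes 1.6 seconds for me. Anyway, do you know how to insert all the results into an Html.table in Dash? By iterating over the results?
You're welcome. Please ask a new question so that someone with experience in that can answer. Give it a try first and paste your code.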