
Python: How can I make BeautifulSoup fetch all elements from a webpage faster?


How can I make this BeautifulSoup scraper faster? The code below seems slow; is there any way to speed it up?

    import time
    import requests
    from bs4 import BeautifulSoup, SoupStrainer

    def getNews():
        tic = time.perf_counter()
        # Note: the Session and the SoupStrainer below are created but never
        # used; requests.get() opens its own connection, and a strainer only
        # takes effect when passed to BeautifulSoup via parse_only.
        requests_session = requests.Session()
        scrapy = requests.get('https://www.marketwatch.com/markets?mod=top_nav').content
        product = SoupStrainer('div', attrs={'class': 'collection__elements j-scrollElement'})
        soup = BeautifulSoup(scrapy, 'lxml')
        for div in soup.find_all('div', attrs={'class': 'collection__elements j-scrollElement'}):
            for content in div.find_all('div', attrs={'class': 'article__content'}):
                for headline in content.find_all('h3', attrs={'class': 'article__headline'}):
                    for a in headline.find_all('a', href=True):
                        if a.text:
                            print(a.text)
                            print(a['href'])
        toc = time.perf_counter()
        print(toc - tic)
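As an aside, the `SoupStrainer` in the question is constructed but never handed to the parser, so it has no effect. A minimal sketch of how it could be wired in through BeautifulSoup's `parse_only` parameter, using a made-up HTML snippet as a stand-in for the real page (the sketch uses `html.parser` so it has no lxml dependency; the question itself uses `'lxml'`):

```python
from bs4 import BeautifulSoup, SoupStrainer

# Made-up stand-in for the downloaded page; the real markup is much larger.
html = """
<html><body>
  <div class="collection__elements j-scrollElement">
    <div class="article__content">
      <h3 class="article__headline"><a href="/story/example">Example headline</a></h3>
    </div>
  </div>
  <div class="footer">this subtree is never parsed</div>
</body></html>
"""

# With parse_only, only the matching <div> subtree is parsed and everything
# else is skipped, which cuts parse time on large pages.
only_news = SoupStrainer('div', attrs={'class': 'collection__elements j-scrollElement'})
soup = BeautifulSoup(html, 'html.parser', parse_only=only_news)

links = [(a.text, a['href']) for a in soup.find_all('a', href=True)]
print(links)
```

Note that `parse_only` works with `html.parser` and `lxml`, but not with `html5lib`.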

Unless there are more stories, your code is about as fast as the option below, but mine finds more articles and stories. That may or may not matter to you. I'm on a modern Windows laptop over the internet.

What did you observe that made you think this isn't fast? Or what do you think it should be? It runs in about a third of a second.

%%timeit
requests_session = requests.Session()
scrapy = requests.get('https://www.marketwatch.com/markets?mod=top_nav').content
soup = BeautifulSoup(scrapy, 'lxml')
for div in soup.find_all('div', attrs={'class': 'collection__elements j-scrollElement'}):
    for div in div.find_all('div', attrs={'class': 'article__content'}):
        for div2 in div.find_all('h3', attrs={'class': 'article__headline'}):
            for a in div2.find_all('a', href=True):
                if a.text:
                    # print(a.text)
                    print(a['href'])

# 318 ms ± 55.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
import re  # needed for the href pattern below
requests_session = requests.Session()
scrapy = requests.get('https://www.marketwatch.com/markets?mod=top_nav').content
soup = BeautifulSoup(scrapy, 'lxml')
for link in soup.find_all('a', class_='link', href=re.compile('articles|story')):
    print(link.get('href'))

# 317 ms ± 58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
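The `%%timeit` figures above come from IPython; outside a notebook, the standard-library `timeit` module gives comparable numbers. A minimal sketch, where `work` is a placeholder for the request-plus-parse step rather than the actual scraping code:

```python
import timeit

def work():
    # Placeholder for the code being benchmarked (e.g. fetch + parse).
    return sum(i * i for i in range(1000))

# Run the callable 7 times with one loop each and report the best
# wall-clock time, similar in spirit to %%timeit's "7 runs, 1 loop each".
best = min(timeit.repeat(work, repeat=7, number=1))
print(f"{best * 1000:.3f} ms")
```

Taking the minimum of several repeats is the usual convention, since the fastest run is the least disturbed by other system activity.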

"This code seems slow." But is it? Please define "slow".

It takes too long to execute.

Define "too long".

OK, thanks. It takes 1.6 seconds for me. Anyway, do you know how to insert all the results into an Html.Table in Dash? By iterating over the results?

You're welcome. Please post that as a new question so someone with experience can answer it. Give it a try first and paste your code.