
Python: How can I make BeautifulSoup fetch all elements from a webpage faster?


How can I make this BeautifulSoup scraper faster? The code below seems slow; is there any way to speed it up?

    import time
    import requests
    from bs4 import BeautifulSoup, SoupStrainer

    def getNews():
        tic = time.perf_counter()
        # Note: the Session and the SoupStrainer below are created but never
        # used; requests.get() opens its own connection, and a strainer only
        # takes effect when passed to BeautifulSoup via parse_only.
        requests_session = requests.Session()
        scrapy = requests.get('https://www.marketwatch.com/markets?mod=top_nav').content
        product = SoupStrainer('div', attrs={'class': 'collection__elements j-scrollElement'})
        soup = BeautifulSoup(scrapy, 'lxml')
        for div in soup.find_all('div', attrs={'class': 'collection__elements j-scrollElement'}):
            for content in div.find_all('div', attrs={'class': 'article__content'}):
                for headline in content.find_all('h3', attrs={'class': 'article__headline'}):
                    for a in headline.find_all('a', href=True):
                        if a.text:
                            print(a.text)
                            print(a['href'])
        toc = time.perf_counter()
        print(toc - tic)
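As an aside, the `SoupStrainer` in the question is constructed but never handed to the parser, so it has no effect. A minimal sketch of how it could be wired in through BeautifulSoup's `parse_only` parameter, using a made-up HTML snippet as a stand-in for the real page (the sketch uses `html.parser` so it has no lxml dependency; the question itself uses `'lxml'`):

```python
from bs4 import BeautifulSoup, SoupStrainer

# Made-up stand-in for the downloaded page; the real markup is much larger.
html = """
<html><body>
  <div class="collection__elements j-scrollElement">
    <div class="article__content">
      <h3 class="article__headline"><a href="/story/example">Example headline</a></h3>
    </div>
  </div>
  <div class="footer">this subtree is never parsed</div>
</body></html>
"""

# With parse_only, only the matching <div> subtree is parsed and everything
# else is skipped, which cuts parse time on large pages.
only_news = SoupStrainer('div', attrs={'class': 'collection__elements j-scrollElement'})
soup = BeautifulSoup(html, 'html.parser', parse_only=only_news)

links = [(a.text, a['href']) for a in soup.find_all('a', href=True)]
print(links)
```

Note that `parse_only` works with `html.parser` and `lxml`, but not with `html5lib`.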

Unless there are more stories, your code is about as fast as the option below, but mine finds more articles and stories. That may or may not matter to you. I'm on a modern Windows laptop over the internet.

What did you observe that made you think this isn't fast? Or what do you think it should be? It runs in about a third of a second.

%%timeit
requests_session = requests.Session()
scrapy = requests.get('https://www.marketwatch.com/markets?mod=top_nav').content
soup = BeautifulSoup(scrapy, 'lxml')
for div in soup.find_all('div', attrs={'class': 'collection__elements j-scrollElement'}):
    for div in div.find_all('div', attrs={'class': 'article__content'}):
        for div2 in div.find_all('h3', attrs={'class': 'article__headline'}):
            for a in div2.find_all('a', href=True):
                if a.text:
                    # print(a.text)
                    print(a['href'])

# 318 ms ± 55.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
import re  # needed for the href pattern below
requests_session = requests.Session()
scrapy = requests.get('https://www.marketwatch.com/markets?mod=top_nav').content
soup = BeautifulSoup(scrapy, 'lxml')
for link in soup.find_all('a', class_='link', href=re.compile('articles|story')):
    print(link.get('href'))

# 317 ms ± 58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
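The `%%timeit` figures above come from IPython; outside a notebook, the standard-library `timeit` module gives comparable numbers. A minimal sketch, where `work` is a placeholder for the request-plus-parse step rather than the actual scraping code:

```python
import timeit

def work():
    # Placeholder for the code being benchmarked (e.g. fetch + parse).
    return sum(i * i for i in range(1000))

# Run the callable 7 times with one loop each and report the best
# wall-clock time, similar in spirit to %%timeit's "7 runs, 1 loop each".
best = min(timeit.repeat(work, repeat=7, number=1))
print(f"{best * 1000:.3f} ms")
```

Taking the minimum of several repeats is the usual convention, since the fastest run is the least disturbed by other system activity.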

"This code seems slow." But is it? Please define "slow".

It takes too long to execute.

Define "too long".

OK, thanks. It takes 1.6 seconds for me. Anyway, do you know how to insert all the results into an Html.Table in Dash? By iterating over the results?

You're welcome. Please post that as a new question so someone with experience can answer it. Give it a try first and paste your code.