Python: How do I add multithreading?

python multithreading web-scraping

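I don't know much about web scraping. I wrote this code, but it runs very slowly. It fetches the search results for a Google query. I'd like to add multithreading, but I really don't know how. Can someone show me how to use multithreading? Also, which function should I run in multiple threads?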

import urllib
import requests
from bs4 import BeautifulSoup
from multiprocessing import Pool
# desktop user-agent

def get_listing(url):
    headers = {
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}
    html = None
    links = None

    r = requests.get(url, headers=headers, timeout=10)

    if r.status_code == 200:
        html = r.text
        soup = BeautifulSoup(html, 'lxml')
        listing_section = soup.select('#offers_table table > tbody > tr > td > h3 > a')
        links = [link['href'].strip() for link in listing_section]
    return links

def scrapeLinks(query_string):
    USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"

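    # Let requests URL-encode the query instead of replacing spaces by hand
    URL = "https://google.com/search"

    headers = {"user-agent": USER_AGENT}
    # A timeout keeps a stalled request from blocking the caller forever
    resp = requests.get(URL, params={"q": query_string}, headers=headers, timeout=10)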

    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='r'):
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                title = g.find('h3').text
                item = {
                    "title": title,
                    "link": link
                }
                results.append(item)
        return results

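def getFirst5Results(query_string):
    # scrapeLinks returns None on a non-200 response, so fall back to an empty list
    results = scrapeLinks(query_string) or []
    # Slicing avoids an IndexError when fewer than five results come back
    return [item["link"] for item in results[:5]]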

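A few things about multithreading

It is useful for code that makes network calls, for instance calling an API.
It is also useful when code runs for a long time and you want it to run in the background.
As you have stated, web scraping is a long-running task, because it involves a network call to Google and parsing the results once they arrive. Assuming you are using the `scrapeLinks` function for scraping, here is some code: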

import threading
t1 = threading.Thread(target=scrapeLinks, args=(query_string,))
t1.start()

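To wait for the thread to finish, use:

t1.join()

Note that `join()` only blocks until the thread is done; it does not hand back the value returned by `scrapeLinks`. The target function has to store its results somewhere the main thread can read them, such as a shared list or a `queue.Queue`.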
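If the goal is to run many queries at once and collect what each one returns, one common option is `concurrent.futures.ThreadPoolExecutor` from the standard library, which manages the threads and passes return values back to the caller. The following is only a minimal sketch: `fake_scrape` and `scrape_many` are hypothetical names, and `fake_scrape` just sleeps to imitate network latency so the example runs without touching Google. In real code you would presumably pass `scrapeLinks` (or `getFirst5Results`) as the `scrape` argument instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_scrape(query_string):
    """Hypothetical stand-in for scrapeLinks: sleeps to mimic a slow network call."""
    time.sleep(0.2)
    return [{"title": f"result for {query_string}",
             "link": f"https://example.com/{query_string}"}]


def scrape_many(queries, scrape=fake_scrape, max_workers=5):
    """Run `scrape` on every query in a thread pool and return {query: results}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order and re-raises worker exceptions
        results = list(pool.map(scrape, queries))
    return dict(zip(queries, results))


if __name__ == "__main__":
    queries = ["python", "threading", "requests"]
    start = time.perf_counter()
    found = scrape_many(queries)
    elapsed = time.perf_counter() - start
    for query, items in found.items():
        print(query, "->", items[0]["link"])
    print(f"{len(queries)} queries in {elapsed:.2f}s")
```

Because the threads spend most of their time waiting on I/O, the queries overlap instead of running one after another. Keep `max_workers` small when scraping a real site, since sending many requests in parallel makes it more likely that the site will throttle or block you.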