Python: memory usage increases when using threads for requests


python multithreading memory-leaks


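I have spent many days trying to figure out why my memory usage starts to grow once I start using threads, where each thread performs a single request against a given page. I assumed that a thread might use some memory while it runs, but that once the thread is "killed", the memory it used should be released, because the thread is gone.

I have created a small script here to show what I am doing: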

import threading
import time

import requests
from bs4 import BeautifulSoup as soup


def scrape():
    response = requests.get('https://www.aftonbladet.se/')

    bs4 = soup(response.text, 'lxml')

    # Build absolute URLs from the href of every matching article link.
    urls = ['https://aftonbladet.se{}'.format(raw_article.get('href')) for raw_article in
            bs4.find_all('a', {'class': 'HLf1C'})]

    return urls


def main():
    articleLists = scrape()

    while True:

        newArticleLists = scrape()

        for oneURL in newArticleLists:
            if oneURL not in articleLists:
                articleLists.append(oneURL)

                print(f"Found: {oneURL}")

                threading.Thread(
                    target=threadingPage,
                    args=(oneURL,
                          )
                ).start()

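        # The for loop above never breaks, so the original for/else always
        # ran this block; it now simply follows the loop.
        print(f"[Total: {len(articleLists)}][Found: {len(newArticleLists)}]")
        time.sleep(2)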


def threadingPage(url):
    response = requests.get(url)

    bs4 = soup(response.text, 'lxml')

    payload = {
        "title": None,
        "author": None
    }

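    # find() returns None when the element is missing, so .text raises
    # AttributeError; catch only that instead of using a bare except.
    try:
        payload["title"] = bs4.find('h1', {'data-test-id': 'headline'}).text.strip()
    except AttributeError:
        pass

    try:
        payload["author"] = bs4.find('a', {'data-test-id': 'author-link'}).text.strip()
    except AttributeError:
        pass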

    print(payload)
    return


if __name__ == '__main__':
    main()

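My situation is this: when a URL is not yet in articleLists, we have found a new URL. As soon as we find a new article, we start a new thread that scrapes the newly found URL to get the article's title and author, and then I do a simple return, which kills the thread (I assume).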
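
For reference, the "return kills the thread" assumption can be checked without any network access. The sketch below is only an illustration: `worker` and `results` are hypothetical stand-ins for `threadingPage` and its output, not part of my script.

```python
import threading


def worker(results):
    # Hypothetical stand-in for threadingPage: do some work, then return.
    results.append("done")
    return


results = []
thread = threading.Thread(target=worker, args=(results,))
thread.start()
thread.join()  # block until worker() has returned

# Once the target function has returned and join() has completed,
# the thread is no longer alive.
thread_alive_after_join = thread.is_alive()
print(thread_alive_after_join)
```

This only tells whether the thread itself has ended; it says nothing about whether the memory it used has gone back to the operating system.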

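However, my problem seems to be that every run of

def threadingPage(url):

increases memory usage once the whole scrape has finished, sometimes by as much as 2 MB. The more URLs we find and the longer the script runs, the more memory is used. I don't think that is right: since the thread terminates after scraping, memory usage should return to what it was before the thread was started.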
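
One way the before/after comparison could be made concrete is with the standard library's `tracemalloc`. This is a rough sketch under assumptions, not my actual measurement: `worker` is a hypothetical stand-in for `threadingPage` that allocates a temporary 5 MB buffer instead of downloading and parsing a page.

```python
import threading
import tracemalloc


def worker():
    # Hypothetical stand-in for threadingPage: allocate a temporary
    # buffer (about 5 MB), use it, then return so it can be freed.
    page = bytearray(5_000_000)
    return len(page)


tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# get_traced_memory() returns (current, peak) in bytes.
after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"before={before} B, after={after} B, peak={peak} B")
```

Note that `tracemalloc` only counts memory allocated through Python's allocators. The figure a task manager shows for the process can stay higher even when `after` is back near `before`.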

I may be doing something wrong, or I may not fully understand threads, but I would appreciate your help :)