
Python multiprocessing ThreadPool gets stuck at the end

Tags: python, multiprocessing, threadpool

I wrote a script that "parses" all the domains from a file. After launch, everything works fine. But when only a few domains remain at the end, it gets stuck. Sometimes parsing the last couple of domains takes a very long time. I can't figure out what the problem is. Has anyone run into a situation like this? Tell me how to fix it.

After launch, everything finishes quickly (as it should) right up to the end. At the end, when a few domains remain, it stops. It makes no difference whether there are 1,000 domains or 10,000.

Full code:

import re
import sys
import json
import requests
from bs4 import BeautifulSoup
from multiprocessing.pool import ThreadPool
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

with open("Rules.json") as file:
    REGEX = json.loads(file.read())

ua = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0'}

def Domain_checker(domain):
    try:
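        # Note: no timeout is set on this request; this is what the accepted answer below fixes.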
        r = requests.get("http://" + domain, verify=False, headers=ua)
        r.encoding = "utf-8"

        for company in REGEX.keys():

            for type in REGEX[company]:
                check_entry = 0

                for ph_regex in REGEX[company][type]:
                    if re.search(ph_regex, r.text):
                        check_entry += 1

                        if check_entry == len(REGEX[company][type]):
                            title = BeautifulSoup(r.text, "lxml")
                            Found_domain = "\nCompany: {0}\nRule: {1}\nURL: {2}\nTitle: {3}\n".format(company, type, r.url, title.title.text)
                            print(Found_domain)
                            with open("/tmp/__FOUND_DOMAINS__.txt", "a", encoding='utf-8', errors = 'ignore') as file:
                                file.write(Found_domain)

    except requests.exceptions.ConnectionError:
        pass
    except requests.exceptions.TooManyRedirects:
        pass
    except requests.exceptions.InvalidSchema:
        pass
    except requests.exceptions.InvalidURL:
        pass
    except UnicodeError:
        pass
    except requests.exceptions.ChunkedEncodingError:
        pass
    except requests.exceptions.ContentDecodingError:
        pass
    except AttributeError:
        pass
    except ValueError:
        pass

    return domain


if __name__ == '__main__':

    with open(sys.argv[1], "r", encoding='utf-8', errors='ignore') as file:
        Domains = file.read().split()

    pool = 100
    print("Pool = ", pool)

    results = ThreadPool(pool).imap_unordered(Domain_checker, Domains)
    string_num = 0

    for result in results:
        print("{0} => {1}".format(string_num, result))
        string_num += 1

    with open("/tmp/__FOUND_DOMAINS__.txt", encoding='utf-8', errors = 'ignore') as found_domains:
        found_domains = found_domains.read()

    print("{0}\n{1}".format("#" * 40, found_domains))
The problem was solved after adding a timeout to the request:

requests.get("http://" + domain, headers=ua, verify=False, timeout=10)

Thanks to the user nicknamed "eri" :)

Comments:

One of those blocks is probably suppressing a raised exception. At the very least, print the exception.

LIGAT0R: None of the exceptions are caused by the thread pool. There are only exceptions related to unreachable domains, encoding problems, and so on.

@LIGAT0R: Since you are suppressing all of those exceptions, you don't know what is causing them. By the way, you can handle several exceptions the same way in a single except clause by making a tuple of them, i.e. except (requests.exceptions.ConnectionError, requests.exceptions.TooManyRedirects, ...):