
Python: How do I properly debug a ThreadPool?

Tags: python, multithreading, multiprocessing, threadpool

I want to fetch some data from web pages. To speed the process up (they allow me 1000 requests per minute), I use a ThreadPool.

Since there is a lot of data, the process is prone to connection failures and the like, so I try to log everything I can in order to detect every mistake I have made in the code.

The problem is that the program sometimes stops without any exception (it behaves as if it were still running, but nothing happens; I use PyCharm). I log every exception I catch, yet I never see one in any log.

I assumed that if a timeout were reached, an exception would be raised and logged.

I have found where the problem is. Here is the code:

As the pool I use:

    from multiprocessing.pool import ThreadPool as Pool

and as the lock:

    from threading import Lock

The download_category function is used in a loop:

    def download_category(url):
        # some code
        #
        # ...
        try:
            log('Create pool...')
            _pool = Pool(_workers_number)

            with open('database/temp_produkty.txt') as f:
                log('Spracovavanie produktov... vytvaranie vlakien...')  # I see this in log
                for url_product in f:
                    x = _pool.apply_async(process_product, args=(url_product.strip('\n'), url))
                _pool.close()
                _pool.join()

                log('Presuvanie produktov z temp export do export.csv...')  # I can't see this in log
                temp_export_to_export_csv()
                set_spracovanie_kategorie(url)
        except Exception as e:
            logging.exception('Got exception on download_one_category: {}'.format(url))
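One likely reason nothing ever reaches the log: `apply_async` does not re-raise a worker's exception on its own. The exception is stored on the returned `AsyncResult` and only surfaces when `.get()` is called; since the results above are discarded, any crash inside a worker vanishes silently. A minimal sketch of the pattern (the failing `process_product` body here is invented for illustration, not the question's real function):

```python
from multiprocessing.pool import ThreadPool as Pool

def process_product(url, cat):
    # Stand-in for a real failure inside the worker.
    raise ValueError('boom: {}'.format(url))

pool = Pool(4)
results = [pool.apply_async(process_product, args=(u, 'cat'))
           for u in ('a', 'b')]
pool.close()
pool.join()

# .get() re-raises whatever the worker raised, so it can finally be logged.
# The timeout also guards against a result that never arrives.
for r in results:
    try:
        r.get(timeout=10)
    except Exception as exc:
        print('worker failed:', exc)
```

Keeping the `AsyncResult` objects and calling `.get()` on each is the cheapest way to make worker errors visible again.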
The process_product function:

def process_product(url, cat):
    try:
        data = get_product_data(url)
    except:
        log('{}: {} exception while getting product data... #') # I don't see this in log
        return
    try:
        print_to_temp_export(data, cat) # I don't see this in log
    except:
        log('{}: {} exception while printing to csv... #') # I don't see this in log
        raise
The log function:

def log(text):
    now = datetime.now().strftime('%d.%m.%Y %H:%M:%S')
    _lock.acquire()
    mLib.printToFile('logging/log.log', '{} -> {}'.format(now, text))
    _lock.release()
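As a side note, the manual lock around the file write is not needed if the standard `logging` module does the writing: each handler serializes `emit` calls through its own internal lock. A minimal sketch (the `logging/log.log` path and the `dd.mm.YYYY -> message` format follow the question's code; the logger name is an assumption):

```python
import logging
import os

os.makedirs('logging', exist_ok=True)            # directory taken from the question's code
logger = logging.getLogger('scraper')            # hypothetical logger name
handler = logging.FileHandler('logging/log.log')
handler.setFormatter(logging.Formatter('%(asctime)s -> %(message)s',
                                       datefmt='%d.%m.%Y %H:%M:%S'))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# FileHandler.emit acquires the handler's lock, so lines from concurrent
# threads cannot interleave; no manual threading.Lock is required.
def log(text):
    logger.info(text)
```

This also means exceptions can go through `logger.exception(...)`, which records the traceback automatically.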
I also use the logging module. In its log I can see that about 8 requests (the number of workers) were sent, but no answer was ever received.
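"Requests sent but no answer received" is consistent with the HTTP call blocking forever: by default `requests.get` has no timeout at all, so a stalled server keeps the worker thread (and therefore `pool.join()`) waiting indefinitely, with no exception to log. The question does not show `load_root`, so this is a hedged sketch assuming it fetches pages with `requests`:

```python
import requests

def fetch(url):
    # Without a timeout, requests.get can block forever on a stalled server,
    # freezing the worker thread exactly as described above.
    # The tuple is (connect timeout, read timeout), in seconds.
    resp = requests.get(url, timeout=(5, 30))
    resp.raise_for_status()   # turn HTTP error statuses into loggable exceptions
    return resp.text
```

With a timeout in place, a dead connection raises `requests.exceptions.Timeout`, which the existing `except` blocks can finally catch and log.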

Edit 1:

def get_product_data(url):
    data = defaultdict(lambda: '-')

    root = load_root(url)
    try:
        nazov = root.xpath('//h1[@itemprop="name"]/text()')[0]
    except:
        nazov = root.xpath('//h1/text()')[0]

    under_block = root.xpath('//h2[@id="lowest-cost"]')

    if len(under_block) < 1:
        under_block = root.xpath('//h2[contains(text(),"Naj")]')
        if len(under_block) < 1:
            return False

    data['nazov'] = nazov
    data['url'] = url

    blocks = under_block[0].xpath('./following-sibling::div[@class="shp"]/div[contains(@class,"shp")]')

    i = 0

    for block in blocks:
        i += 1
        data['dat{}_men'.format(i)] = block.xpath('.//a[@class="link"]/text()')[0]

    del root
    return data
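If the goal is to make the pool survive a stuck download instead of waiting forever in `pool.join()`, one option is `imap_unordered` with a per-result timeout. This is a standalone sketch, not the question's code; the `work` function and the 2-second limit are invented for illustration:

```python
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool
import time

def work(url):
    if url == 'slow':
        time.sleep(30)          # simulates a download that never returns
    return 'ok:' + url

pool = ThreadPool(4)
it = pool.imap_unordered(work, ['a', 'slow', 'b'])
results = []
while True:
    try:
        # it.next(timeout=...) raises multiprocessing.TimeoutError instead
        # of blocking forever the way pool.join() does on a stuck worker.
        results.append(it.next(timeout=2))
    except TimeoutError:
        print('a task is stuck; abandoning the rest')
        break
    except StopIteration:
        break
pool.terminate()
```

The fast tasks come back immediately; the hung one trips the timeout instead of freezing the whole program, so the hang becomes a loggable event.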

So apparently get_product_data is where it hangs. Can you show the code, or at least a runnable example that demonstrates the problem (including the URL you are requesting)? Maybe it is not hanging but waiting for a response. Does your code work against any other website?

@barny Thanks for the comment, I have added the code to the question.

So does it ever get past requests.get()? Is that where it hangs? You say the site should allow 1000 requests per second; can it handle that many in parallel, or only one at a time? What is the website?

I was wondering about that. I thought it would raise a TimeoutException after a few seconds, wouldn't it?

Suppose you reduce the threads to 1; the code should then run more slowly. Does it work that way? At what rate does it collect data?
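One concrete way to answer "where exactly is it hanging" is to dump the stack of every live thread while the program is stuck. A hedged, standalone sketch using only the standard library (independent of the question's code):

```python
import faulthandler
import sys
import traceback

def dump_all_thread_stacks(file=sys.stderr):
    """Print the current stack of every live thread, so a hung worker
    shows exactly which call (e.g. a socket read) it is blocked in."""
    for thread_id, frame in sys._current_frames().items():
        print('--- thread {} ---'.format(thread_id), file=file)
        traceback.print_stack(frame, file=file)

# Alternative: arm a watchdog before starting the pool. If the process is
# still running after 120 s, faulthandler prints every thread's traceback.
faulthandler.dump_traceback_later(120, exit=False)
faulthandler.cancel_dump_traceback_later()   # cancelled here so this sketch exits cleanly
```

Running `dump_all_thread_stacks()` from a signal handler, a watchdog thread, or PyCharm's debugger pause would show whether the eight workers are all sitting inside a blocking network read.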