Cannot see internal debug print statements from a Python multiprocessing script

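In the code below, I am trying to print the content of a number of URLs (dummy URLs are used in this example), but none of the print statements produce any output. This makes my program hard to debug. Is there a way around this?

I have already tried setting the log level and enabling verbose mode in Python, but that did not help much: I can see the subprocess logs, but they are not useful.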

import datetime
import urllib.request
from urllib.request import Request, urlopen
import os
import contextlib
import selenium.webdriver as webdriver
import lxml.html as LH
import lxml.html.clean as clean
import time
from datetime import timedelta
import threading
from multiprocessing import Pool, cpu_count
import logging
import multiprocessing
import sys

start_time = time.time()
#get all companies from edinet
ignore_tags=('script','noscript','style')

urls = ["https://www.webscraper.io/test-sites/e-commerce/allinone/computers","https://www.webscraper.io/test-sites/e-commerce/allinone/computers/laptops","https://www.webscraper.io/test-sites/e-commerce/allinone/phones/touch"]


allind = list(range(0,len(urls)))

def get_links(inds):
    for ind in inds:
        try:
            l= urls[ind]
            print(l)
            l = l.replace("\n","")
            options = webdriver.ChromeOptions()
            options.add_argument("headless")
            options.add_argument("--no-sandbox")
            driver =  webdriver.Chrome(executable_path="/PATH/chromedriver", chrome_options=options)
            driver.get(l)
            content=driver.page_source
            cleaner=clean.Cleaner()
            content=cleaner.clean_html(content)
            print(content)

        except Exception as e:
            sys.stdout.flush()
            print(e)
        driver.quit()


pool = Pool()
mpl = multiprocessing.log_to_stderr()
mpl.setLevel(multiprocessing.SUBDEBUG)

ITERATION_COUNT = cpu_count()-1
print(ITERATION_COUNT)
count_per_iteration = len(allind) / float(ITERATION_COUNT)
for i in range(0, ITERATION_COUNT):
    print(i)
    list_start = int(count_per_iteration * i)
    list_end = int(count_per_iteration * (i+1))
    pool.apply_async(get_links, [allind[list_start:list_end]])


elapsed_time_secs = time.time() - start_time

msg = "Execution took: %s secs (Wall clock time)" % timedelta(seconds=round(elapsed_time_secs))
print(msg)

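Edit:

With help from the comments below, and after some further research, I changed the multiprocessing part of the code to the following and observed the expected behavior: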

if __name__ == '__main__':
    start_time = time.time()
    with Pool(cpu_count()-1) as p:
        p.starmap(get_links, zip(range(1, 400)))
    p.close()
    p.join()
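
The blocking pattern used in the edit can be reduced to a small sketch that runs without Selenium. It assumes a stand-in worker, describe, and a helper, worker_count; both names are hypothetical and not part of the original script. The helper exists because cpu_count() - 1 is 0 on a single-core machine and Pool rejects a process count below 1. Output is printed with flush=True so that it is written immediately instead of waiting in a worker-side buffer.

```python
from multiprocessing import Pool, cpu_count


def worker_count(available_cpus):
    """Leave one CPU free, but never request fewer than one worker."""
    return max(1, available_cpus - 1)


def describe(url):
    """Hypothetical stand-in for the scraping step; it only prints and measures."""
    # flush=True writes the line immediately from inside the worker process.
    print(f"worker got {url}", flush=True)
    return len(url)


def main():
    urls = [
        "https://www.webscraper.io/test-sites/e-commerce/allinone/computers",
        "https://www.webscraper.io/test-sites/e-commerce/allinone/computers/laptops",
        "https://www.webscraper.io/test-sites/e-commerce/allinone/phones/touch",
    ]
    with Pool(worker_count(cpu_count())) as pool:
        # map() blocks until every task has finished, so the parent
        # cannot reach the end of the script while workers are still busy.
        lengths = pool.map(describe, urls)
    print(lengths, flush=True)


if __name__ == "__main__":
    main()
```

Because map() returns only after all results are in, no separate close() and join() calls are needed to keep the parent alive in this sketch.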

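You don't even see the output of print(ITERATION_COUNT)? Printing in multiprocessing has its challenges. Use mpl.<level>(message) instead of print(), where <level> is info, error, debug, ... You may want to define mpl earlier.

Thanks, but I still don't see the log statements after driver = webdriver.Chrome... so I wonder what else I need to do. I did define mpl at the start of the script.

You are using apply_async without making an appropriate blocking call on the returned AsyncResult objects, and you are not doing anything else to stop the parent from exiting after it launches the jobs. You probably just want to use a blocking method such as pool.map instead of pool.apply_async.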
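
The two suggestions in the comments can be combined in a rough sketch: log through the multiprocessing logger instead of print(), and keep every AsyncResult so the parent can block on it. The names split_evenly and process_chunk are hypothetical stand-ins for the original chunking loop and get_links, and the 60-second timeout is an arbitrary assumption.

```python
import logging
import multiprocessing
from multiprocessing import Pool


def split_evenly(items, parts):
    """Hypothetical helper: split items into at most `parts` non-empty chunks."""
    if not items:
        return []
    parts = max(1, min(parts, len(items)))
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(indices):
    """Hypothetical stand-in for get_links; logs instead of printing."""
    logger = multiprocessing.get_logger()
    for i in indices:
        logger.info("processing index %d", i)
    return len(indices)


def main():
    # Configure the multiprocessing logger before the pool is created.
    multiprocessing.log_to_stderr(logging.INFO)
    chunks = split_evenly(list(range(10)), 3)
    with Pool(processes=3) as pool:
        pending = [pool.apply_async(process_chunk, (chunk,)) for chunk in chunks]
        # get() blocks until the job is done and re-raises any exception
        # from the worker, which would otherwise be lost silently.
        counts = [result.get(timeout=60) for result in pending]
    print(counts, flush=True)


if __name__ == "__main__":
    main()
```

At INFO level the multiprocessing module also emits a few messages of its own about process start-up and shutdown. The get() calls are made inside the with block because leaving the block terminates the pool.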
