Dynamically loading threaded classes in Python


python multithreading


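I posted an earlier question, which I have since solved.

My new question concerns the final code, which iterates over the modules in a directory and loads them dynamically: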

import os
import pkgutil
import sys

# `path` and the Scrape_BASE module are assumed to be defined/imported elsewhere
modules = pkgutil.iter_modules(path=[os.path.join(path, 'scrapers')])
for loader, mod_name, ispkg in modules:
    # Ensure that module isn't already loaded, and that it isn't the parent class
    if (mod_name not in sys.modules) and (mod_name != "Scrape_BASE"):
        # Import module
        loaded_mod = __import__('scrapers.' + mod_name, fromlist=[mod_name])
        # Load class from imported module. Make sure the module and the class are named the same
        class_name = mod_name
        loaded_class = getattr(loaded_mod, class_name)
        # Only instantiate subclasses of Scrape_BASE
        if issubclass(loaded_class, Scrape_BASE.Scrape_BASE):
            # Create an instance of the class and run it
            instance = loaded_class()
            instance.start()
            instance.join()
            text = instance.GetText()

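In most of the classes, I read a PDF file from a website, scrape its content, and set the text that GetText() subsequently returns.

In some cases the PDF is too large, and the run ends in a segmentation fault. Is there a way to monitor the thread so that it times out after 3 minutes or so? Does anyone have a suggestion for how I could implement this?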

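The right way to do this is to change the code in those classes you haven't shown us so that they don't run forever. If that's possible, you should definitely do it. And if what you're trying to time out is "reading the PDF from a website", it is almost certainly possible.

But sometimes it isn't possible; sometimes you're just, say, calling some C function that has no timeout. So what do you do then?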
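As a rough illustration of timing out the download itself: the asker's download code is not shown, so this is only a sketch that assumes the PDF is fetched with the standard library's `urllib.request`. The function name `fetch_pdf_bytes` and its parameters are made up for the example.

```python
import urllib.request


def fetch_pdf_bytes(url, timeout_seconds=180):
    """Download url, giving up if the server stalls longer than timeout_seconds.

    Hypothetical helper: the real scraper classes are not shown in the question.
    """
    # The timeout applies to each blocking socket operation (connecting,
    # each read), not to the total download time.
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read()
```

A stalled connection surfaces as an exception (a subclass of `OSError`), which the scraper can catch and treat as "no text for this PDF". Note that this only covers the network part; it would not help if the time is spent parsing the PDF.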

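Well, threads can't be interrupted. So you need to use processes instead. multiprocessing.Process is very similar to threading.Thread, except that it runs the code in a child process instead of a thread in the same process.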

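This does mean that you can't share any global data with your workers without making it explicit, but that's generally a good thing. However, it also means that the input data (which in this case seems to be nothing) and the output data (which seems to be one big string) have to be picklable and passed explicitly over queues. This is easy to do; read the multiprocessing documentation for details.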
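The approach described above could be sketched roughly as follows. This is not the asker's code: `scrape_worker` is a hypothetical stand-in that sleeps instead of parsing a PDF, and `run_with_timeout` is a made-up helper name. The result string travels back over a `multiprocessing.Queue`, and the child is terminated if nothing arrives in time.

```python
import multiprocessing
import queue
import time


def scrape_worker(result_queue, seconds_of_work):
    """Hypothetical stand-in for one scraper; sleeps instead of parsing a PDF."""
    time.sleep(seconds_of_work)
    result_queue.put("extracted text")


def run_with_timeout(seconds_of_work, timeout_seconds):
    """Run scrape_worker in a child process; return its text, or None on timeout."""
    result_queue = multiprocessing.Queue()
    worker = multiprocessing.Process(
        target=scrape_worker, args=(result_queue, seconds_of_work)
    )
    worker.start()
    try:
        # Read the result before joining: a child that has queued a large
        # object may not exit until that object has been consumed.
        text = result_queue.get(timeout=timeout_seconds)
    except queue.Empty:
        text = None
        worker.terminate()  # unlike a thread, a process can be killed
    worker.join()
    return text


if __name__ == "__main__":
    print(run_with_timeout(0.1, timeout_seconds=180))  # worker finishes in time
    print(run_with_timeout(5, timeout_seconds=1))      # worker is terminated
```

In this sketch a child that crashes (for example with a segmentation fault) also yields None, but only after the full timeout has elapsed; checking `worker.exitcode` afterwards would tell the two cases apart, since a negative value means the child was killed by a signal.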

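While we're at it, you might want to rethink your design in terms of tasks instead of threads. If you have, say, 200 PDFs to download, you don't really want 200 threads; you probably want 8 or 12 threads, all servicing a queue of 200 jobs.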
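To make the "few workers, many jobs" idea concrete, here is a minimal sketch using `queue.Queue` and `threading.Thread`. The job function `process_job` is a hypothetical placeholder for downloading and parsing one PDF, and the counts simply mirror the numbers mentioned above.

```python
import queue
import threading

NUM_WORKERS = 8
NUM_JOBS = 200


def process_job(job_id):
    """Hypothetical stand-in for downloading and parsing one PDF."""
    return "text for job %d" % job_id


def worker(jobs, results):
    """Pull job ids off the shared queue until a None sentinel arrives."""
    while True:
        job_id = jobs.get()
        if job_id is None:
            break
        results[job_id] = process_job(job_id)


def run_all(num_jobs=NUM_JOBS, num_workers=NUM_WORKERS):
    """Service num_jobs jobs with a fixed set of num_workers threads."""
    jobs = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for job_id in range(num_jobs):
        jobs.put(job_id)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker so every thread exits
    for thread in threads:
        thread.join()
    return results
```

Calling `run_all()` returns a dict mapping each job id to its text. The number of threads stays fixed no matter how many jobs are queued.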

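The multiprocessing module has support for process pools, but you may find concurrent.futures a better fit for this. Both multiprocessing.Pool and concurrent.futures.ProcessPoolExecutor let you just pass a function and some arguments and then wait for the results, without having to worry about scheduling, queues, or anything else.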
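A pool-based version might look something like the sketch below. `extract_text` is a hypothetical placeholder for one scraper's download-and-parse step (here it only upper-cases the name), and `extract_all` is a made-up wrapper; both must live at module level so they can be pickled and sent to the worker processes.

```python
from concurrent.futures import ProcessPoolExecutor


def extract_text(pdf_name):
    """Hypothetical stand-in for one scraper's download-and-parse step."""
    return pdf_name.upper()


def extract_all(pdf_names, max_workers=4):
    """Run extract_text over pdf_names in a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # map() hands each name to a worker and yields results in input order.
        return list(executor.map(extract_text, pdf_names))


if __name__ == "__main__":
    print(extract_all(["a.pdf", "b.pdf", "c.pdf"]))
```

One caveat for the timeout use case: passing `timeout=` to `Future.result()` or `executor.map()` only stops the caller from waiting; as far as I know it does not kill the worker process that is still busy. If a stuck job must actually be killed, a separate `multiprocessing.Process` that you terminate yourself is the more direct tool.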

Lots of good ideas here. I'll look into whether they can help me,