Issues with sharing data between Python processes using multiprocessing


I've looked at several posts on this, so I know it's supposed to be fairly straightforward, but I can't quite seem to get it right. I'm not sure whether I need to create a worker pool or use the Queue class. Basically, I want to be able to create several processes that each act autonomously (which is why they inherit from the Agent superclass).

At random ticks of my main loop I want to update each agent. I'm using time.sleep with different values in the main loop and in the agents' run loops to simulate different processor speeds.

Here is my Agent superclass:

# Generic class to handle mpc of each agent
class Agent(mpc.Process):
  # initialize agent parameters
  def __init__(self,):
    # init mpc
    mpc.Process.__init__(self)
    self.exit = mpc.Event()

  # an agent's main loop...generally should be overridden
  def run(self):
    while not self.exit.is_set():
      pass
    print "You exited!"

  # safely shutdown an agent
  def shutdown(self):
    print "Shutdown initiated"
    self.exit.set()

  # safely communicate values to this agent
  def communicate(self,value):
    print value
And here is the subclass for a specific agent (this one simulates an HVAC system):

Here is the output from a run:

Initializing subsystems
Timestep 0
Measured [68] [56.948675]
heating 1
heating 2
Timestep 1
heating 3
heating 4
Timestep 2
heating 5
heating 6

Essentially, once Measured [68] shows up, the internally stored value should be updated so that 68 is what gets output (not heating 1, heating 2, and so on). In other words, the HVAC agent's own temperature is never updated properly.


EDIT: After doing a bit of research I realized I didn't fully understand what was going on under the hood. Each child process operates on its own block of virtual memory and is completely isolated from any data "shared" by simply passing it in, so passing values in this way won't work. My new problem is that I'm not entirely sure how to share a global value with multiple processes.
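To illustrate that isolation, here is a minimal sketch (the worker function and variable names are made up for the example): an ordinary Python object passed to the child is copied, so mutations in the child are invisible to the parent, whereas a multiprocessing.Value lives in shared memory and is visible to both.

from multiprocessing import Process, Value

def worker(plain_list, shared_value):
    # This append only touches the child's copy of the list.
    plain_list.append(1)
    # This write goes to shared memory and is visible to the parent.
    shared_value.value = 68

if __name__ == '__main__':
    plain_list = []
    shared_value = Value('d', 0.0)
    p = Process(target=worker, args=(plain_list, shared_value))
    p.start()
    p.join()
    print(plain_list)          # [] -- the child's change is not seen here
    print(shared_value.value)  # 68.0 -- the shared Value is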

I've been looking at the Queue and JoinableQueue classes, but I'm not entirely sure how to pass a queue into the kind of superclass setup I have (particularly with the mpc.Process.__init__(self) call).

The other question is whether I can have multiple agents read a value from a queue without pulling it off the queue. For example, if I wanted to share a single temperature value with multiple agents, would a queue even work for that?
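For what it's worth, passing a queue into a Process subclass is just a matter of accepting it in __init__ and storing it on self before calling the parent initializer; a rough sketch along the lines of the question's setup (QueueAgent and the sentinel value are made up for illustration) is below. Note that get() removes the item it returns, so a queue is a poor fit for broadcasting one temperature value to several readers; each value would be consumed by exactly one of them.

import multiprocessing as mpc

class QueueAgent(mpc.Process):
    def __init__(self, inbox):
        # Same style of init call as in the question.
        mpc.Process.__init__(self)
        # The queue is stored before start(), so the child process gets it.
        self.inbox = inbox

    def run(self):
        while True:
            value = self.inbox.get()  # blocks; removes the item from the queue
            if value is None:         # sentinel meaning "shut down"
                break
            print('received %s' % value)

if __name__ == '__main__':
    q = mpc.Queue()
    agent = QueueAgent(q)
    agent.start()
    q.put(68)
    q.put(None)
    agent.join()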


Here is a suggested solution, assuming that what you need is:

  • a centralized manager/master process that controls the lifetime of the workers
  • worker processes that do something self-contained and then report results back to the manager and to the other processes

Before I show it, though, I will say that in general, unless you are CPU-bound, multiprocessing is not really a good fit, mostly because of the added complexity, and you would probably be better off using a different high-level asynchronous framework. Also, you should be using Python 3, it is just that much better.

That said, this can be done fairly easily with multiprocessing. I have done it in Python 3 here, but I don't think anything should stop it from "just working" in Python 2 as well; I just haven't checked.

from ctypes import c_bool
from multiprocessing import Manager, Process, Array, Value
from pprint import pprint
from time import sleep, time


class Agent(Process):

    def __init__(self, name, shared_dictionary, delay=0.5):
        """My take on your Agent.

        Key difference is that I've commonized the run-loop and used
        a shared value to signal when to stop, to demonstrate it.
        """
        super(Agent, self).__init__()
        self.name = name

        # This is going to be how we communicate between processes.
        self.shared_dictionary = shared_dictionary

        # Create a silo for us to use.
        shared_dictionary[name] = []
        self.should_stop = Value(c_bool, False)

        # Primarily for testing purposes, and for simulating 
        # slower agents.
        self.delay = delay

    def get_next_results(self):
        # In the real world I'd use abc.ABCMeta as the metaclass to do 
        # this properly.
        raise RuntimeError('Subclasses must implement this')

    def run(self):
        ii = 0
        while not self.should_stop.value:
            ii += 1
            # debugging / monitoring
            print('%s %s run loop execution %d' % (
                type(self).__name__, self.name, ii))

            next_results = self.get_next_results()

            # Add the results, along with a timestamp.
            self.shared_dictionary[self.name] += [(time(), next_results)]
            sleep(self.delay)

    def stop(self):
        self.should_stop.value = True
        print('%s %s stopped' % (type(self).__name__, self.name))


class HVACAgent(Agent):
    def get_next_results(self):
        # This is where you do your work, but for the sake of
        # the example just return a constant dictionary.
        return {'temperature': 5, 'pressure': 7, 'humidity': 9}


class DumbReadingAgent(Agent):
    """A dumb agent to demonstrate workers reading other worker values."""

    def get_next_results(self):
        # get hvac 1 results:
        hvac1_results = self.shared_dictionary.get('hvac 1')
        # Guard against both a missing key and a not-yet-populated list.
        if not hvac1_results:
            return None

        return hvac1_results[-1][1]['temperature']

# Script starts.
results = {}

# The "with" ensures we terminate the manager at the end.
with Manager() as manager:

    # the manager is a subprocess in its own right. We can ask
    # it to manage a dictionary (or other python types) for us
    # to be shared among the other children.
    shared_info = manager.dict()

    hvac_agent1 = HVACAgent('hvac 1', shared_info)
    hvac_agent2 = HVACAgent('hvac 2', shared_info, delay=0.1)
    dumb_agent = DumbReadingAgent('dumb hvac1 reader', shared_info)

    agents = (hvac_agent1, hvac_agent2, dumb_agent)

    list(map(lambda a: a.start(), agents))

    sleep(1)

    list(map(lambda a: a.stop(), agents))
    list(map(lambda a: a.join(), agents))

    # Not quite sure what happens to the shared dictionary after
    # the manager dies, so for safety make a local copy.
    results = dict(shared_info)

pprint(results)


This is quite a broad question. As far as I can tell you're essentially asking "what are the ways to share data between multiple processes?" (have you read up on them?) followed by "which of those should I use in my application?". Could you narrow the scope a bit, i.e. give a specific desired behavior? Otherwise I think people are likely to leave generic answers that may not actually help you much.

Actually, that is exactly what the edit was meant to cover. Given my specific implementation I need a way to share data... I just don't know whether Queue, JoinableQueue, or something else is the best fit, and I can't figure out how to share the data given that I'm using the mpc.Process.__init__(self) call and that multiple consumers all need to read the same variable.

Great post! It runs fine in 2.7 (I've been using 2 since I have code in both 2 and 3). I'll probably break the run loops out separately, since each of these agents is supposed to run fully autonomously and so they will have very different run loops, but I do like the shared-dictionary idea... it reminds me of a clean singleton-like instance that everything is able to modify. One question, though... if hvac_agent1 decides to modify shared_info at the same moment as hvac_agent2, is there a race-condition problem?

The docs aren't completely clear, although they imply it's fine. I also tested it with 3 agents continuously updating their entries in the dictionary for 5 seconds without pausing (so each agent adding an entry every 3-4 ms) and no data was lost, so I believe there is no race condition as long as each agent only writes to its own entry in the dictionary. Obviously, if agent 1 writes to shared_info['key'] at the same moment agent 2 writes to it, only one of those writes will survive.
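To make that last point concrete, here is a rough sketch (the helper names are made up, this is not part of the answer above) of how appends to a single shared key could be guarded with a manager lock; the d[key] += [...] pattern is really a fetch-modify-reassign on the proxy, so without a lock two writers hitting the same key can overwrite each other's update.

from multiprocessing import Manager, Process
from time import time

def append_result(shared_dict, lock, key, value):
    # Hold the lock across the whole read-modify-write so that
    # concurrent appends to the same key cannot clobber each other.
    with lock:
        shared_dict[key] = shared_dict[key] + [(time(), value)]

def worker(shared_dict, lock):
    for ii in range(100):
        append_result(shared_dict, lock, 'shared key', ii)

if __name__ == '__main__':
    with Manager() as manager:
        shared_dict = manager.dict({'shared key': []})
        lock = manager.Lock()
        workers = [Process(target=worker, args=(shared_dict, lock))
                   for _ in range(3)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # With the lock in place all 300 appends survive.
        print(len(shared_dict['shared key']))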