How to run Ray properly in Python?
I am trying to understand how to use Ray properly. The results below seem inconsistent with the performance improvements Ray is said to deliver.

Environment:
- Python version: 3.6.10
- Ray version: 0.7.4

Here are the machine specs:
>>> import psutil
>>> psutil.cpu_count(logical=False)
4
>>> psutil.cpu_count(logical=True)
8
>>> mem = psutil.virtual_memory()
>>> mem.total
33707012096 # 32 GB
First, traditional Python multiprocessing using a queue (multiproc_function.py):
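The multiproc_function.py code block itself was lost when the page was scraped. Judging from the printed output (eight workers, each summing the same list and reporting an (n, total, seconds) tuple through a queue, where 4999999950000000 = sum(range(10**8))), it presumably looked roughly like the following sketch. The names and the list size are assumptions, not the original code:

```python
import time
from multiprocessing import Process, Queue

# The question's printed sums imply N_LIST_ITEMS = 100_000_000;
# reduced here so the sketch runs quickly.
N_LIST_ITEMS = 1_000_000
N_TASKS = 8

def work(n, nums, queue):
    """Sum the list in a child process and report (n, total, elapsed)."""
    print(f"n = {n}")
    start = time.time()
    total = sum(nums)  # CPU-bound work
    queue.put((n, total, round(time.time() - start, 2)))

def run_benchmark():
    # On fork-based platforms each child inherits a copy of `nums`
    # without any explicit serialization step.
    nums = list(range(N_LIST_ITEMS))
    queue = Queue()
    procs = [Process(target=work, args=(i, nums, queue)) for i in range(N_TASKS)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]  # arrival order may vary
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    for result in run_benchmark():
        print(result)
```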
The result is:
$ time python multiproc_function.py
n = 0
n = 1
n = 2
n = 3
n = 4
n = 5
n = 6
n = 7
(0, 4999999950000000, 11.12)
(1, 4999999950000000, 11.14)
(2, 4999999950000000, 11.1)
(3, 4999999950000000, 11.23)
(4, 4999999950000000, 11.2)
(6, 4999999950000000, 11.22)
(7, 4999999950000000, 11.24)
(5, 4999999950000000, 11.54)
real 0m19.156s
user 1m13.614s
sys 0m24.496s
Watching htop during the run, memory climbed from a baseline of 2.6 GB up to 8 GB, and all 8 logical cores were fully utilized. Moreover, comparing user+sys against real makes it clear that the work really ran in parallel.
Here is the Ray test code (ray_test.py):
The result is:
$ time python ray_test.py
Setting num_cpus to # physical cores = 4
2020-04-28 16:52:51,419 INFO resource_spec.py:205 -- Starting Ray with 18.16 GiB memory available for workers and up to 9.11 GiB for objects. You can adjust these settings with ray.remote(memory=<bytes>, object_store_memory=<bytes>).
(pid=78483) n = 2
(pid=78485) n = 1
(pid=78484) n = 3
(pid=78486) n = 0
(pid=78484) n = 4
(pid=78483) n = 5
(pid=78485) n = 6
(pid=78486) n = 7
(0, 4999999950000000, 5.12)
(1, 4999999950000000, 5.02)
(2, 4999999950000000, 4.8)
(3, 4999999950000000, 4.43)
(4, 4999999950000000, 4.64)
(5, 4999999950000000, 4.61)
(6, 4999999950000000, 4.84)
(7, 4999999950000000, 4.99)
real 0m45.082s
user 0m22.163s
sys 0m10.213s
Am I doing something wrong here?

First, Ray does not guarantee CPU affinity or resource isolation, which may be why its CPU usage is not saturated (though I am not 100% sure). You could try pinning the CPU affinity with psutil and see whether the cores are still unsaturated.

As for the results, would you mind trying the latest version of Ray? Ray has made quite good progress in performance and memory management since 0.7.4.

Thanks. I upgraded to Ray v0.8.4, and it runs about 5% faster. I set the CPU affinity to use all cores with p = psutil.Process(); p.cpu_affinity([]). The results were the same.

What does memory usage look like? You should check that as well. Also, if you change the list to an array, nums = np.arange(N_LIST_ITEMS), the runtime will be halved. In the list case, every Ray worker deserializes its own copy of the list of integers, which takes a lot of memory and time. NumPy arrays avoid this problem because Ray stores them in shared memory. In your example, I think multiprocessing uses fork under the hood to create copies of the program, so the workers automatically get a copy of the list and avoid the deserialization step. Ray's shared memory is especially useful in the case where you have one large (relatively simple) object that you want every worker to have a copy of.

@RobertNishihara Thanks for your reply. I tried nums = np.arange(N_LIST_ITEMS) instead of nums = list(range(N_LIST_ITEMS)). It ran more slowly: real 0m54.076s, user 0m7.388s, sys 0m2.585s. And when I set num_cpus to the number of logical cores instead, the run was killed:
$ time python ray_test.py
Setting num_cpus to # logical cores = 8
2020-04-28 16:27:43,709 INFO resource_spec.py:205 -- Starting Ray with 17.29 GiB memory available for workers and up to 8.65 GiB for objects. You can adjust these settings with ray.remote(memory=<bytes>, object_store_memory=<bytes>).
Killed
real 0m25.205s
user 0m15.056s
sys 0m4.028s