How to run Ray properly in Python?
I am trying to understand how to use Ray properly. The results below seem inconsistent with the performance improvements Ray is said to deliver.

Environment:
- Python version: 3.6.10
- Ray version: 0.7.4

Here are the machine specs:
>>> import psutil
>>> psutil.cpu_count(logical=False)
4
>>> psutil.cpu_count(logical=True)
8
>>> mem = psutil.virtual_memory()
>>> mem.total
33707012096 # 32 GB
First, traditional Python multiprocessing using a queue (multiproc_function.py):
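The multiproc_function.py code block itself was lost when the page was scraped. Judging from the printed output (eight workers, each summing the same list and reporting an (n, total, seconds) tuple through a queue, where 4999999950000000 = sum(range(10**8))), it presumably looked roughly like the following sketch. The names and the list size are assumptions, not the original code:

```python
import time
from multiprocessing import Process, Queue

# The question's printed sums imply N_LIST_ITEMS = 100_000_000;
# reduced here so the sketch runs quickly.
N_LIST_ITEMS = 1_000_000
N_TASKS = 8

def work(n, nums, queue):
    """Sum the list in a child process and report (n, total, elapsed)."""
    print(f"n = {n}")
    start = time.time()
    total = sum(nums)  # CPU-bound work
    queue.put((n, total, round(time.time() - start, 2)))

def run_benchmark():
    # On fork-based platforms each child inherits a copy of `nums`
    # without any explicit serialization step.
    nums = list(range(N_LIST_ITEMS))
    queue = Queue()
    procs = [Process(target=work, args=(i, nums, queue)) for i in range(N_TASKS)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]  # arrival order may vary
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    for result in run_benchmark():
        print(result)
```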
The result is:
$ time python multiproc_function.py
n = 0
n = 1
n = 2
n = 3
n = 4
n = 5
n = 6
n = 7
(0, 4999999950000000, 11.12)
(1, 4999999950000000, 11.14)
(2, 4999999950000000, 11.1)
(3, 4999999950000000, 11.23)
(4, 4999999950000000, 11.2)
(6, 4999999950000000, 11.22)
(7, 4999999950000000, 11.24)
(5, 4999999950000000, 11.54)
real 0m19.156s
user 1m13.614s
sys 0m24.496s
Watching htop during the run, memory climbed from a baseline of 2.6 GB up to 8 GB, and all 8 logical cores were fully utilized. Moreover, comparing user+sys against real makes it clear that the work really ran in parallel.
Here is the Ray test code (ray_test.py):
The result is:
$ time python ray_test.py
Setting num_cpus to # physical cores = 4
2020-04-28 16:52:51,419 INFO resource_spec.py:205 -- Starting Ray with 18.16 GiB memory available for workers and up to 9.11 GiB for objects. You can adjust these settings with ray.remote(memory=<bytes>, object_store_memory=<bytes>).
(pid=78483) n = 2
(pid=78485) n = 1
(pid=78484) n = 3
(pid=78486) n = 0
(pid=78484) n = 4
(pid=78483) n = 5
(pid=78485) n = 6
(pid=78486) n = 7
(0, 4999999950000000, 5.12)
(1, 4999999950000000, 5.02)
(2, 4999999950000000, 4.8)
(3, 4999999950000000, 4.43)
(4, 4999999950000000, 4.64)
(5, 4999999950000000, 4.61)
(6, 4999999950000000, 4.84)
(7, 4999999950000000, 4.99)
real 0m45.082s
user 0m22.163s
sys 0m10.213s
Am I doing something wrong here?

First, Ray does not guarantee CPU affinity or resource isolation, which may be why its CPU usage is not saturated (though I am not 100% sure). You could try pinning the CPU affinity with psutil and see whether the cores are still unsaturated.

As for the results, would you mind trying the latest version of Ray? Ray has made quite good progress in performance and memory management since 0.7.4.

Thanks. I upgraded to Ray v0.8.4, and it runs about 5% faster. I set the CPU affinity to use all cores with p = psutil.Process(); p.cpu_affinity([]). The results were the same.

What does memory usage look like? You should check that as well. Also, if you change the list to an array, nums = np.arange(N_LIST_ITEMS), the runtime will be halved. In the list case, every Ray worker deserializes its own copy of the list of integers, which takes a lot of memory and time. NumPy arrays avoid this problem because Ray stores them in shared memory. In your example, I think multiprocessing uses fork under the hood to create copies of the program, so the workers automatically get a copy of the list and avoid the deserialization step. Ray's shared memory is especially useful in the case where you have one large (relatively simple) object that you want every worker to have a copy of.

@RobertNishihara Thanks for your reply. I tried nums = np.arange(N_LIST_ITEMS) instead of nums = list(range(N_LIST_ITEMS)). It ran more slowly: real 0m54.076s, user 0m7.388s, sys 0m2.585s. And when I set num_cpus to the number of logical cores instead, the run was killed:
$ time python ray_test.py
Setting num_cpus to # logical cores = 8
2020-04-28 16:27:43,709 INFO resource_spec.py:205 -- Starting Ray with 17.29 GiB memory available for workers and up to 8.65 GiB for objects. You can adjust these settings with ray.remote(memory=<bytes>, object_store_memory=<bytes>).
Killed
real 0m25.205s
user 0m15.056s
sys 0m4.028s