Python: NumPy seed sometimes does not work with dask functions

Tags: python, dask, dask-distributed

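NumPy seeding sometimes fails when it runs inside dask distributed function calls. The intent is to pass a seed value to each of a set of Monte Carlo simulation trials. This works most of the time, but not always. The problem boils down to the following example: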

from dask.distributed import Client
import numpy as np

def get_rand4seed(seedx):
    np.random.seed(seedx)
    rand1 = np.random.rand(1)[0]
    return seedx, rand1

seedrange = 100
seed_ids = np.arange(0,seedrange).tolist()

client = Client()
a = client.map(get_rand4seed, seed_ids)
results = client.gather(a)
client.close()

for result in results:
    # take seed packed in result and calculate correct 1st random number
    np.random.seed(result[0])
    correct_result = np.random.rand(1)[0]

    # comparing with 1st random number calculated in parallelized func
    comparison = 'seed=%s, dask=%s, correct=%s' % (result[0], result[1], correct_result)
    if result[1] != correct_result:
        print('DIFF: %s' % comparison)
    else:
        pass
        #print(comparison)
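Typically 5% to 10% of the results are incorrect, and errors seem more likely after roughly the 10th item. Sometimes, however, all 100 items are correct. Example output: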

DIFF: seed=10, dask=0.6503742417395917, correct=0.771320643266746
DIFF: seed=18, dask=0.5054533737348429, correct=0.6503742417395917
DIFF: seed=26, dask=0.038561680881409655, correct=0.30793495262497084
DIFF: seed=34, dask=0.780100460524675, correct=0.038561680881409655
DIFF: seed=69, dask=0.6063543377764754, correct=0.29624916167243354
DIFF: seed=77, dask=0.29624916167243354, correct=0.9191090317991818
DIFF: seed=85, dask=0.6575115686178157, correct=0.620373814553256
DIFF: seed=93, dask=0.3072410093435699, correct=0.6063543377764754

Python 3.6.9, dask 2.9.0

I could not run your code as posted. It complained about the missing guard
if __name__ == '__main__':
and then gave me this:

NameError: name 'results' is not defined
distributed.nanny - WARNING - Restarting worker
So I took a look and rewrote your code as follows:

import dask.bag as db
import numpy as np


def get_rand4seed(seedx):
    np.random.seed(seedx)
    rand1 = np.random.rand(1)[0]
    return seedx, rand1


seedrange = 100
b = db.from_sequence(np.arange(seedrange), npartitions=4)
results = b.map(get_rand4seed).compute()
for result in results:
    np.random.seed(result[0])
    correct_result = np.random.rand(1)[0]
    comparison = 'seed=%s, dask=%s, correct=%s' % (
        result[0], result[1], correct_result)
    if result[1] != correct_result:
        print('DIFF: %s' % comparison)
    else:
        pass

The code runs cleanly and prints nothing, which I take to mean that every result matches.

It is not clear why using map/gather directly does not work. However, I did find that using dask bag does produce correctly seeded results.
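
One possible explanation, which is not confirmed anywhere in this thread: np.random.seed() sets NumPy's global random state, and a distributed worker may run several tasks at once in threads of the same process. One task could then reseed the global state between another task's seed() and rand() calls. That would fit the output above, where seed=10 received the value that belongs to seed=18. A minimal sketch of a workaround, assuming this is the cause, gives each task its own np.random.RandomState. The names get_rand4seed_local and run_in_threads are hypothetical, and a standard-library thread pool stands in for the dask worker threads:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def get_rand4seed_local(seedx):
    # Private generator per call: no shared global state is touched,
    # so concurrent calls in other threads cannot reseed it.
    rng = np.random.RandomState(seedx)
    rand1 = rng.rand(1)[0]
    return seedx, rand1


def run_in_threads(seed_ids, max_workers=8):
    # Hypothetical stand-in for dask worker threads: run the function
    # concurrently in one process using a thread pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(get_rand4seed_local, seed_ids))


if __name__ == '__main__':
    results = run_in_threads(list(range(100)))
    for seedx, value in results:
        # Reference value computed the same way as in the question.
        np.random.seed(seedx)
        correct_result = np.random.rand(1)[0]
        if value != correct_result:
            print('DIFF: seed=%s, got=%s, correct=%s' % (seedx, value, correct_result))
```

If the assumption holds, the same function could be passed to client.map in place of get_rand4seed. RandomState(seed) uses the same legacy seeding as np.random.seed(seed), so the first value drawn should match the reference computed in the question's comparison loop.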