Does Python randomSplit return a copy of, or a reference to, the original RDD?


Suppose I have the following code:

for idx in range(0, 10):
    train_test_split = training.randomSplit(weights=[0.75, 0.25])
    train_cv = train_test_split[0]
    test_cv = train_test_split[1]
    # scale train_cv and test_cv
By scaling train_cv and test_cv, will the original data be affected?

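RDDs are immutable, so a transformation cannot "change" an RDD in place; it can only produce a new one.
So no, the original data will not be affected.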
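
To make the point concrete without needing a Spark cluster, here is a minimal pure-Python sketch of the idea. It is a toy model, not Spark's implementation: `random_split` is a hypothetical stand-in for `RDD.randomSplit`, the data is a plain tuple, and the fixed `seed=42` is an assumption made only so the result is repeatable. It shows that the splits are new collections and that "scaling" them builds further new collections, leaving the input untouched.

```python
import bisect
import random


def random_split(data, weights, seed):
    """Toy stand-in for RDD.randomSplit (hypothetical helper, not Spark code).

    Returns one new list per weight and never modifies `data`.
    """
    total = float(sum(weights))
    # Normalized cumulative boundaries, e.g. [0.0, 0.75, 1.0]
    bounds = [0.0]
    for w in weights:
        bounds.append(bounds[-1] + w / total)

    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for item in data:
        x = rng.random()  # uniform draw in [0.0, 1.0)
        # Find the bucket whose range contains x; clamp to the last bucket
        # in case rounding leaves the final boundary slightly below 1.0
        i = min(bisect.bisect_right(bounds, x) - 1, len(weights) - 1)
        splits[i].append(item)
    return splits


# A tuple is used so that in-place modification is impossible, like an RDD
training = tuple(range(20))

train_cv, test_cv = random_split(training, [0.75, 0.25], seed=42)

# "Scaling" produces new lists; neither the splits' source nor `training` changes
scaled_train = [x * 0.1 for x in train_cv]
scaled_test = [x * 0.1 for x in test_cv]
```

In this sketch every element of `training` lands in exactly one split, and `training` itself is the same before and after. With real RDDs the same reasoning applies to transformations such as `map`: they return a new RDD rather than altering the one they were called on.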

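If we call an action on train_cv multiple times (note that training is not cached), will train_cv be different each time?

@lavmudgal An RDD is evaluated according to its dependency DAG. If the underlying data source changes, you may see different results. That said, this is an "out of context" answer; I suggest asking a new question about your specific case.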
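
The remark about the dependency DAG can be illustrated with another small pure-Python sketch. Everything here is hypothetical and deliberately simplified: `LazyDataset` is a toy class, not a Spark type, and its `materialize` method computes eagerly, which only loosely plays the role that caching plays in Spark. The point is just that a lazily evaluated dataset re-runs its lineage on every action, so a change in the source shows up in later results.

```python
class LazyDataset:
    """Toy model of lazy lineage: stores how to compute data, not the data."""

    def __init__(self, compute):
        # `compute` is a zero-argument function that returns a list
        self._compute = compute

    def map(self, fn):
        # A "transformation": returns a new dataset that depends on this one
        parent = self
        return LazyDataset(lambda: [fn(x) for x in parent._compute()])

    def collect(self):
        # An "action": re-runs the whole lineage each time it is called
        return self._compute()

    def materialize(self):
        # Hypothetical helper: compute once now and keep that snapshot
        data = self._compute()
        return LazyDataset(lambda: list(data))


source = [1, 2, 3]
base = LazyDataset(lambda: list(source))
doubled = base.map(lambda x: x * 2)

first = doubled.collect()         # evaluated against the current source
snapshot = doubled.materialize()  # frozen copy of the current result

source.append(4)                  # the underlying data source changes

second = doubled.collect()        # lineage is re-run, so the new row appears
frozen = snapshot.collect()       # the snapshot still reflects the old source
```

Here `first` and `second` differ only because `source` changed between the two actions, which mirrors the comment above; the dataset objects themselves were never modified.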