
Accessing RDD dependencies is available in Scala but not in PySpark


I am trying to access an RDD's dependencies. In Scala this is straightforward:

scala> val myRdd = sc.parallelize(0 to 9).groupBy(_ % 2)
myRdd: org.apache.spark.rdd.RDD[(Int, Iterable[Int])] = ShuffledRDD[2] at groupBy at <console>:24

scala> myRdd.dependencies
res0: Seq[org.apache.spark.Dependency[_]] = List(org.apache.spark.ShuffleDependency@6c427386)
But there is no dependencies attribute in PySpark. Any pointers on how to access them?

>>> myRdd.dependencies
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'PipelinedRDD' object has no attribute 'dependencies'

There is no supported way to do this, because it wouldn't make much sense in PySpark. You can do the following:

# Reach into the JVM through Py4J: convert the Python RDD's underlying JavaRDD
# back to a Scala RDD and call dependencies() on it.
rdd = sc.parallelize([1, 2, 3]).map(lambda x: x)
deps = sc._jvm.org.apache.spark.api.java.JavaRDD.toRDD(rdd._jrdd).dependencies()
print(deps)
## List(org.apache.spark.OneToOneDependency@63b86b0d)

# deps is a proxy for a Scala Seq, so iterate it with size() and apply()
for i in range(deps.size()):
    print(deps.apply(i))

## org.apache.spark.OneToOneDependency@63b86b0d
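
Each element of deps is a Py4J proxy for a Scala Dependency object, so its JVM methods can be called directly from Python. A rough sketch (reusing the deps variable from the snippet above) that inspects the first dependency and its parent RDD:

# Each dependency is a Py4J proxy; JVM methods such as getClass() and rdd()
# can be called on it directly.
dep = deps.apply(0)
print(dep.getClass().getName())   # e.g. org.apache.spark.OneToOneDependency
print(dep.rdd().id())             # id of the parent (Scala) RDD behind this dependency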

But I don't think it will get you very far.
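
If the goal is only to inspect an RDD's lineage from Python, the public toDebugString() method may already be enough; a minimal sketch, assuming an existing SparkContext sc:

# Supported PySpark API for inspecting lineage, without touching the JVM.
rdd = sc.parallelize(range(10)).groupBy(lambda x: x % 2)
lineage = rdd.toDebugString()
# Depending on the PySpark version, toDebugString() may return bytes.
if isinstance(lineage, bytes):
    lineage = lineage.decode("utf-8")
print(lineage)  # shows the chain of RDDs, e.g. the shuffle behind groupBy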

That's perfect, exactly what I needed. Thank you very much!