Accessing elements of a WrappedArray in Python

Tags: python, scala, apache-spark, pyspark

I have a Spark DataFrame with the following schema:

|-- eid: long (nullable = true)
|-- age: long (nullable = true)
|-- sex: long (nullable = true)
|-- father: array (nullable = true)
|    |-- element: array (containsNull = true)
|    |    |-- element: long (containsNull = true)
and here is a sample of its rows:

df.select(df['father']).show()
+--------------------+
|              father|
+--------------------+
|[WrappedArray(-17...|
|[WrappedArray(-11...|
|[WrappedArray(13,...|
+--------------------+
The type is

DataFrame[father: array<array<bigint>>]
How can I access each element of the inner array, for example the -17 in the first row?
I tried different things, such as
df.select(df['father'])(0)(0).show()
but without success.

A solution in Scala would be as follows. Start from

import org.apache.spark.sql.functions._
val data =  sparkContext.parallelize("""{"eid":1,"age":30,"sex":1,"father":[[1,2]]}""" :: Nil)
val dataframe = sqlContext.read.json(data).toDF()
The DataFrame looks like

+---+---+---+--------------------+
|eid|age|sex|father              |
+---+---+---+--------------------+
|1  |30 |1  |[WrappedArray(1, 2)]|
+---+---+---+--------------------+
The solution should be

dataframe.select(col("father")(0)(0) as("first"), col("father")(0)(1) as("second")).show(false)
The output should be

+-----+------+
|first|second|
+-----+------+
|1    |2     |
+-----+------+

If I'm not mistaken, the syntax in Python indexes the column inside select, not the DataFrame that select returns:

df.select(df['father'][0][0]).show()
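
To make the per-row meaning concrete, here is a small pure-Python sketch (no Spark required) of what a column expression such as df['father'][0][0] picks out of each row. The helper name nested_get and the sample rows are hypothetical. Returning None for a missing array or an out-of-range index mirrors the null Spark gives when ANSI mode is off, which is an assumption about the configuration; with ANSI mode on, Spark may raise an error instead.

```python
from typing import Optional, Sequence


def nested_get(father: Optional[Sequence[Sequence[int]]], i: int, j: int) -> Optional[int]:
    """Return father[i][j], or None if the array or either index is missing.

    Hypothetical helper: models one row of an array<array<bigint>> column.
    """
    if father is None or not 0 <= i < len(father):
        return None
    inner = father[i]
    if inner is None or not 0 <= j < len(inner):
        return None
    return inner[j]


# Made-up rows shaped like the "father" column: each row is a list of lists,
# and a whole row may be null (None).
rows = [[[1, 2]], [[3]], None]

first_values = [nested_get(r, 0, 0) for r in rows]   # like father[0][0]
second_values = [nested_get(r, 0, 1) for r in rows]  # like father[0][1]

print(first_values)   # [1, 3, None]
print(second_values)  # [2, None, None]
```

The point of the sketch is that the two indices apply to the value in each row, which is why they belong on the column and not on the result of select.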


See some examples here:

Another Scala answer looks like this:

df.select(col("father").getItem(0) as "father_0", col("father").getItem(1) as "father_1")

Why wrap the column with the array function?
dataframe.select($"father"(0)(0))
or
dataframe.select(col("father")(0)(0))
would also work fine.