Scala: How do I extract a specific element from a column in each row? (Scala, Apache Spark, Spark DataFrame)


I have the following DataFrame in Spark 2.2.0 and Scala 2.11.8:

+----------+-------------------------------+
|item      |        other_items            |
+----------+-------------------------------+
|  111     |[[444,1.0],[333,0.5],[666,0.4]]|
|  222     |[[444,1.0],[333,0.5]]          |
|  333     |[]                             |
|  444     |[[111,2.0],[555,0.5],[777,0.2]]|
+----------+-------------------------------+
I want to obtain the following DataFrame:

+----------+-------------+
|item      | other_items |
+----------+-------------+
|  111     | 444         |
|  222     | 444         |
|  444     | 111         |
+----------+-------------+
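So, basically, I need to extract the first item from other_items in each row. I also need to ignore rows whose other_items is an empty array [].

How can I do this?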

I tried this approach, but it does not give me the expected result:

val result = df.withColumn("other_items", $"other_items"(0))

printSchema gives the following output:

 |-- item: string (nullable = true)
 |-- other_items: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: double (nullable = true)
Like this:

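import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

// Sample data: one row with a populated array and one with an empty array
val df = Seq(
  ("111", Seq(("111", 1.0), ("333", 0.5), ("666", 0.4))),
  ("333", Seq.empty[(String, Double)])
).toDF("item", "other_items")

// Take the first array element, then its _1 field; drop rows where the result is null
df.select($"item", $"other_items"(0)("_1").alias("other_items"))
  .na.drop(Seq("other_items")).show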
Here the first apply ($"other_items"(0)) selects the first element of the array, the second apply (("_1")) selects the _1 field, and na.drop removes the nulls introduced by the empty array.

+----+-----------+
|item|other_items|
+----+-----------+
| 111|        111|
+----+-----------+
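
For comparison, here is a minimal sketch of the same extraction written with explicit method calls instead of the chained apply syntax. It assumes a local SparkSession; the object name FirstItemSketch and the helper firstItems are hypothetical names used only for illustration. Column.getItem(0) picks the first array element, getField("_1") picks the struct field, and functions.size filters out empty arrays up front rather than dropping nulls afterwards.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, size}

object FirstItemSketch {

  // Hypothetical helper: keeps rows whose other_items array is non-empty
  // and replaces the array with the _1 field of its first element.
  def firstItems(df: DataFrame): DataFrame =
    df.filter(size(col("other_items")) > 0)
      .select(
        col("item"),
        col("other_items").getItem(0).getField("_1").alias("other_items")
      )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, purely for illustration
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("first-item-sketch")
      .getOrCreate()
    import spark.implicits._

    // The data from the question, with item ids as strings
    val df = Seq(
      ("111", Seq(("444", 1.0), ("333", 0.5), ("666", 0.4))),
      ("222", Seq(("444", 1.0), ("333", 0.5))),
      ("333", Seq.empty[(String, Double)]),
      ("444", Seq(("111", 2.0), ("555", 0.5), ("777", 0.2)))
    ).toDF("item", "other_items")

    firstItems(df).show()
    spark.stop()
  }
}
```

With the question's data, this should keep items 111, 222 and 444 and drop item 333, whose array is empty. Filtering on size first makes the intent explicit, while the na.drop variant above is shorter; which one reads better is largely a matter of taste.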