How to remove the first group of zero-valued columns (or rows) in Spark and Scala

Hi, I'm new to Spark, and I have two DataFrames:

+--------------+-------+-------+-------+-------+-------+-------+-------+
|        Region| 3/7/20| 3/8/20| 3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
+--------------+-------+-------+-------+-------+-------+-------+-------+
|         Paris|      0|      0|      0|      1|      7|      0|      5|
+--------------+-------+-------+-------+-------+-------+-------+-------+
+----------+-------+
|    Period|Reports|
+----------+-------+
|2020/07/20|      0|
|2020/07/21|      0|
|2020/07/22|      0|
|2020/07/23|      8|
|2020/07/24|      0|
|2020/07/25|      1|
+----------+-------+
How can I remove the first run of consecutive zero-valued columns 3/7/20, 3/8/20, and 3/9/20 without removing column 3/12/20?
Similarly, for the second DataFrame, how can I remove the rows 3/12/20, 0 and 2020/07/21, 0 and 2020/07/22, 0 without removing the row with 2020/07/22, 0?

I don't see the meaning of, or the link between, these two DataFrames. Is the second one actually used?

@JoJolyne, "remove the rows 3/12/20, 0 and 2020/07/21, 0 and 2020/07/22, 0 without removing the row with 2020/07/22, 0" would make sense reworded as "remove the rows 2020/07/20, 0 and 2020/07/21, 0 and 2020/07/22, 0 without removing the row 2020/07/24, 0". I have provided a solution; please check the answer.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
// Assumes a spark-shell session; in a standalone application you would
// also need `import spark.implicits._` for toDF and the $ column syntax.

val df = Seq(("0", "0", "0", "1", "7", "0", "5"))
  .toDF("3/7/20", "3/8/20", "3/9/20", "3/10/20", "3/11/20", "3/12/20", "3/13/20")

df.printSchema()

// Unpivot the wide DataFrame: build a map of (column name -> value)
// and explode it into one (Region, Paris) row per date column.
val columnsAndValues = df.columns.flatMap { c => Array(lit(c), col(c)) }
val df1 = df.withColumn("myMap", map(columnsAndValues: _*))
  .select(explode($"myMap"))
  .toDF("Region", "Paris")

// A single-partition window with constant partition and order keys, so
// row_number, lag, and lead run over all rows in their incoming order.
val windowSpec = Window.partitionBy(lit("A")).orderBy(lit("A"))

// Keep a row if its value is non-zero or the previous value was non-zero;
// this drops the leading run of zeros.
df1.withColumn("row_number", row_number.over(windowSpec))
  .withColumn("lag", lag("Paris", 1, 0).over(windowSpec))
  .withColumn("lead", lead("Paris", 1, 0).over(windowSpec))
  .where(($"lag" > 0) or ($"Paris" > 0))
  .show()

/*
+-------+-----+----------+---+----+                                             
| Region|Paris|row_number|lag|lead|
+-------+-----+----------+---+----+
|3/10/20|    1|         4|  0|   7|
|3/11/20|    7|         5|  1|   0|
|3/12/20|    0|         6|  7|   5|
|3/13/20|    5|         7|  0|   0|
+-------+-----+----------+---+----+
*/
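The question asked to drop the leading zero-valued columns from the original wide DataFrame, while the result above is in long format. As a minimal follow-up sketch (the keepCols name is my own; it also assumes spark.implicits._ is in scope for .as[String], and that collect() returns rows in window order, which holds for this single-partition example), you can collect the surviving Region values and select those columns from df:

// Collect the names of the date columns that survive the filter,
// i.e. everything from the first non-zero value onward.
val keepCols = df1
  .withColumn("lag", lag("Paris", 1, 0).over(windowSpec))
  .where(($"lag" > 0) or ($"Paris" > 0))
  .select("Region")
  .as[String]
  .collect()

// Select only the surviving date columns from the original wide DataFrame.
df.select(keepCols.map(col): _*).show()

/*
Expected output:
+-------+-------+-------+-------+
|3/10/20|3/11/20|3/12/20|3/13/20|
+-------+-------+-------+-------+
|      1|      7|      0|      5|
+-------+-------+-------+-------+
*/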

val df2 = Seq(
    ("2020/07/20", "0"), ("2020/07/21", "0"), ("2020/07/22", "0"),
    ("2020/07/23", "8"), ("2020/07/24", "0"), ("2020/07/25", "1"))
  .toDF("Period", "Reports")

// Same idea for the long-format DataFrame; the extra row_number > 1
// guard explicitly drops the first row, whose lag defaults to 0.
df2.withColumn("row_number", row_number.over(windowSpec))
  .withColumn("lag", lag("Reports", 1, 0).over(windowSpec))
  .withColumn("lead", lead("Reports", 1, 0).over(windowSpec))
  .where((($"lag" > 0) or ($"Reports" > 0)) and ($"row_number" > 1))
  .show()

/*
+----------+-------+----------+---+----+                                        
|    Period|Reports|row_number|lag|lead|
+----------+-------+----------+---+----+
|2020/07/23|      8|         4|  0|   0|
|2020/07/24|      0|         5|  8|   1|
|2020/07/25|      1|         6|  0|   0|
+----------+-------+----------+---+----+
*/
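Note that the lag-based filter keeps a zero row only when the value immediately before it is non-zero, so it would also drop zeros sitting in the middle of a longer zero run, not just the leading block. If the goal is strictly to remove the first run of zeros, a sketch using a running maximum over the same window keeps everything from the first non-zero value onward (the runningFrame and seen_nonzero names are my own):

// Running max of Reports over all rows up to the current one; rows before
// the first non-zero value have a running max of 0 and are dropped.
val runningFrame = windowSpec.rowsBetween(Window.unboundedPreceding, Window.currentRow)

df2.withColumn("seen_nonzero", max($"Reports".cast("int")).over(runningFrame))
  .where($"seen_nonzero" > 0)
  .drop("seen_nonzero")
  .show()

/*
Expected output:
+----------+-------+
|    Period|Reports|
+----------+-------+
|2020/07/23|      8|
|2020/07/24|      0|
|2020/07/25|      1|
+----------+-------+
*/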