Not enough arguments for method when: Spark/Scala DataFrame

Tags: scala, apache-spark, apache-spark-sql

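I have two DataFrames in Spark, Df1 and Df2. I join them on a common column (Id), then add an extra column, Result, and compare several columns. If any of those columns match, I need to put "Matched" in the new column; if none of the conditions match, the column should contain "Not Matched". I have written the code below: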

    df1.join(df1,df2("id") === df2("id"))
   .withColumn("Result",
   when(
   df1("adhar_no") === df2("adhar_no")" || 
   df1("pan_no") === df2("pan_no") || 
   df1("Voter_id") === df2("Voter_id") || 
   df1("DL_no") === df2("DL_no"),"Matched"
  ).otherwise("Not Matched"))

  But I get this error:

  <console>:60: error: not enough arguments for method when: (condition: org.apache.spark.sql.Column, value: Any)org.apache.spark.sql.Column. Unspecified value parameter value.

  I have also tried the code below:

    df1.join(df2,df1("id") === df2("id"))
   .withColumn("Result",when(df1("adhar_no") === df2("adhar_no") || 
   when(df1("pan_no") === df2("pan_no") || 
   when(df1("Voter_id") === df2("Voter_id") ||  
   when(df1("DL_no") === df2("DL_no"),"Matched"))))
  .otherwise("Not Matched"))

I get an error in both cases. Can anyone help me with how to do this?

The first case fails because there is an extra double quote (") on line 4 of the first snippet.

This will work fine:

    df1.join(df2, df1("id") === df2("id"))
      .withColumn("Result",
        when(
          df1("adhar_no") === df2("adhar_no") ||
          df1("pan_no") === df2("pan_no") ||
          df1("Voter_id") === df2("Voter_id") ||
          df1("DL_no") === df2("DL_no"), "Matched"
        ).otherwise("Not Matched"))
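The second case fails because every when must have an output value; that version makes no sense to me. The first one is fine, but you need to remove your extra " (I assume it is a typo).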
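To make the point about the second attempt concrete: `when` takes a condition and a value, so wrapping further `when(...)` calls inside the condition leaves the outer calls without a value. If separate branches are wanted, they can be chained with `.when(...)` instead of nested. The following is only a minimal sketch; the object name `WhenChainSketch`, the local session settings, and the sample rows are assumptions made for illustration and do not come from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

// Hypothetical helper object, used only to illustrate chained when calls.
object WhenChainSketch {
  def labels(spark: SparkSession): Map[Int, String] = {
    import spark.implicits._

    // Made-up sample data: id 1 matches on adhar_no, id 2 matches on nothing.
    val df1 = List((1, 10, 100), (2, 20, 200)).toDF("id", "adhar_no", "pan_no")
    val df2 = List((1, 10, 999), (2, 99, 999)).toDF("id", "adhar_no", "pan_no")

    df1.join(df2, df1("id") === df2("id"))
      .withColumn("Result",
        when(df1("adhar_no") === df2("adhar_no"), "Matched") // each when has a condition and a value
          .when(df1("pan_no") === df2("pan_no"), "Matched")  // chained on the Column, not nested
          .otherwise("Not Matched"))
      .select(df1("id"), $"Result")
      .as[(Int, String)]
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell a `spark` value already exists.
    val spark = SparkSession.builder().master("local[1]").appName("when-chain-sketch").getOrCreate()
    labels(spark).toSeq.sortBy(_._1).foreach(println)
    spark.stop()
  }
}
```

When every branch yields the same value, as here, a single `when` with the conditions joined by `||` is equivalent and shorter; chaining is mainly useful when the branches need different values.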

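Also, as a personal preference or suggestion, I prefer to reference columns with the dollar syntax. It is clearer to me and helps me avoid this kind of typo.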

Edit with an example

Some throwaway test DataFrames:

   val df1 = List((1, 10, 100, 1000, 10000), (2, 20, 200, 2000, 20000), (3, 30, 300, 3000, 30000)).toDF("id","adhar_no", "pan_no", "Voter_id", "DL_no")
    val df2 = List((1, 10, 100, 1000, 10000), (2, 20, 200, 2000, 20000), (4, 40, 400, 4000, 40000)).toDF("id","adhar_no", "pan_no", "Voter_id", "DL_no")
Then, fixing the ambiguity in your code:

 df1.as("df1").join(df2.as("df2"), df1("id") === df2("id"))
      .withColumn("Result",  when(
          $"df1.adhar_no" === $"df2.adhar_no" ||
            $"df1.pan_no" === $"df2.pan_no" ||
            $"df1.Voter_id" === $"df2.Voter_id" ||
            $"df1.DL_no" === $"df2.DL_no"
          , "Matched"
        ).otherwise("Not Matched")
      )
+---+--------+------+--------+-----+---+--------+------+--------+-----+-------+
| id|adhar_no|pan_no|Voter_id|DL_no| id|adhar_no|pan_no|Voter_id|DL_no| Result|
+---+--------+------+--------+-----+---+--------+------+--------+-----+-------+
|  1|      10|   100|    1000|10000|  1|      10|   100|    1000|10000|Matched|
|  2|      20|   200|    2000|20000|  2|      20|   200|    2000|20000|Matched|
+---+--------+------+--------+-----+---+--------+------+--------+-----+-------+

Comments:

You are joining df1 with itself, df1.join(df1). How are you accessing df2?

Yes, that was a typo. I have updated the code, but I get the same error.

After fixing a few other errors it runs fine for me. I will update my answer with an example.

@SCouto It worked. Thank you very much, SCouto. I have one more doubt: I am trying to handle more cases, for example when the name column is null in both DataFrames the result should be not matched. I have written it like this: .withColumn("result", when(df1("name") === df2("id") || (df1("name") isNull && df2("name") isNull), "notmatched").otherwise("matched")). But it reports that && is not found.
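On the follow-up about the null check: the exact code is only visible in the comment, so this is a guess, but a likely cause is calling `isNull` with a space instead of a dot. Written as `df1("name") isNull && ...`, Scala tends to read `&&` as an argument to `isNull`, which would explain a "not found" error for it. Calling `.isNull` with a dot avoids this. The sketch below assumes the intent is "both names null gives notmatched, equal names give matched, anything else gives notmatched"; the object name `NullCheckSketch` and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

// Hypothetical helper object, used only to illustrate the null check.
object NullCheckSketch {
  def labels(spark: SparkSession): Map[Int, String] = {
    import spark.implicits._

    // Made-up sample data; Option[String] produces a nullable "name" column.
    val df1 = List((1, Option("ann")), (2, Option.empty[String]), (3, Option("bob"))).toDF("id", "name")
    val df2 = List((1, Option("ann")), (2, Option.empty[String]), (3, Option("eve"))).toDF("id", "name")

    df1.join(df2, df1("id") === df2("id"))
      .withColumn("result",
        // Dot-call .isNull so that && is parsed as an operator between two Columns.
        when(df1("name").isNull && df2("name").isNull, "notmatched")
          .when(df1("name") === df2("name"), "matched")
          .otherwise("notmatched"))
      .select(df1("id"), $"result")
      .as[(Int, String)]
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell a `spark` value already exists.
    val spark = SparkSession.builder().master("local[1]").appName("null-check-sketch").getOrCreate()
    labels(spark).toSeq.sortBy(_._1).foreach(println)
    spark.stop()
  }
}
```

Note that `===` evaluates to null when either side is null, so a row with a null name falls through to `otherwise` unless it is handled explicitly, as the first `when` does here.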