Apache Spark: Spark SQL query gives a data type mismatch error

Tags: apache-spark, apache-spark-sql

I have a small SQL query that works perfectly fine in Hive, but the same query does not work as expected in Spark. The table contains user information; here is the query:

spark.sql("select * from users where (id,id_proof) not in ((1232,345))").show;
I got the following exception in Spark:

org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('age', deleted_inventory.`age`, 'id_proof', deleted_inventory.`id_proof`) IN (named_struct('col1',1232, 'col2', 345)))' due to data type mismatch: Arguments must be same type but were: StructType(StructField(id,IntegerType,true), StructField(id_proof,IntegerType,true)) != StructType(StructField(col1,IntegerType,false), StructField(col2,IntegerType,false));

id and id_proof are of integer type.
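The field names in the error message hint at the cause: Spark rewrites the column tuple on the left of IN as a struct whose fields are named after the columns (id, id_proof), while the bare tuple literal on the right becomes a struct with auto-generated field names (col1, col2) and non-nullable fields. The two struct types therefore differ, and the predicate fails to resolve. A minimal sketch, assuming a running spark-shell session:

```scala
// struct() with unnamed arguments generates field names col1, col2 --
// the same generated names that appear in the AnalysisException above,
// not id and id_proof.
spark.sql("select struct(1232, 345) as t").printSchema()
```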

Try it with a WITH clause; it works:

scala> val df = Seq((101,121), (1232,345),(222,2242)).toDF("id","id_proof")
df: org.apache.spark.sql.DataFrame = [id: int, id_proof: int]

scala> df.show(false)
+----+--------+
|id  |id_proof|
+----+--------+
|101 |121     |
|1232|345     |
|222 |2242    |
+----+--------+


scala> df.createOrReplaceTempView("girish")

scala> spark.sql("with t1( select 1232 id,345 id_proof ) select id, id_proof from girish where (id,id_proof) not in (select id,id_proof from t1) ").show(false)
+---+--------+
|id |id_proof|
+---+--------+
|101|121     |
|222|2242    |
+---+--------+


scala>
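If you prefer to stay in the DataFrame API, a left anti join sidesteps the struct comparison entirely. A sketch, assuming the same spark-shell session and the df defined above (the exclude name is illustrative):

```scala
// Rows to exclude, with column names matching df.
val exclude = Seq((1232, 345)).toDF("id", "id_proof")

// left_anti keeps only the rows of df that have no match in exclude,
// giving the same rows as the NOT IN query above.
df.join(exclude, Seq("id", "id_proof"), "left_anti").show(false)
```

Note that left anti join and SQL NOT IN handle NULLs differently (NOT IN returns no rows at all if the subquery produces a NULL), so the two are equivalent only when the excluded values are non-null, as they are here.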

Comments:

Please run spark.sql("desc users") and share the schema.

desc users gives an error, so I used printSchema instead; all columns show nullable=true.

I added names to the columns in the not in clause and that works, e.g. select * from users where (id, id_proof) not in (1 as id, 2 as id_proof). Spark expects it to be a named struct.
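The aliasing fix described in the comments can be written as a subquery whose columns carry the same names as the table columns, so both sides of NOT IN desugar to structs with identical field names. A sketch, assuming the girish view registered above:

```scala
// Aliasing the literal's columns to id and id_proof makes the
// right-hand struct match the left-hand (id, id_proof) struct.
spark.sql(
  """select id, id_proof from girish
    |where (id, id_proof) not in (select 1232 as id, 345 as id_proof)""".stripMargin
).show(false)
```

This is the same idea as the WITH-clause version shown earlier, just inlined as a subquery.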