Apache Spark: What is the difference between SparkSession.sql and Dataset.sqlContext.sql?

I have the snippet below, and I would like to know what the difference is between the two and which one I should use. I am using Spark 2.2.

Dataset<Row> df = sparkSession.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "host:9092") // required by the Kafka source; placeholder value
    .option("subscribe", "topic")                   // required by the Kafka source; placeholder value
    .load();

df.createOrReplaceTempView("table");
df.printSchema();

Dataset<Row> resultSet = df.sqlContext().sql("select value from table"); //sparkSession.sql(this.query);
StreamingQuery streamingQuery = resultSet
        .writeStream()
        .trigger(Trigger.ProcessingTime(1000))
        .format("console")
        .start();
vs

Dataset<Row> resultSet = sparkSession.sql("select value from table");

There is a very subtle difference between sparkSession.sql(sqlQuery) and df.sqlContext.sql(sqlQuery).

Note that you can have zero, two, or more SparkSessions in a single Spark application (though it is assumed that a Spark SQL application has at least one, and usually exactly one, SparkSession).
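
For instance, a second session is one call away (a minimal sketch, assuming the spark-shell's predefined spark session on Spark 2.x):

// Sketch: a second session in the same application. Both sessions share
// one SparkContext but keep separate SQL conf, temp views, and functions.
val second = spark.newSession()
assert(spark.sparkContext eq second.sparkContext)  // same underlying context
assert(spark ne second)                            // distinct sessions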

Note also that a Dataset is bound to the SparkSession it was created in, and that the SparkSession never changes.
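
You can make that binding visible directly (again a sketch assuming spark-shell's predefined spark; Dataset exposes its owning session as the public sparkSession field):

// Sketch: a Dataset keeps a reference to the session it was created in.
val df = spark.range(5)
assert(df.sparkSession eq spark)             // bound to its creating session
assert(df.sqlContext.sparkSession eq spark)  // df.sqlContext wraps that same session
assert(spark.newSession().range(3).sparkSession ne spark)  // new session, new binding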

You may be wondering why anyone would ever want that, but it gives you a boundary between queries: you can use the same table names for different datasets, and that is actually a very powerful feature of Spark SQL.
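
As a sketch of that boundary (same spark-shell assumptions as above), the same temp view name can back different data in different sessions:

// Sketch: one temp view name, two sessions, two different datasets.
val other = spark.newSession()
spark.range(5).createOrReplaceTempView("t")
other.range(100).createOrReplaceTempView("t")
spark.sql("select count(*) from t").show()  // prints 5
other.sql("select count(*) from t").show()  // prints 100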

The following example shows the difference and will hopefully give you an idea of why it is so powerful.

scala> spark.version
res0: String = 2.3.0-SNAPSHOT

scala> :type spark
org.apache.spark.sql.SparkSession

scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+

scala> val df = spark.range(5)
df: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> df.sqlContext.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+

scala> val anotherSession = spark.newSession
anotherSession: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@195c5803

scala> anotherSession.range(10).createOrReplaceTempView("new_table")

scala> anotherSession.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
|        |new_table|       true|
+--------+---------+-----------+


scala> df.sqlContext.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+
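
Note how new_table, registered in anotherSession, never shows up through df.sqlContext.sql: df stays bound to the session it was created in, which is exactly the query boundary described above.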