Apache Spark, Cassandra: joining tables with a condition on an attribute of the primary RDD (a query like "where tableA.myValue > tableB.myOtherValue")

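Is there a way to join two tables while adding a condition that compares a column of one table with a column of the other?

For example: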

case class TableA(pkA: Int, valueA: Int)
case class TableB(pkB: Int, valueB: Int)


val rddA = sc.cassandraTable[TableA]("ks", "tableA")
rddA.joinWithCassandraTable[TableB]("ks", "tableB").where("tableB.valueB > tableA.valueA")
Is there a way to send the where("tableB.valueB > tableA.valueA") instruction? ("tableB.valueB" is a clustering column.)

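The RDD.where() call simply passes the predicate on to CQL, and CQL is limited to fast, simple OLTP queries, so a predicate that compares columns of two different tables cannot be expressed there. More complex queries like this one can be done with Spark SQL instead. For your case, it might look something like this: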

// Expose both Cassandra tables to Spark SQL as temporary tables.
sqlContext.read.format("org.apache.spark.sql.cassandra")
    .options(Map( "table" -> "tableA", "keyspace"->"ks"))
    .load().registerTempTable("tableA")
sqlContext.read.format("org.apache.spark.sql.cassandra")
    .options(Map( "table" -> "tableB", "keyspace"->"ks"))
    .load().registerTempTable("tableB")
// Join on the cross-table comparison; Spark evaluates it, not Cassandra.
sqlContext.sql("select * from tableA join tableB on tableB.valueB > tableA.valueA").show
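
To see what the join condition above selects without needing a Cassandra cluster, here is a minimal sketch using the DataFrame API. It assumes a local SparkSession and two small in-memory DataFrames standing in for tableA and tableB; the object name NonEquiJoinSketch, the method run, and the sample rows are made up for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

object NonEquiJoinSketch {
  // Same shapes as the case classes in the question.
  case class TableA(pkA: Int, valueA: Int)
  case class TableB(pkB: Int, valueB: Int)

  // Returns the joined rows as (pkA, valueA, pkB, valueB) tuples.
  def run(): Seq[(Int, Int, Int, Int)] = {
    // Assumption: local mode, no Cassandra involved.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("non-equi-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the two Cassandra tables.
    val dfA = Seq(TableA(1, 10), TableA(2, 20)).toDF()
    val dfB = Seq(TableB(1, 15), TableB(2, 5)).toDF()

    // Inner join on a comparison between columns of the two sides,
    // equivalent to "join ... on tableB.valueB > tableA.valueA".
    val joined = dfA.join(dfB, dfB("valueB") > dfA("valueA"))

    val rows = joined.collect().map { r =>
      (r.getAs[Int]("pkA"), r.getAs[Int]("valueA"),
       r.getAs[Int]("pkB"), r.getAs[Int]("valueB"))
    }.toSeq.sorted

    spark.stop()
    rows
  }
}
```

Note that a join whose only condition is an inequality has no key to partition on, so Spark has to compare every row of one side with every row of the other. On large tables it is usually worth adding an equality condition as well if the data model allows one.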