
Apache Spark: Spark Cassandra connector not adding all records to the database

Tags: apache-spark, apache-zeppelin, spark-cassandra-connector

The version I am using is: com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M3

I have an RDD from a Kafka stream:

kafkaStream.foreachRDD((rdd: RDD[String]) => {
  val count = rdd.count()  // evaluate the batch size once instead of twice
  if (count > 0) {
    println(java.time.LocalDateTime.now + ". Consumed: " + count + " messages.")

    // Parse the JSON messages, keep only the transaction id, and append to Cassandra
    sqlContext.read.json(rdd)
      .select("count_metadata.tran_id")
      .write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "tmp", "keyspace" -> "kspace"))
      .mode(SaveMode.Append)
      .save()
  } else {
    println(java.time.LocalDateTime.now + ". There are currently no messages on the topic that haven't been consumed.")
  }
})
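One incidental note, not from the original post: the batch RDD is consumed more than once here (for the count and again for the JSON read), so a common refinement is to cache it for the duration of the batch. A minimal sketch, assuming the same kafkaStream and sqlContext:

kafkaStream.foreachRDD((rdd: RDD[String]) => {
  rdd.cache()                               // keep the batch in memory; it is read twice below
  val count = rdd.count()
  if (count > 0) {
    println(java.time.LocalDateTime.now + ". Consumed: " + count + " messages.")
    sqlContext.read.json(rdd)               // second pass over the same cached batch
      .select("count_metadata.tran_id")
      .write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "tmp", "keyspace" -> "kspace"))
      .mode(SaveMode.Append)
      .save()
  }
  rdd.unpersist()                           // release the cached batch once the write is done
})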
The RDD count is around 40K, but the Spark connector consistently populates the database with only 457 records.

sqlContext.read.json(rdd).select("count_metadata.tran_id").count
also prints 40K records.
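To rule out a silent write failure, it can also help to count on the Cassandra side rather than in Spark. A sketch (not from the original post) using the connector's server-side count, assuming sc is the SparkContext:

import com.datastax.spark.connector._

// Push the count down to Cassandra instead of pulling rows into Spark
val stored = sc.cassandraTable("kspace", "tmp").cassandraCount()
println("Rows in kspace.tmp: " + stored)    // consistently 457 here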

Here is my table declaration:

cqlsh:kspace> CREATE TABLE tmp(tran_id text PRIMARY KEY);
The tran_id is unique for each message.

What am I missing? Why aren't all 40K records making it into that table?

My logs don't show any exceptions either.

"The tran_id is unique for each message."

I lied:

println(df.distinct.count)  // df is the DataFrame read from the RDD above
prints

457
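That explains the 457 rows: Cassandra writes are upserts, so every insert whose tran_id already exists overwrites the existing row instead of adding a new one, and 40K messages with only 457 distinct IDs collapse to exactly 457 rows. A quick way to see which IDs repeat (a sketch; df again stands for the DataFrame read from the RDD):

import org.apache.spark.sql.functions.col

df.groupBy("tran_id")
  .count()
  .filter(col("count") > 1)       // tran_ids that occur more than once in the batch
  .orderBy(col("count").desc)
  .show(20, false)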
Time to take this up with our upstream source.