
Scala: reading a table with spark-phoenix into an RDD yields a partition count of 1


When I run this Spark code:

    val sqlContext = spark.sqlContext
    val noact_table = primaryDataProcessor.getTableData(sqlContext, zookeeper, tableName)
    println("noact_table.rdd:"+noact_table.rdd.partitions.size)
    val tmp = noact_table.rdd
    println(tmp.partitions.size)
    val out = tmp.map(x => x(0) + "," + x(1))
    HdfsOperator.writeHdfsFile(out, "/tmp/test/push")
where getTableData is:

    def getTableData(sqlContext: SQLContext, zkUrl: String, tableName: String): DataFrame = {
      val tableData = sqlContext.read.format("org.apache.phoenix.spark")
        .option("table", tableName)
        .option("zkUrl", zkUrl)
        .load()
      tableData
    }
My problem is that this table has about 2,000 rows of data, but the partition count I get is 1.
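For reference, phoenix-spark typically derives one input split per HBase region (or per statistics guidepost), so a small table sitting in a single region tends to load as a single partition. Below is a minimal sketch of widening the RDD explicitly; the target of 96 is an assumption, chosen only to match cookieRdd further down:

    // repartition() always shuffles, so it can raise a 1-partition RDD to
    // any width; coalesce() can only merge downward unless shuffle = true.
    // Assumption: 96 is picked here just to line up with cookieRdd below.
    val tmpWide = noact_table.rdd.repartition(96)
    println(tmpWide.partitions.size) // 96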

Then I continue with:

    val push_res = cookieRdd.keyBy(_._2._2).join(tmp)
      .map(x => (x._2._1._1, x._1, x._2._2._2, x._2._2._3, x._2._2._4, x._2._2._5, nexthour))
My cookieRdd has 96 partitions, while tmp has 1; the resulting push_res then also has 1 partition. Can anyone explain why this happens? Why do both tmp and push_res end up with only 1 partition?
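For context on the join behaviour: when neither input carries a partitioner (keyBy does not install one, and neither does DataFrame.rdd), Spark sizes the join result from spark.default.parallelism if it is set, otherwise from the larger parent. Here is a self-contained sketch of both behaviours; it assumes only a live SparkContext sc with made-up data, not the poster's tables:

    // Neither parent has a partitioner: repartition()/coalesce() drop any.
    val big   = sc.parallelize(1 to 1000).map(i => (i % 10, i)).repartition(96)
    val small = sc.parallelize(1 to 100).map(i => (i % 10, i)).coalesce(1)

    // Sized by spark.default.parallelism if set, else max(96, 1) = 96:
    println(big.join(small).partitions.size)

    // join(other, numPartitions) forces a HashPartitioner of that width,
    // one way to keep a result like push_res at cookieRdd's 96 partitions:
    println(big.join(small, 96).partitions.size) // 96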