
Store each partition in a file and load it back into the same partition in Scala Spark


I have a situation where I must store each partition's data in a file and then load the stored data back into the same partition. Here is my code:

Base class

case class foo(posVals: Array[Double], velVals: Array[Double], f: Array[Double] => Double,
               fitnessVal: Double, LR1: Double, PR1: Double) extends Serializable {
  var position: Array[Double] = posVals
  var velocity: Array[Double] = velVals
  var fitness: Double = fitnessVal
  var PulseRate: Double = PR1
  var LoudnessRate: Double = LR1
}
Objective function

// Sphere benchmark: sum of the squares of all coordinates
def sphere(ar: Array[Double]): Double = ar.map(x => x * x).sum
Storing and reading the data within each partition

import java.io.FileWriter
import scala.io.Source
import org.apache.spark.rdd.RDD

def execute(rdd: RDD[foo], c_itr: Int): Array[(foo, Int)] = {
  val newRDD = rdd.mapPartitionsWithIndex { (index, iter) =>
    var arr: Array[foo] = iter.toArray
    if (c_itr != 0) {
      // Read data from the stored file whose name is the partition number (index)
      val bufferedSource = Source.fromFile("/result/" + index + ".txt")
      val data: Array[foo] =
        try {
          bufferedSource.getLines().map { line =>
            val p = line.split(",")
            foo(p(0).toArray.map(_.toDouble), p(1).toArray.map(_.toDouble), sphere,
              line(2).toDouble, p(3).toDouble, p(4).toDouble)
          }.toArray // materialize before the source is closed
        } finally bufferedSource.close()
      arr = data.clone() // Replace arr with the data loaded from the file
    }

    // Save to file
    val writer = new FileWriter(Path + index + ".txt")
    for (i <- 0 until arr.length) {
      writer.write(arr(i).position.toList + "," + arr(i).velocity.toList + "," + arr(i).fitness + "," +
        arr(i).LoudnessRate + "," + arr(i).PulseRate + "\n")
    }
    writer.close()
    // res1 is not defined in the snippet shown
    val bests: Array[(foo, Int)] = res1.map(x => (x, index))
    bests.iterator
  }
  newRDD.persist().collect()
}
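A small, self-contained way to see where the values go wrong is to run the same parsing expressions used in execute on one line written in the writer's format. The sketch below is plain Scala with no Spark; the object name ParseDiagnosis and the sample values are made up for illustration and chosen so their string forms are short and exact.

```scala
// Reproduces the parsing steps from execute on one line in the writer's format.
object ParseDiagnosis {
  // Hypothetical sample values standing in for one particle
  val position: Array[Double] = Array(1.5, -2.25)
  val velocity: Array[Double] = Array(0.5, 3.0)

  // Same layout the writer produces: List(...),List(...),fitness,loudness,pulse
  val line: String = s"${position.toList},${velocity.toList},10.0,0.9,0.05"

  // split(",") also cuts inside each List(...), so the fields no longer line up
  val fields: Array[String] = line.split(",")

  // String.toArray yields characters; Char.toDouble returns each character's code
  val decodedPosition: Array[Double] = fields(0).toArray.map(_.toDouble)

  def main(args: Array[String]): Unit = {
    println(line)                                  // List(1.5, -2.25),List(0.5, 3.0),10.0,0.9,0.05
    println(fields.length)                         // 7, not the 5 fields that were written
    println(fields(0))                             // List(1.5
    println(decodedPosition.take(4).mkString(" ")) // 76.0 105.0 115.0 116.0
    println(line(2).toDouble)                      // 115.0, the code of 's', not the third field
  }
}
```

With the real five-element lists, the same split yields 13 pieces, so p(3) and p(4) land on the fourth and fifth position coordinates (the latter still carrying the closing parenthesis) instead of on the loudness and pulse rates.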

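When reading the data back from the file, this code does not read exactly the data that was written. I have tried many things but cannot find the problem. How can I correctly read the stored data back into the data objects?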
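One way to make the round trip exact is to write each record with separators that cannot occur inside a number, and to parse the arrays element by element. The sketch below assumes ';' between fields and a single space between array elements; FooCodec, toLine and fromLine are hypothetical names, and foo is repeated from the question so the block compiles on its own. Double.toString emits enough digits for toDouble to recover the same value, so the reloaded numbers match the saved ones.

```scala
// foo as defined in the question
case class foo(posVals: Array[Double], velVals: Array[Double], f: Array[Double] => Double,
               fitnessVal: Double, LR1: Double, PR1: Double) extends Serializable {
  var position: Array[Double] = posVals
  var velocity: Array[Double] = velVals
  var fitness: Double = fitnessVal
  var PulseRate: Double = PR1
  var LoudnessRate: Double = LR1
}

// Hypothetical line codec: ';' separates fields, ' ' separates array elements.
// Neither character appears in Double.toString output.
object FooCodec {
  private def encodeArray(a: Array[Double]): String = a.mkString(" ")

  private def decodeArray(s: String): Array[Double] =
    if (s.isEmpty) Array.empty[Double] else s.split(" ").map(_.toDouble)

  // Same field order as the writer in execute: position, velocity, fitness, loudness, pulse
  def toLine(x: foo): String =
    Seq(encodeArray(x.position), encodeArray(x.velocity),
      x.fitness.toString, x.LoudnessRate.toString, x.PulseRate.toString).mkString(";")

  // The function field is not stored, so the caller supplies it again (e.g. sphere)
  def fromLine(line: String, f: Array[Double] => Double): foo = {
    val p = line.split(";", -1) // limit -1 keeps empty fields produced by empty arrays
    foo(decodeArray(p(0)), decodeArray(p(1)), f, p(2).toDouble, p(3).toDouble, p(4).toDouble)
  }
}
```

Inside execute, the writer loop would then call writer.write(FooCodec.toLine(arr(i)) + "\n"), and the reader would build data with lines.map(FooCodec.fromLine(_, sphere)).toArray. Because the function field f cannot be written to a text file, it has to be passed in again on load.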

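What values are you passing to execute as RDD? I understand it is an RDD of type foo, but I am asking about the values it contains. mapPartitionsWithIndex iterates over every partition in the RDD. My question is: what data is in the RDD? Are you reading it from somewhere, or generating it?

Generated. Please look at the lines after the "Save to file" comment: the writer writes the data to the file, and that data is then loaded back. The file contents look like this: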
 List(86.6582767815429, -25.224569272200586, 90.52371028878218, -59.91851894060545, -37.12944037124118),List(-59.60155033146984, -8.927455672466586, -23.679516503590534, 87.58857469881022 ,-14.864361504195127),6.840659702736215E10,0.6012,0.04131580765457621
 List(86.6582767815429, -25.224569272200586, 90.52371028878218, -59.91851894060545, -26.10553311409422),List(-66.83980088207335, 51.088426986986015, -109.74073303298485, 66.87095748811572, -22.941448024344268),9.195157603574039E10,0.9025,0.06132589765454988
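If files in the current format already exist, like the sample lines above, they can still be read exactly by capturing each List(...) group before splitting on commas. This is a sketch assuming every line holds exactly two lists followed by three numbers; LegacyLineParser is a hypothetical name, and it returns the raw fields rather than a foo so the caller can supply the function itself.

```scala
object LegacyLineParser {
  // Matches the whole line: List(a, b, ...),List(c, d, ...),fitness,loudness,pulse
  private val LinePattern =
    """\s*List\(([^)]*)\),List\(([^)]*)\),([^,]+),([^,]+),([^,]+)\s*""".r

  // Splits the inside of one List(...) and tolerates stray spaces around commas
  private def parseList(s: String): Array[Double] =
    s.split(",").map(_.trim).filter(_.nonEmpty).map(_.toDouble)

  // Returns (position, velocity, fitness, loudness, pulse), or None if the line does not match
  def parse(line: String): Option[(Array[Double], Array[Double], Double, Double, Double)] =
    line match {
      case LinePattern(pos, vel, fit, loud, pulse) =>
        Some((parseList(pos), parseList(vel), fit.trim.toDouble, loud.trim.toDouble, pulse.trim.toDouble))
      case _ => None
    }
}
```

The regex match requires the entire line to fit the pattern, so a malformed line yields None instead of silently misaligned fields.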