Writing a file in Scala takes a long time

I am writing three lists of triples to files, about 277270 lines each. My triple class is as follows:

class tripleInt  (var sub:Int, var pre:Int, var obj:Int)
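I use Apache Jena components to build each list from an RDF file: the RDF elements are converted to IDs, and those IDs are stored in the different lists. Once I have the lists, I write the files with the following code: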

class Indexes (val listSPO:List[tripleInt], val listPSO:List[tripleInt], val listOSP:List[tripleInt] ){
  val sl = listSPO.sortBy(l => (l.sub, l.pre))
  val pl = listPSO.sortBy(l => (l.sub, l.pre))
  //val ol = listOSP.sortBy(l => (l.sub, l.pre))

  var y1:Int=0
  var y2:Int=0
  var y3:Int=0

  val fstream:FileWriter = new FileWriter("patSPO.dat")
  var out:BufferedWriter = new BufferedWriter(fstream)
  //val fstream:FileOutputStream = new FileOutputStream("patSPO.dat")
  //var out:ObjectOutputStream = new ObjectOutputStream(fstream)
  //out.writeObject(listSPO)
  val fstream2:FileWriter = new FileWriter("patPSO.dat")
  var out2:BufferedWriter = new BufferedWriter(fstream2)
  /*val fstream3:FileOutputStream = new FileOutputStream("patOSP.dat")
  var out3:BufferedOutputStream = new BufferedOutputStream(fstream3)*/

  for ( a <- 0 to sl.size-1){
    y1 = sl(a).sub
    y2 = sl(a).pre
    y3 = sl(a).obj
    out.write((y1.toString+","+y2.toString+","+y3.toString+"\n"))
  }
  for ( a <- 0 to pl.size-1){
    y1 = pl(a).sub
    y2 = pl(a).pre
    y3 = pl(a).obj
    out2.write((y1.toString+","+y2.toString+","+y3.toString+"\n"))
  }
  out.close()
  out2.close()
}
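
Yes, you need to choose your data structures wisely. List is meant for sequential access (Seq), not random access (IndexedSeq). Looking up an element by index in a List walks the list from its head, so sl(a) costs O(a), and doing that inside a loop over all indices makes the whole thing O(n^2) for a large List. The following should be much faster (O(n), and hopefully easier to read as well):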

import java.io.{BufferedWriter, FileWriter}

class Indexes (val listSPO: List[tripleInt], val listPSO: List[tripleInt], val listOSP: List[tripleInt] ){
  val sl = listSPO.sortBy(l => (l.sub, l.pre))
  val pl = listPSO.sortBy(l => (l.sub, l.pre))

  var y1:Int=0
  var y2:Int=0
  var y3:Int=0

  val fstream:FileWriter = new FileWriter("patSPO.dat")
  val out:BufferedWriter = new BufferedWriter(fstream)

  // Iterate over the elements directly instead of indexing into the List
  for (s <- sl){
    y1 = s.sub
    y2 = s.pre
    y3 = s.obj
    out.write(s"$y1,$y2,$y3\n")
  }
  // TODO close in finally
  out.close()

  val fstream2:FileWriter = new FileWriter("patPSO.dat")
  val out2:BufferedWriter = new BufferedWriter(fstream2)

  for ( p <- pl){
    y1 = p.sub
    y2 = p.pre
    y3 = p.obj
    out2.write(s"$y1,$y2,$y3\n")
  }
  // TODO close in finally
  out2.close()
}

Hi, I'm not sure whether you saw my answer, but if you did: was indexing the Seq the main problem? Did fixing it reduce the execution time?
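
The two "TODO close in finally" notes in the code could be handled along the following lines. This is only a sketch, not part of the original answer: the TripleWriter object and its writeTriples method are hypothetical names introduced here for illustration, and tripleInt is repeated from the question so that the snippet is self-contained.

```scala
import java.io.{BufferedWriter, FileWriter}

// Same shape as the tripleInt class from the question.
class tripleInt(var sub: Int, var pre: Int, var obj: Int)

// Hypothetical helper (not from the original post).
object TripleWriter {
  // Writes one "sub,pre,obj" line per triple to the file at `path`.
  def writeTriples(path: String, triples: Iterable[tripleInt]): Unit = {
    val out = new BufferedWriter(new FileWriter(path))
    try {
      // Iterating visits every element once, so this loop is O(n) even for a List.
      for (t <- triples) out.write(s"${t.sub},${t.pre},${t.obj}\n")
    } finally {
      out.close() // runs even if one of the writes throws
    }
  }
}
```

With such a helper, each file would be written by a call like TripleWriter.writeTriples("patSPO.dat", sl), which also removes the duplicated loop. If index-based access is really required somewhere else, converting the list once with toVector is an option, since Vector supports effectively constant-time lookup by index.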