Writing a file in Scala takes a long time
I am writing three lists of triples, about 277270 rows. My triple class is as follows:
class tripleInt (var sub:Int, var pre:Int, var obj:Int)
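I also use Apache Jena to build each list from an RDF file: I convert the RDF elements to IDs and store those IDs in the different lists. Once I have the lists, I write the files with the following code: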
class Indexes (val listSPO:List[tripleInt], val listPSO:List[tripleInt], val listOSP:List[tripleInt] ){
  val sl = listSPO.sortBy(l => (l.sub, l.pre))
  val pl = listPSO.sortBy(l => (l.sub, l.pre))
  //val ol = listOSP.sortBy(l => (l.sub, l.pre))
  var y1:Int=0
  var y2:Int=0
  var y3:Int=0
  val fstream:FileWriter = new FileWriter("patSPO.dat")
  var out:BufferedWriter = new BufferedWriter(fstream)
  //val fstream:FileOutputStream = new FileOutputStream("patSPO.dat")
  //var out:ObjectOutputStream = new ObjectOutputStream(fstream)
  //out.writeObject(listSPO)
  val fstream2:FileWriter = new FileWriter("patPSO.dat")
  var out2:BufferedWriter = new BufferedWriter(fstream2)
  /*val fstream3:FileOutputStream = new FileOutputStream("patOSP.dat")
  var out3:BufferedOutputStream = new BufferedOutputStream(fstream3)*/
  for ( a <- 0 to sl.size-1){
    y1 = sl(a).sub
    y2 = sl(a).pre
    y3 = sl(a).obj
    out.write((y1.toString+","+y2.toString+","+y3.toString+"\n"))
  }
  for ( a <- 0 to pl.size-1){
    y1 = pl(a).sub
    y2 = pl(a).pre
    y3 = pl(a).obj
    out2.write((y1.toString+","+y2.toString+","+y3.toString+"\n"))
  }
  out.close()
  out2.close()
}
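Yes, you need to choose your data structure wisely. List is meant for sequential access (Seq), not random access (IndexedSeq). What you are doing is O(n^2), because indexing into a large List has to walk the list from its head on every call. The following should be much faster (O(n)) and hopefully also easier to read: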
import java.io.{BufferedWriter, FileWriter}

class Indexes (val listSPO: List[tripleInt], val listPSO: List[tripleInt], val listOSP: List[tripleInt] ){
  val sl = listSPO.sortBy(l => (l.sub, l.pre))
  val pl = listPSO.sortBy(l => (l.sub, l.pre))
  var y1:Int=0
  var y2:Int=0
  var y3:Int=0
  val fstream:FileWriter = new FileWriter("patSPO.dat")
  val out:BufferedWriter = new BufferedWriter(fstream)
  try {
    // Iterate over the list directly instead of indexing with sl(a)
    for (s <- sl){
      y1 = s.sub
      y2 = s.pre
      y3 = s.obj
      out.write(s"$y1,$y2,$y3\n")
    }
  } finally {
    // Close the writer even if a write fails
    out.close()
  }
  val fstream2:FileWriter = new FileWriter("patPSO.dat")
  val out2:BufferedWriter = new BufferedWriter(fstream2)
  try {
    for (p <- pl){
      y1 = p.sub
      y2 = p.pre
      y3 = p.obj
      out2.write(s"$y1,$y2,$y3\n")
    }
  } finally {
    out2.close()
  }
}

Hi, I'm not sure whether you saw my answer, but if you did: was indexing the Seq the main problem? Did fixing it reduce the execution time?
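The two write loops in the answer differ only in the list and the file name, so the same single-pass idea can be factored into one helper. The following is a minimal sketch, not the original poster's code: `Triple`, `TripleWriter`, `formatLine` and `writeTriples` are hypothetical names introduced for illustration, and `Triple` is assumed to be an immutable stand-in for `tripleInt`.

```scala
import java.io.{BufferedWriter, FileWriter}

// Hypothetical immutable stand-in for the question's tripleInt class.
final case class Triple(sub: Int, pre: Int, obj: Int)

object TripleWriter {
  // Formats one triple as a comma-separated line, without the trailing newline.
  def formatLine(t: Triple): String = s"${t.sub},${t.pre},${t.obj}"

  // Writes every triple to `path`, one per line, in a single pass.
  // foreach visits each List cell once, so the loop is O(n);
  // calling triples(i) inside a counting loop would walk i cells per call.
  def writeTriples(path: String, triples: Seq[Triple]): Unit = {
    val out = new BufferedWriter(new FileWriter(path))
    try {
      triples.foreach { t =>
        out.write(formatLine(t))
        out.write("\n")
      }
    } finally {
      // Close (and flush) the writer even if a write throws.
      out.close()
    }
  }
}
```

With this helper, each file would be written by a call such as `TripleWriter.writeTriples("patSPO.dat", sl)`. If positional access is genuinely needed elsewhere, converting the list once with `toVector` gives effectively constant-time indexing.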