Scala Infinite Iterator OutOfMemory

Tags: scala, functional-programming, iterator, lazy-evaluation


I've been playing with Scala's lazy iterators and have run into an issue. What I'm trying to do is read in a large file, perform a transformation, and then write out the result:

import scala.io.Source
import java.io.PrintWriter

object FileProcessor {
  def main(args: Array[String]) {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")

    try {
      // this "basic" lazy iterator works fine
      // val iterator = inSource.getLines

      // ...but this one, which incorporates my process method,
      // throws OutOfMemoryExceptions
      val iterator = process(inSource.getLines.toSeq).iterator

      while (iterator.hasNext) outSource.println(iterator.next)

    } finally {
      inSource.close()
      outSource.close()
    }
  }

  // processing in this case just means upper-casing every line
  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}
So I'm getting an OutOfMemoryException on large files. I know you can run afoul of Scala's lazy Streams if you keep references to the head of the stream around. So in this case I'm careful to convert the result of process() into an iterator and throw away the Seq it initially returns.

Does anyone know why this still results in O(n) memory consumption? Thanks!


Update

In response to fge and huynhjl, it seems the Seq might be the culprit, but I don't know why. As an example, the following code works fine (and I'm using Seq everywhere in it). This code does not produce an OutOfMemoryException:

import scala.io.Source
import java.io.PrintWriter

object FileReader {
  def main(args: Array[String]) {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      writeToFile(outSource, process(inSource.getLines.toSeq))
    } finally {
      inSource.close()
      outSource.close()
    }
  }

  @scala.annotation.tailrec
  private def writeToFile(outSource: PrintWriter, contents: Seq[String]) {
    if (!contents.isEmpty) {
      outSource.println(contents.head)
      writeToFile(outSource, contents.tail)
    }
  }

  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}
As fge indicated, modify process to take an Iterator and drop the .toSeq; inSource.getLines is already an Iterator.
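That fix could be sketched as follows (a hypothetical rewrite of the question's FileProcessor; the file names and the upper-casing step are carried over from the question):

```scala
import scala.io.Source
import java.io.PrintWriter

object FileProcessorFixed {
  def main(args: Array[String]): Unit = {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      // getLines already returns an Iterator[String]; no .toSeq needed
      val iterator = process(inSource.getLines)
      while (iterator.hasNext) outSource.println(iterator.next())
    } finally {
      inSource.close()
      outSource.close()
    }
  }

  // map on an Iterator is lazy: each line is transformed on demand and
  // becomes garbage-collectable as soon as it has been written out
  def process(contents: Iterator[String]): Iterator[String] =
    contents.map(_.toUpperCase)
}
```

Because nothing in this version ever holds more than the current line, memory use stays constant regardless of file size.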

Converting to a Seq will cause the items to be remembered. I think it converts the Iterator into a Stream and causes all items to be memoized.

Edit: Ok, it's more subtle. By calling iterator on the result of process, you are doing the equivalent of Iterator.toSeq.iterator. This can cause the out-of-memory exception:

scala> Iterator.continually(1).toSeq.iterator.take(300*1024*1024).size
java.lang.OutOfMemoryError: Java heap space
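For comparison, consuming an infinite iterator without the .toSeq round-trip runs in constant memory (a quick sanity check of my own, not part of the original session):

```scala
// No intermediate Stream is ever built here, so memory use stays flat
// no matter how many elements are pulled through the Iterator.
val n = Iterator.continually(1).take(10 * 1000 * 1000).size
println(n) // prints 10000000
```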

Probably the same issue as the one reported here: . Note my comment at the end of the bug report; it's from personal experience.

Wild guess:
.getLines.toSeq
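The guess can be made concrete (a sketch, assuming Scala 2.12 or earlier, where Iterator.toSeq builds a lazy Stream):

```scala
// In Scala 2.12 and earlier, Iterator.toSeq yields a Stream. As long as
// `seq` holds a reference to the Stream's head, every element forced
// while consuming `it` stays reachable -- O(n) memory, just like the
// question's process(inSource.getLines.toSeq).iterator.
val seq: Seq[Int] = Iterator.continually(1).toSeq
val it: Iterator[Int] = seq.iterator
// Draining `it` with a large take(...) would now exhaust the heap,
// because `seq` memoizes everything `it` has produced.
```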