In Scala, how do I read bytes from a binary file whose items are separated by a character?

Tags: java, scala, apache-spark, inputstream, apache-flink


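In Scala, given a binary file, I want to retrieve a list of Array[Byte] items.

For example, the items in the binary file are separated by the character/byte "my delimiter".

How can I get a list with one Array[Byte] per item?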

A functional solution, with the help of java.nio:

import java.nio.file.{Files, Paths}

object Main {

  private val delimiter = '\n'.toByte

  def main(args: Array[String]): Unit = {
    val byteArray = Files.readAllBytes(Paths.get(args(0)))

    case class Accumulator(result: List[List[Byte]], current: List[Byte])

    val items: List[Array[Byte]] = byteArray.foldLeft(Accumulator(Nil, Nil)) {
      case (Accumulator(result, current), nextByte) =>
        if (nextByte == delimiter)
          Accumulator(current :: result, Nil)
        else
          Accumulator(result, nextByte :: current)
    } match {
      case Accumulator(result, current) => (current :: result).reverse.map(_.reverse.toArray)
    }
    items.foreach(item => println(new String(item)))
  }

}
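However, this solution is expected to perform poorly: it loads the whole file into memory and builds an immutable List[Byte] one boxed element at a time. How much does that matter to you? How many files will you read, how large are they, and how often? If performance matters, you should use an input stream and mutable collections: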

import java.io.{BufferedInputStream, FileInputStream}

import scala.collection.mutable.ArrayBuffer

object Main {

  private val delimiter = '\n'.toByte

  def main(args: Array[String]): Unit = {
    val items = ArrayBuffer.empty[Array[Byte]]
    val item = ArrayBuffer.empty[Byte]
    val bis = new BufferedInputStream(new FileInputStream(args(0)))
    var nextByte: Int = -1
    while ( { nextByte = bis.read(); nextByte } != -1) {
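      // Compare as bytes: read() returns 0..255, while Byte is signed,
      // so a delimiter >= 0x80 would never match an Int comparison.
      if (nextByte.toByte == delimiter) {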
        items.append(item.toArray)
        item.clear()
      } else {
        item.append(nextByte.toByte)
      }
    }
    items.append(item.toArray)
    items.foreach(item => println(new String(item)))
    bis.close()
  }

}

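Comments:

Have you checked this? Is the solution allowed to use external libraries?

Yes, external libraries are available.

If the delimiter is several characters rather than a single byte, how can this solution be extended or modified to handle that?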
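
One possible way to handle the multi-byte delimiter asked about in the last comment is sketched below. This is not from the original answer: the names `MultiByteSplit`, `split` and `matchesAt` are made up for illustration, and the code assumes the whole input fits in memory as a single Array[Byte]. It scans the array with a plain index loop and cuts an item each time the full delimiter sequence matches.

```scala
import java.nio.charset.StandardCharsets
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not part of the original answer.
object MultiByteSplit {

  // True if `delimiter` occurs in `data` starting at index `pos`.
  // The caller guarantees pos + delimiter.length <= data.length.
  private def matchesAt(data: Array[Byte], delimiter: Array[Byte], pos: Int): Boolean = {
    var j = 0
    while (j < delimiter.length && data(pos + j) == delimiter(j)) j += 1
    j == delimiter.length
  }

  // Splits `data` at every occurrence of the (non-empty) delimiter sequence.
  // The delimiter bytes themselves are not included in the items.
  def split(data: Array[Byte], delimiter: Array[Byte]): List[Array[Byte]] = {
    require(delimiter.nonEmpty, "delimiter must not be empty")
    val items = ArrayBuffer.empty[Array[Byte]]
    var start = 0 // index where the current item begins
    var i = 0     // current scan position
    while (i <= data.length - delimiter.length) {
      if (matchesAt(data, delimiter, i)) {
        items += data.slice(start, i)
        i += delimiter.length
        start = i
      } else {
        i += 1
      }
    }
    // Whatever follows the last delimiter is the final item (possibly empty),
    // mirroring the single-byte solutions above.
    items += data.slice(start, data.length)
    items.toList
  }

  def main(args: Array[String]): Unit = {
    val data = "a--b--c".getBytes(StandardCharsets.UTF_8)
    val delimiter = "--".getBytes(StandardCharsets.UTF_8)
    split(data, delimiter).foreach(item => println(new String(item, StandardCharsets.UTF_8)))
  }
}
```

To apply it to a file, the bytes can be obtained with Files.readAllBytes(Paths.get(path)) as in the first answer. The scan is the naive one (worst case proportional to data length times delimiter length), which is usually fine for short delimiters. For files too large to hold in memory, a streaming variant that buffers the last delimiter.length bytes would be needed instead.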