Parsing a text file with a fixed format in Java


java string text


Let's say I know the format of a text file. For example, each line contains 4 fields like this:

firstword secondword thirdword fourthword
firstword2 secondword2 thirdword2 fourthword2
...


I need to read the whole file into memory.

I could use this approach:

open the text file
while not EOF, read line by line
split each line on a space
create a new object from the four fields extracted from each line
add this object to a Set

That works, but is there anything better, such as a dedicated third-party Java library?

That way we could define the structure of each text line up front and parse the file with a couple of calls, something like this?

thirdpartylib.setInputTextFileFormat("format.xml");
thirdpartylib.parse(Set, "pathToFile")

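If you know exactly what the delimiters are, the approach you suggest will be fast and reliable, with very little code overhead. The benefit of a third-party library (google "java text file library" for a long list) is that it probably has a bunch of code to handle the odd cases its author cared about. The downside is that, if you are dealing with a simple and reliable text file format, it will probably be more code than you need.

The benefit of writing it yourself is that you can tune the code to your own needs, including scalability, which may be a consideration if you have a lot of data. Third-party libraries often read the file in full, which may not be practical if you have millions of lines.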
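As a rough illustration of the scalability point, here is a minimal sketch that hands each line's fields to a callback as soon as the line is read, instead of collecting everything first. The class name `StreamingLineProcessor` and the method `forEachRecord` are made up for this example, and it assumes single-space delimiters as in the question. It only helps if you do not actually need every record in memory at once.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical helper, not part of any library.
class StreamingLineProcessor {
    // Passes the fields of each line to the handler as soon as the line is
    // read, so only one line is held in memory at a time. Returns the number
    // of lines processed.
    static long forEachRecord(BufferedReader in, Consumer<String[]> handler) {
        long count = 0;
        Iterator<String> lines = in.lines().iterator();
        while (lines.hasNext()) {
            handler.accept(lines.next().split(" "));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String data = "firstword secondword thirdword fourthword\n"
                    + "firstword2 secondword2 thirdword2 fourthword2\n";
        // In practice the reader would wrap a file, e.g. via Files.newBufferedReader.
        BufferedReader in = new BufferedReader(new StringReader(data));
        long n = forEachRecord(in, fields -> System.out.println(fields[2]));
        System.out.println(n + " lines processed");
    }
}
```

The handler decides what to keep, so memory use is governed by your own code rather than by the reader.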

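My advice is to spend an hour or so writing your own and see where you get. You can probably knock it out with very little effort. If it turns out you have a complex problem on your hands, with all sorts of special cases in the data format, then start looking for a library.

You could do something like this: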


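// Assuming a BufferedReader called in and a Set<Widget> called mySet
// (readLine() is defined on BufferedReader, not on plain Reader)

String line = in.readLine();
while(line != null)
{
  // Split the line into its four space-separated fields
  String[] splat = line.split(" ");
  mySet.add(new Widget(splat[0], splat[1], splat[2], splat[3]));
  line = in.readLine();
}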

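But you really need to define what you mean by "better". The approach above won't cope well with "bad" input (a line with fewer than four fields throws an ArrayIndexOutOfBoundsException), but it will be very fast. That actually depends on the Set implementation: if it is constantly resizing, you may take a performance hit.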
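If "better" means more tolerant of bad input, a sketch along the following lines might do. Everything here is an assumption for illustration: the `Widget` record is a stand-in for whatever class you actually have (records need Java 16+), malformed lines are simply skipped rather than reported, and `expectedLines` is an estimate you supply so the HashSet can be presized instead of resizing as it grows.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of any library.
class DefensiveWidgetParser {
    // Stand-in for the Widget class used above.
    record Widget(String first, String second, String third, String fourth) {}

    // Returns a Widget for a line with exactly four whitespace-separated
    // fields, or an empty Optional for anything else.
    static Optional<Widget> parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 4) {
            return Optional.empty();
        }
        return Optional.of(new Widget(parts[0], parts[1], parts[2], parts[3]));
    }

    // Parses every line, skipping malformed ones. expectedLines is a
    // caller-supplied estimate used to presize the set and avoid rehashing.
    static Set<Widget> parseAll(BufferedReader in, int expectedLines) {
        Set<Widget> widgets = new HashSet<>((int) (expectedLines / 0.75f) + 1);
        in.lines()
          .map(DefensiveWidgetParser::parseLine)
          .flatMap(Optional::stream)
          .forEach(widgets::add);
        return widgets;
    }

    public static void main(String[] args) {
        String data = "firstword secondword thirdword fourthword\n"
                    + "only three fields\n"
                    + "firstword2 secondword2 thirdword2 fourthword2\n";
        Set<Widget> widgets = parseAll(new BufferedReader(new StringReader(data)), 3);
        System.out.println(widgets.size() + " widgets parsed");
    }
}
```

Splitting on `\\s+` after trimming also tolerates tabs and repeated spaces, which a plain `split(" ")` does not. Whether silently dropping bad lines is acceptable depends on your data; logging or throwing may be the safer choice.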

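Using XML and defining a schema would let you validate the input before parsing it, and might simplify object creation, but then you couldn't have just four strings on each line (you would need XML tags and so on). See third-party libraries for examples of this approach.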
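To give an idea of what validating before parsing could look like, here is a small sketch using the JDK's built-in `javax.xml.validation` API. The schema and the element names (`widgets`, `widget`, `field`) are invented for this example and are not taken from the question; the schema is embedded as a string only to keep the sketch self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class XmlInputValidator {
    // Made-up schema: a <widgets> root holding any number of <widget>
    // elements, each with exactly four <field> children.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='widgets'><xs:complexType><xs:sequence>"
      + "<xs:element name='widget' minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='field' type='xs:string' minOccurs='4' maxOccurs='4'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the document is well-formed and matches the schema.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<widgets><widget><field>firstword</field><field>secondword</field>"
                  + "<field>thirdword</field><field>fourthword</field></widget></widgets>";
        System.out.println(isValid(ok));
    }
}
```

Note how much markup is needed to carry the same four words, which is the trade-off described above.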

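I completely agree with Catchwa: what do you mean by "better"? Your algorithm is very clear and easy to read and maintain. What more do you expect? Scalability? Speed? Number of disk accesses?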