Java read/write - Java, Sorting, String - Fatal编程技术网

Java read/write

java sorting string


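I wrote a Java program that reads text files containing metadata extracted from images. They contain names, a long list of names, sometimes more than 4,000 of them. Unfortunately, many of these names are identical, so I wrote a program that takes the list in a .txt file, removes the duplicates, and writes the cleaned, alphabetically sorted list to an output .txt file.

In addition, the program wraps each name in HTML list tags so that I can copy and paste them when needed.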

Example text file:

Chatty Little Kitty
Chatty Little Kitty
Bearly Nuf Taz
Got Lil Pepto

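However, it doesn't seem to work properly, because my output file still contains duplicates. The code I wrote looks correct to me, which is why I'm asking whether something is wrong with how I set up the reading and writing.

My code: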

/*
 * This program takes in a text file that has a bunch of words listed. It then creates a single alphabetically
 * organized html list from that data. It also strips the data of duplicates.
 */

import java.io.*;
import java.util.Arrays;

public class readItWriteIt
{
    public static void main(String args[])
    {
        int MAX = 10000;
        String[] lines = new String[MAX];
        boolean valid = true;

        try {
            //Set up Input
            FileInputStream fstream = new FileInputStream("test.txt");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;

            //Set up Output
            FileWriter ostream = new FileWriter("out.txt");
            BufferedWriter out = new BufferedWriter(ostream);

            //counters
            int count = 0;
            int second_count = 0;

            //start reading in lines from the file
            while ((strLine = br.readLine()) != null) {

                //check to make sure that there aren't duplicates. If a line is the same as another line
                //set boolean valid to false else set to true.
                if ((second_count++ > 0) && (count > 0)) {
                    for (int i = 0; i < count; i++) {
                        if (lines[i].equals(strLine)) {
                            valid = false;
                        } else {
                            valid = true;
                        }
                    }
                }

                //only copy the line to the local array if it is not a duplicate. Else do nothing with it.
                if (valid == true) {
                    lines[count] = strLine.trim();
                    count++;
                }
                second_count++;
            }

            //create a second array so that you can get rid of all the null values. It is the size of the
            //used length in the first array called "lines"
            String[] newlines = new String[count];

            //copy data from array lines to array called newlines
            for (int i = 0; i < count; i++) {
                newlines[i] = lines[i];
            }

            //sort the array alphabetically
            Arrays.sort(newlines);

            //write it out to file in alphabetical order along with the list syntax for html
            for (int i = 0; i < count; i++) {
                out.write("<li>" + newlines[i] + "</li>");
                out.newLine();
            }

            //close I/O
            in.close();
            out.close();

        } catch (Exception e) { //Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }
}


Here is how I rewrote it:

import java.util.HashSet;
import java.util.Set;
import java.io.*;
import java.util.Arrays;

public class converter {
    public static void main(String[] args) {

    try{
        //Set up Input
        FileInputStream fstream = new FileInputStream("test.txt");
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;

        //Set up Output
        FileWriter ostream = new FileWriter("out.txt");
        BufferedWriter out = new BufferedWriter(ostream);

        Set lines = new HashSet();
        boolean result;

        while ((strLine = br.readLine()) != null){   
          result = lines.add(strLine.trim());
        }
        String[] newlines = new String[lines.size()];
        lines.toArray(newlines);

        Arrays.sort(newlines);

        //write it out to file in alphabetical order along with the list syntax for html
        for(int i = 0; i < lines.size(); i++)
        {
            out.write("<li>" + newlines[i] + "</li>");
            out.newLine();
        }

        out.close();
        in.close();

       }catch (Exception e){//Catch exception if any
                System.err.println("Error: " + e.getMessage());
       }
    }
}


But thanks to ewernli, it is now much more efficient.

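If you add the lines to a Set (as keys) instead of an array, you will find that you don't need to do any duplicate handling at all. The set takes care of it for you, and your program will be simpler and shorter.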
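A minimal sketch of that idea, assuming the names have already been read into a list. Set.add returns false and leaves the set unchanged when the element is already present, so no comparison loop is needed. LinkedHashSet is used here only so the output keeps first-seen order; a HashSet deduplicates the same way but its iteration order is unspecified. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SetDedupDemo {

    // Returns the distinct, trimmed names in first-seen order (hypothetical helper).
    static Set<String> distinct(List<String> names) {
        Set<String> unique = new LinkedHashSet<>();
        for (String name : names) {
            // add() returns false and changes nothing if the name is already present
            unique.add(name.trim());
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Chatty Little Kitty", "Chatty Little Kitty", "Bearly Nuf Taz", "Got Lil Pepto");
        System.out.println(distinct(names));
        // prints [Chatty Little Kitty, Bearly Nuf Taz, Got Lil Pepto]
    }
}
```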

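Actually, your code could use a few improvements, but the biggest mistake I see is that you store the trimmed line in the lines array and then compare it against the untrimmed line: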

lines[i].equals(strLine) // instead use "lines[i].equals(strLine.trim())"
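A minimal sketch of a corrected check for the array-based version, keeping the question's array approach. Besides trimming once and comparing the trimmed value, it stops at the first match: in the original loop, valid is reassigned on every iteration, so only the comparison with the last stored line decides the result, and in a sequence such as A, B, A the second A is still stored. The helper names containsLine and dedup are hypothetical.

```java
import java.util.Arrays;

public class ArrayDedupFix {

    // Returns true if candidate equals one of the first count entries of lines (hypothetical helper).
    static boolean containsLine(String[] lines, int count, String candidate) {
        for (int i = 0; i < count; i++) {
            if (lines[i].equals(candidate)) {
                return true; // stop at the first match so later iterations cannot overwrite the result
            }
        }
        return false;
    }

    // Copies the input into a new array, skipping duplicates after trimming.
    static String[] dedup(String[] input) {
        String[] lines = new String[input.length];
        int count = 0;
        for (String raw : input) {
            String strLine = raw.trim(); // trim once, then store and compare the same value
            if (!containsLine(lines, count, strLine)) {
                lines[count++] = strLine;
            }
        }
        return Arrays.copyOf(lines, count); // drop the unused null slots
    }

    public static void main(String[] args) {
        String[] input = {"A", "B", "A ", "B"};
        System.out.println(Arrays.toString(dedup(input)));
        // prints [A, B]
    }
}
```

This check is still quadratic in the number of lines, which is one reason the answers recommend a Set instead.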

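An array is not the data structure you want (do you need a data structure with a fixed length and order but mutable elements?). Take a look at the Java collection types, in particular a sorted Set implementation. It will:

- expand to hold your data
- eliminate duplicates (it is a Set)
- sort its contents as they are added (see the Comparator implementations)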
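A minimal sketch of those three properties, assuming java.util.TreeSet as the sorted Set implementation. With no comparator, Strings are kept in their natural order, which is lexicographic and case-sensitive. Passing String.CASE_INSENSITIVE_ORDER is one option if case should not matter, but then names that differ only in case are also treated as duplicates. The helper sortedDistinct is hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class TreeSetDemo {

    // Collects trimmed names into a TreeSet: duplicates are ignored and iteration is sorted.
    static Set<String> sortedDistinct(Iterable<String> names) {
        Set<String> sorted = new TreeSet<>(); // natural String order: lexicographic, case-sensitive
        for (String name : names) {
            sorted.add(name.trim()); // returns false for a name that is already present
        }
        return sorted;
    }

    public static void main(String[] args) {
        Set<String> names = sortedDistinct(Arrays.asList(
                "Chatty Little Kitty", "Got Lil Pepto", "Chatty Little Kitty", "Bearly Nuf Taz"));
        System.out.println(names);
        // prints [Bearly Nuf Taz, Chatty Little Kitty, Got Lil Pepto]

        // With String.CASE_INSENSITIVE_ORDER, names that differ only in case count as duplicates
        Set<String> ci = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        ci.add("bearly nuf taz");
        ci.add("Bearly Nuf Taz");
        System.out.println(ci.size());
        // prints 1
    }
}
```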

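Re: your edit. Note that you could also use TreeSet instead of HashSet and cut out the String array entirely: you can iterate over a Set (or any Iterable) with

for (String line : lines) {

To do this, you need to declare the set with generics:

Set<String> lines = new TreeSet<String>();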
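Putting these suggestions together, a sketch of the whole program using a TreeSet<String> and a for-each loop, so neither the intermediate array nor Arrays.sort is needed. It keeps the file names test.txt and out.txt from the question. It assumes both files are UTF-8 (the original code used the platform default charset), uses try-with-resources so both files are closed even when an exception is thrown, and skips blank lines, which the original program did not do. The class name Converter and the method writeHtmlList are illustrative.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

public class Converter {

    // Reads names from input, drops duplicates, and writes them sorted as HTML <li> items.
    static void writeHtmlList(Path input, Path output) throws IOException {
        // TreeSet<String>: rejects duplicates and iterates in sorted order
        Set<String> lines = new TreeSet<String>();

        try (BufferedReader br = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                String name = strLine.trim();
                if (!name.isEmpty()) { // skip blank lines (an addition, not in the original)
                    lines.add(name);
                }
            }
        }

        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            // for-each works because lines is declared as Set<String>
            for (String line : lines) {
                out.write("<li>" + line + "</li>");
                out.newLine();
            }
        }
    }

    public static void main(String[] args) {
        try {
            writeHtmlList(Paths.get("test.txt"), Paths.get("out.txt"));
        } catch (IOException e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
```

With the example file from the question, out.txt would contain the three distinct names in alphabetical order, each wrapped in <li>...</li>.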