Java: Reading a text file and removing words using a Set and a List


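I'm building a program that reads a text file containing stop words, then reads a text file of tweets collected from Twitter. I'm trying to remove the stop words from the collection of tweets so that I'm left with only the "interesting" vocabulary, and then print those words to the console.

However, nothing is printed to the console, so clearly it isn't working. It worked before I started importing the Test.txt file (when I used a string created in the program, split it, and stored the pieces in an array).

I need help with reading the Test.txt file, filtering out the stop words, and then printing the listOfWords list to the console.

Any help would be greatly appreciated.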

import java.util.*;
import java.io.*;

public class RemoveStopWords {

  public static void main(String[] args) {

    try {
    Scanner stopWordsFile = new Scanner(new File("stopwords_twitter.txt"));
    Scanner textFile = new Scanner(new File("Test.txt"));

    // Create a set for the stop words (a set as it doesn't allow duplicates)
    Set<String> stopWords = new HashSet<String>();
    // For each word in the file
    while (stopWordsFile.hasNext()) {
        stopWords.add(stopWordsFile.next().trim().toLowerCase());
    }

    // Splits strings and stores each word into a list
    ArrayList<String> words = new ArrayList<String>();
    while (stopWordsFile.hasNext()) {
        words.add(textFile.next().trim().toLowerCase());
    }

    // Create an empty list (a list because it allows duplicates) 
    ArrayList<String> listOfWords = new ArrayList<String>();

    // Iterate over the array 
    for(String word : words) {
        // Converts current string index to lowercase
        String toCompare = word.toLowerCase();
        // If the word isn't a stop word, add to listOfWords list
        if (!stopWords.contains(toCompare)) {
            listOfWords.add(word);
        }
    }

    stopWordsFile.close();
    textFile.close();

    for (String str : listOfWords) {
        System.out.print(str + " ");
    }
    } catch(FileNotFoundException e){
        e.printStackTrace();
    }
}
}

You have two while (stopWordsFile.hasNext()) loops, and the condition of the second one will always be false:

// For each word in the file
while (stopWordsFile.hasNext()) {
    stopWords.add(stopWordsFile.next().trim().toLowerCase());
}

// Splits strings and stores each word into a list
ArrayList<String> words = new ArrayList<String>();
while (stopWordsFile.hasNext()) {
    words.add(textFile.next().trim().toLowerCase());
}

In the second loop, test textFile.hasNext() instead of stopWordsFile.hasNext().
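To see why the second loop is skipped, here is a minimal sketch of how a Scanner behaves once it has been read to the end. The class name ScannerExhaustionDemo, the drain helper, and the in-memory input string are assumptions made for illustration; they are not part of the original program.

```java
import java.util.Scanner;

class ScannerExhaustionDemo {

    // Counts the tokens left in a scanner by consuming all of them.
    static int drain(Scanner scanner) {
        int count = 0;
        while (scanner.hasNext()) {
            scanner.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory input standing in for stopwords_twitter.txt.
        Scanner stopWordsFile = new Scanner("a an the");

        System.out.println(drain(stopWordsFile));    // prints 3: the first pass consumes every token
        System.out.println(stopWordsFile.hasNext()); // prints false: nothing is left for a second loop
    }
}
```

A Scanner does not rewind, so any later loop guarded by the same scanner's hasNext() never runs its body.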

The problem is that you are trying to read words from the same file twice:

while (stopWordsFile.hasNext()) { // never executes: stopWordsFile has no tokens left
    words.add(textFile.next().trim().toLowerCase());
}

So change the second while condition to:

while (textFile.hasNext()) { 
    words.add(textFile.next().trim().toLowerCase());
}
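Putting the fix in context, the following is one possible sketch of the whole filter, not the asker's exact program. The class name StopWordFilter and the helpers readWords and removeStopWords are hypothetical, and in-memory strings stand in for the two files so the example runs without them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

class StopWordFilter {

    // Reads every whitespace-separated token from the scanner, lowercased.
    static List<String> readWords(Scanner source) {
        List<String> words = new ArrayList<>();
        while (source.hasNext()) { // the loop tests the same scanner it reads from
            words.add(source.next().toLowerCase());
        }
        return words;
    }

    // Returns the words of the text that are not stop words, in order, duplicates kept.
    static List<String> removeStopWords(Scanner stopWordsSource, Scanner textSource) {
        Set<String> stopWords = new HashSet<>(readWords(stopWordsSource));
        List<String> kept = new ArrayList<>();
        for (String word : readWords(textSource)) {
            if (!stopWords.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the two files. With real files, pass
        // new Scanner(new File("stopwords_twitter.txt")) and new Scanner(new File("Test.txt")).
        try (Scanner stopWords = new Scanner("the a on");
             Scanner text = new Scanner("The cat sat on a mat")) {
            System.out.println(String.join(" ", removeStopWords(stopWords, text)));
            // prints: cat sat mat
        }
    }
}
```

Routing both reads through one helper means the scanner being tested and the scanner being read cannot drift apart, which is the mistake in the original loop. The try-with-resources block also closes both scanners even if an exception is thrown.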

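Alternatively, copy the file into another file by reading it line by line. On each iteration (each line), test whether the line contains a stop word: if it does, remove the stop word from the line and then write the line to the new file; otherwise write the line as is.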
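The line-by-line approach could look roughly like the sketch below. The class name StopWordLineCopier and both method names are assumptions for illustration, and a StringReader and StringWriter stand in for the input and output files. Note one simplification: each line is rebuilt with single spaces between words, so lines without stop words are copied with their whitespace normalized rather than byte for byte.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopWordLineCopier {

    // Returns the line with every stop word removed; other words keep their order and case.
    static String stripStopWords(String line, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty() && !stopWords.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }

    // Copies the input to the output one line at a time, filtering each line.
    static void copyWithoutStopWords(Reader in, Writer out, Set<String> stopWords) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            writer.println(stripStopWords(line, stopWords));
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "on"));
        StringWriter copy = new StringWriter();
        // With real files, a FileReader and FileWriter could be passed instead.
        copyWithoutStopWords(new StringReader("The cat sat\non a mat\n"), copy, stopWords);
        System.out.print(copy);
    }
}
```

Compared with collecting all words into one list, this keeps the line structure of the tweets file, which may matter if each line is a separate tweet.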
