Different HashMap sizes (Java)


I am loading several text files of different lengths and adding their words to a HashMap named "collection":

List<String> textFileList = Arrays.asList("ArsenalNoStopWords.txt", "ChelseaNoStopWords.txt", "LiverpoolNoStopWords.txt",
        "ManchesterUnitedNoStopWords.txt", "ManchesterCityNoStopWords.txt", "TottenhamNoStopWords.txt");

for (String text : textFileList) {
    scanFile(text);
}

public static void scanFile(String textFileName) {
    try {
        Scanner textFile = new Scanner(new File(textFileName));

        while (textFile.hasNext()) {
            collection.put(textFile.next().trim(), 0);
        }

        textFile.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}


After that, I load one of the documents and use the HashMap (collection) to count its word frequencies:

ArrayList<Integer> document = new ArrayList<Integer>();

document = processDocument("TottenhamNoStopWords.txt");

private static ArrayList<Integer> processDocument(String inFileName) throws IOException {

    for (Map.Entry<String, Integer> entry : collection.entrySet()) {
        entry.setValue(0);
    }

    Scanner textFile = new Scanner(new File(inFileName));
    ArrayList<String> file = new ArrayList<String>();

    while (textFile.hasNext()) {
        file.add(textFile.next().trim().toLowerCase());
    }

    for (String word : file) {
        Integer dict = collection.get(word);
        if (!collection.containsKey(word)) {
            collection.put(word, 1);
        } else {
            collection.put(word, dict + 1);
        }
    }

    textFile.close();

    ArrayList<Integer> values = new ArrayList<>(collection.values());
    return values;
}

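Next, I write the values returned by processDocument() to a text file. I have six such variables, each with a different name. In theory, every team's version of the collection should have the same length, because the collection's keys never change and always come from textFileList; the only thing that varies is the document being processed. So why do my vectors (ArrayLists) differ in length when they should be the same size, just with different frequency values?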

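In the first step you use textFile.next().trim(), whereas in the second part you use file.add(textFile.next().trim().toLowerCase()), so your collection ends up with keys that duplicate each other except for case. Any lowercased word from the document that exists in the map only with different capitalization is inserted as a new key, so the map grows by a different amount for each document.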
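To make the effect concrete, here is a minimal sketch that reproduces it with in-memory word lists instead of files. The class name CaseMismatchDemo, the method sizeAfterCounting, and the sample words are hypothetical and not part of the question's code; only the trim() versus trim().toLowerCase() mismatch is taken from it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CaseMismatchDemo {

    // Stores vocabulary keys exactly as read (trim only), then counts a document
    // whose words are lowercased first, mirroring the mismatch in the question.
    static int sizeAfterCounting(List<String> vocabulary, List<String> documentWords) {
        Map<String, Integer> collection = new HashMap<>();
        for (String word : vocabulary) {
            collection.put(word.trim(), 0);
        }
        for (String word : documentWords) {
            String key = word.trim().toLowerCase();
            Integer count = collection.get(key);
            // A capitalized vocabulary entry is never found under its lowercase form,
            // so the lowercase form is added as a brand-new key.
            collection.put(key, count == null ? 1 : count + 1);
        }
        return collection.size();
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("Arsenal", "goal", "Chelsea");
        // Adds "arsenal" as a new key: 3 original keys + 1
        System.out.println(sizeAfterCounting(vocabulary, Arrays.asList("Arsenal", "goal")));    // prints 4
        // Adds "arsenal" and "chelsea" as new keys: 3 original keys + 2
        System.out.println(sizeAfterCounting(vocabulary, Arrays.asList("Arsenal", "Chelsea"))); // prints 5
    }
}
```

The same three-word vocabulary yields maps of size 4 and 5 depending on the document, which is the varying vector length described in the question.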

Ah, I see. So it is a matter of either adding toLowerCase() to textFile.next().trim(), or removing toLowerCase() from file.add(textFile.next().toLowerCase()), to avoid any duplicates?

@FeelingLikeAJabroni Yes.
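One way to keep the two code paths from drifting apart again is to route every key through a single helper. The sketch below assumes that approach and uses in-memory word lists instead of files; normalize, buildVocabulary, countDocument, and the sample words are hypothetical names chosen for illustration. It also swaps HashMap for LinkedHashMap and uses Map.merge in place of the containsKey/put pair, which are substitutions, not what the question's code does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class NormalizedCounts {

    // The only place that decides what a key looks like (hypothetical helper).
    static String normalize(String token) {
        return token.trim().toLowerCase(Locale.ROOT);
    }

    // Builds the fixed vocabulary from every word of every document, all counts at 0.
    static Map<String, Integer> buildVocabulary(List<List<String>> documents) {
        Map<String, Integer> collection = new LinkedHashMap<>();
        for (List<String> words : documents) {
            for (String word : words) {
                collection.put(normalize(word), 0);
            }
        }
        return collection;
    }

    // Counts one document on a copy of the vocabulary, leaving the shared map untouched.
    static List<Integer> countDocument(Map<String, Integer> vocabulary, List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>(vocabulary);
        for (String word : words) {
            counts.merge(normalize(word), 1, Integer::sum);
        }
        return new ArrayList<>(counts.values());
    }

    public static void main(String[] args) {
        List<String> arsenal = Arrays.asList("Arsenal", "goal", "Goal");
        List<String> chelsea = Arrays.asList("Chelsea", "goal");
        Map<String, Integer> vocabulary = buildVocabulary(Arrays.asList(arsenal, chelsea));

        System.out.println(countDocument(vocabulary, arsenal)); // prints [1, 2, 0]
        System.out.println(countDocument(vocabulary, chelsea)); // prints [0, 1, 1]
    }
}
```

Because both steps normalize keys the same way, every document produces a vector with one entry per vocabulary word. LinkedHashMap is used here so the positions line up across documents in insertion order; HashMap makes no ordering guarantee.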
