How do I find the similarity between two lists in Java?

java list collections comparison

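For a homework assignment, we have to turn the basicCompare method into one that compares two text documents to see whether they cover similar topics. Basically, the program strips out every word shorter than five characters and leaves us with lists of the remaining words. We are supposed to compare the lists, and if enough words are shared between the two documents (say, 80% similarity), the method returns true and prints "match".

However, I'm stuck right where all the comments are at the bottom of the method. I can't think of or find a way to compare the two lists and work out what percentage of the words appear in both. Maybe I'm thinking about it wrong and need to filter out the words that aren't in both lists, then count how many words are left. The parameters that define whether the input documents match are entirely up to us, so they can be set however I need. If you kind ladies and gentlemen could point me in the right direction, even just to a Java doc page for some function, I'm sure I can finish the rest. I just need to know where to start.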

import java.util.Collections;
import java.util.List;

public class MyComparator implements DocumentComparator {

    public static void main(String[] args) {
        MyComparator mc = new MyComparator();

        if (mc.basicCompare("C:\\Users\\Quinncuatro\\Desktop\\MatchLabJava\\LabCode\\match1.txt", "C:\\Users\\Quinncuatro\\Desktop\\MatchLabJava\\LabCode\\match2.txt")) {
            System.out.println("match1.txt and match2.txt are similar!");
        } else {
            System.out.println("match1.txt and match2.txt are NOT similar!");
        }
    }

    //In the basicCompare method, since the bottom returns false, it results in the else statement in the calling above, saying they're not similar
    //Need to implement a thing that if so many of the words are shared, it returns as true

    public boolean basicCompare(String f1, String f2) {
        List<String> wordsFromFirstArticle = LabUtils.getWordsFromFile(f1);
        List<String> wordsFromSecondArticle = LabUtils.getWordsFromFile(f2);

        Collections.sort(wordsFromFirstArticle);
        Collections.sort(wordsFromSecondArticle); //sort list alphabetically

        for (String word : wordsFromFirstArticle) {
            System.out.println(word);
        }

        for (String word2 : wordsFromSecondArticle) {
            System.out.println(word2);
        }

        //Find a way to use common_words to strip out the "noise" in the two lists, so you're ONLY left with unique words
        //Get rid of words not in both lists, if above a certain number, return true
        //If word1 = word2 more than 80%, return true

        //Then just write more whatever.basicCompare modules to compare 2 to 3, 1 to 3, 1 to no, 2 to no, and 3 to no

        //Once you get it working, you don't need to print the words, just say whether or not they "match"

        return false;
    }

    public boolean mapCompare(String f1, String f2) {
        return false;
    }


}

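Try to come up with an algorithm by working through the steps on paper or in your head. Once you understand what you need to do, translate it into code. That is how all algorithms are invented.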

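First, convert the lists to sets to remove duplicates.

Then iterate over one of the sets and use the contains method to check whether the other set contains the same word.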

// Requires: import java.util.HashSet; import java.util.Set;
int count = 0;
// Copying each list into a HashSet drops duplicate words.
Set<String> set1 = new HashSet<>(LabUtils.getWordsFromFile(f1));
Set<String> set2 = new HashSet<>(LabUtils.getWordsFromFile(f2));

// Count the words of set1 that also occur in set2.
for (String s : set1) {
    if (set2.contains(s)) {
        count++;
    }
}


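Then use the counter to compute the percentage: (count / total) * 100. In Java, do the division in floating point, since dividing one int by another truncates the result. If the percentage is greater than 80, return true; otherwise return false.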
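To make the percentage step concrete, here is a minimal sketch of how the count could be turned into a true/false result. The class name SimilaritySketch, the method names, and the sample words are made up for illustration, and taking "total" to mean the number of distinct words in the first document is an assumption; the assignment leaves that choice open.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the lab code.
class SimilaritySketch {

    // Percentage of the distinct words in `first` that also occur in `second`.
    // Assumption: "total" is the number of distinct words in `first`.
    static double sharedPercentage(List<String> first, List<String> second) {
        Set<String> set1 = new HashSet<>(first);
        Set<String> set2 = new HashSet<>(second);
        if (set1.isEmpty()) {
            return 0.0; // nothing to compare, and avoids dividing by zero
        }
        int count = 0;
        for (String word : set1) {
            if (set2.contains(word)) {
                count++;
            }
        }
        // Multiplying by 100.0 first forces floating-point arithmetic;
        // (count / total) * 100 with two ints would truncate to 0 or 100.
        return 100.0 * count / set1.size();
    }

    // True when the shared percentage is strictly greater than the threshold.
    static boolean isSimilar(List<String> first, List<String> second, double thresholdPercent) {
        return sharedPercentage(first, second) > thresholdPercent;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("apple", "banana", "cherry", "grape", "lemon", "apple");
        List<String> b = Arrays.asList("apple", "banana", "cherry", "grape", "mango");
        System.out.println(sharedPercentage(a, b)); // 4 of 5 distinct words are shared
        System.out.println(isSimilar(a, b, 75.0));
    }
}
```

Inside basicCompare, the two lists would come from LabUtils.getWordsFromFile(f1) and LabUtils.getWordsFromFile(f2) instead of the hard-coded sample data.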

It is always good to understand the difference between lists, sets, and queues. I hope this points you in the right direction.
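As a small illustration of why the list/set distinction matters here, the sketch below shows that a List keeps repeated words while a Set does not. It also shows a different route from the contains loop: the standard retainAll method, which keeps only the elements that are also in another collection. This matches the question's idea of filtering out words that are not in both lists. The class name IntersectionSketch, the method commonWords, and the sample words are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the lab code.
class IntersectionSketch {

    // Returns the distinct words present in both lists; the inputs are not modified.
    static Set<String> commonWords(List<String> first, List<String> second) {
        Set<String> common = new HashSet<>(first); // copy of first, duplicates dropped
        common.retainAll(new HashSet<>(second));   // keep only words that are also in second
        return common;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("alpha", "alpha", "bravo"));
        Set<String> distinct = new HashSet<>(words);
        System.out.println(words.size());    // the list still counts "alpha" twice
        System.out.println(distinct.size()); // the set counts it once

        List<String> other = Arrays.asList("bravo", "charlie");
        System.out.println(commonWords(words, other)); // only "bravo" is in both
    }
}
```

The size of the returned set can stand in for the counter from the loop version.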

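I understand, Sky Kelsey. It's just a matter of still learning Java and not knowing whether what I want to do can be translated into clean code.

1) There are other data structures. Could there be one that doesn't allow duplicates and would make your job easier? 2) There is an interface, java.util.Collection, that List and the other "collections" implement. Take a look at the common methods declared in that interface. One of them is called "contains", and it may help you as well.

Although you have shown code, it would be better to show the effort you've made on the main part of the problem, rather than just the scaffolding code around it.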