Implementing Jaccard distance with ANTLR to find the similarity of Java code

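After some time, I managed to get a unique id from a .java file using ANTLR: the sequence of token types produced by the lexer. I then split that id into 4-grams with an n-gram analyzer. Here is my code: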

public void runAlgoritma(File mainFile, List<String> fileJlist) {
    final Antlr3JavaLexer lexer = new Antlr3JavaLexer();
    lexer.preserveWhitespacesAndComments = false;

    // ANTLRReaderStream reads the whole file in its constructor,
    // so the reader can be closed as soon as the stream is built.
    try (BufferedReader in = new BufferedReader(new FileReader(mainFile))) {
        lexer.setCharStream(new ANTLRReaderStream(in));
    } catch (IOException e) {
        e.printStackTrace();
        return; // nothing to tokenize if the file cannot be read
    }

    final CommonTokenStream tokens = new CommonTokenStream();
    tokens.setTokenSource(lexer);
    tokens.LT(10); // force the token buffer to load

    Antlr3JavaParser parser = new Antlr3JavaParser(tokens);

    // Concatenate the numeric type of every token into one string.
    StringBuilder sbr = new StringBuilder();
    List tokenList = tokens.getTokens();
    for (int i = 0; i < tokenList.size(); i++) {
        org.antlr.runtime.Token token = (org.antlr.runtime.Token) tokenList.get(i);
        int type = token.getType();
        sbr.append(type);
    }

    // Split the token-type string into 4-grams and print one per line.
    String tokenTypes = sbr.toString();
    StringBuilder out = new StringBuilder();
    for (String term : new NgramAnalyzer(4).analyzer(tokenTypes)) {
        out.append(term).append("\n");
    }
    System.out.println(out);
}

What I would like to know is: how can I use Jaccard similarity on the n-grams I have produced to compare two Java source files?

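This is really outside the scope of ANTLR. Once you have n-grams or a bag of words, you can apply a similarity measure. The real question is what the elements/features are in the vectors you want to compare. That will be a function of the trees produced by ANTLR, though, and is up to you.

Do you have an example, sir, of implementing this similarity measure with ANTLR?

Nope. That is outside my area of work, I'm afraid. Look for work in the software metrics literature.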
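As the first comment says, the similarity measure is independent of ANTLR once the n-grams exist. The following is a minimal sketch, not the asker's code: the class `JaccardSketch` and its methods `ngrams` and `jaccardSimilarity` are hypothetical names, and it assumes each file has already been reduced to a list of integer token types (for example, the values returned by `token.getType()` above). Unlike the question's code, it builds n-grams over whole token types rather than over the characters of a concatenated string, so that types such as 1 followed by 23 are not confused with 12 followed by 3.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JaccardSketch {

    // Build the set of n-grams over a sequence of token types.
    // Each n-gram is stored as a string with '-' between the types.
    static Set<String> ngrams(List<Integer> tokenTypes, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= tokenTypes.size(); i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = i; j < i + n; j++) {
                if (j > i) {
                    sb.append('-');
                }
                sb.append(tokenTypes.get(j));
            }
            grams.add(sb.toString());
        }
        return grams;
    }

    // Jaccard similarity: |A intersect B| / |A union B|.
    // The Jaccard distance is 1 minus this value.
    static double jaccardSimilarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // assumed convention: two empty sets are identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Made-up token-type sequences standing in for two lexed files.
        List<Integer> first = Arrays.asList(4, 7, 7, 12, 9, 4, 7);
        List<Integer> second = Arrays.asList(4, 7, 7, 12, 9, 5, 7);

        double similarity = jaccardSimilarity(ngrams(first, 4), ngrams(second, 4));
        System.out.println("Jaccard similarity: " + similarity);
        System.out.println("Jaccard distance:   " + (1.0 - similarity));
    }
}
```

In this example the two sequences share 2 of their 6 distinct 4-grams, so the similarity is 2/6. Using sets ignores how often an n-gram repeats; whether that is appropriate for code similarity is a design choice the question leaves open.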