Java: Why do I get only one result for TF-IDF?
Hi, this is my code for calculating term frequency and TF-IDF. The first part calculates the term frequency of a given string for each file. The second part is supposed to calculate the TF-IDF for each file using the values above, but I only get one value. It should give a TF-IDF value for every document.

Sample output for term frequency:

The word inputted is is
| File = abc0.txt |
is ---> Word count = |2| Total count = |150| Term Frequency = | 0.0133 |
The word inputted is is
| File = abc1.txt |
is ---> Word count = |0| Total count = |9| Term Frequency = | 0.0000 |

Sample output for TF-IDF:

is --> This number of files that contain the term 7
is --> IDF 0.1962946357308887
is --> TFIDF 0.0028607962606519654

The code:
// Calculating term frequency
System.out.println("Please enter the required word :");
Scanner scan = new Scanner(System.in);
String word = scan.nextLine();
String[] array = word.split(" ");
int filename = 11;
String[] fileName = new String[filename];
int a = 0;
int totalCount = 0;
int wordCount = 0;
for (a = 0; a < filename; a++) {
    try {
        System.out.println("The word inputted is " + word);
        File file = new File(
                "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" + a
                        + ".txt");
        System.out.println(" _________________");
        System.out.print("| File = abc" + a + ".txt | \t\t \n");
        for (int i = 0; i < array.length; i++) {
            totalCount = 0;
            wordCount = 0;
            Scanner s = new Scanner(file);
            {
                while (s.hasNext()) {
                    totalCount++;
                    if (s.next().equals(array[i]))
                        wordCount++;
                }
                System.out.print(array[i] + " ---> Word count = "
                        + "\t\t " + "|" + wordCount + "|");
                System.out.print(" Total count = " + "\t\t " + "|"
                        + totalCount + "|");
                System.out.printf(" Term Frequency = | %8.4f |",
                        (double) wordCount / totalCount);
                System.out.println("\t ");
            }
        }
    } catch (FileNotFoundException e) {
        System.out.println("File is not found");
    }
}
System.out.println("Please enter the required word :");
Scanner scan2 = new Scanner(System.in);
String word2 = scan2.nextLine();
String[] array2 = word2.split(" ");
int numofDoc;
for (int b = 0; b < array2.length; b++) {
    numofDoc = 0;
    for (int i = 0; i < filename; i++) {
        try {
            BufferedReader in = new BufferedReader(new FileReader(
                    "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc"
                            + i + ".txt"));
            int matchedWord = 0;
            Scanner s2 = new Scanner(in);
            {
                while (s2.hasNext()) {
                    if (s2.next().equals(array2[b]))
                        matchedWord++;
                }
            }
            if (matchedWord > 0)
                numofDoc++;
        } catch (IOException e) {
            System.out.println("File not found.");
        }
    }
    System.out.println(array2[b]
            + " --> This number of files that contain the term "
            + numofDoc);
    double inverseTF = Math.log10((float) numDoc / numofDoc);
    System.out.println(array2[b] + " --> IDF " + inverseTF );
    double TFIDF = (((double) wordCount / totalCount) * inverseTF );
    System.out.println(array2[b] + " --> TFIDF " + TFIDF);
}
}
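
The println statement that you expect to be repeated for each file is
double TFIDF = (((double) wordCount / totalCount) * inverseTF );
System.out.println(array2[b] + " --> TFIDF " + TFIDF);
but it is contained in only a single loop:
for (int b = 0; b < array2.length; b++)
If you want to print this line once per file, you have to wrap that statement in another loop over all files.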
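Since this is homework, I will not include the final code, but here is another hint: the TFIDF calculation also uses the variables wordCount and totalCount, and both are unique to each file/word pair. So you need to store them per file/word pair rather than just once, or recompute them in the last loop.

The part that prints the TF-IDF needs to be moved into the for loop that iterates over all files, i.e.:
for (int b = 0; b < array2.length; b++) {
    System.out.println(array2[b]
            + " --> This number of files that contain the term "
            + numofDoc);
    double inverseTF = Math.log10((float) numDoc / numofDoc);
    System.out.println(array2[b] + " --> IDF " + inverseTF );
    double TFIDF = (((double) wordCount / totalCount) * inverseTF );
    System.out.println(array2[b] + " --> TFIDF " + TFIDF);
}

Apart from the actual answer (the one given by Howard), you should pay more attention to naming. Using variables named "filename" and "fileName" (one of which is an int) is very confusing.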
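
The hints in the answers (keep the counts per file/word pair, and compute the TF-IDF inside a loop over all files) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's final program: the class and method names (TfIdfSketch, termFrequency, inverseDocumentFrequency, tfIdfPerDocument) are hypothetical, the documents are in-memory token lists rather than the abc0.txt ... abc10.txt files, and returning an IDF of 0 for a term that appears in no document is an assumption made to avoid dividing by zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one TF-IDF value per document for a single term.
class TfIdfSketch {

    // Term frequency: occurrences of the term divided by the token count of ONE document.
    static double termFrequency(List<String> tokens, String term) {
        if (tokens.isEmpty()) {
            return 0.0;
        }
        int wordCount = 0;
        for (String token : tokens) {
            if (token.equals(term)) {
                wordCount++;
            }
        }
        return (double) wordCount / tokens.size();
    }

    // Inverse document frequency with log10, as in the question:
    // log10(number of documents / number of documents containing the term).
    static double inverseDocumentFrequency(List<List<String>> docs, String term) {
        int numofDoc = 0;
        for (List<String> tokens : docs) {
            if (tokens.contains(term)) {
                numofDoc++;
            }
        }
        if (numofDoc == 0) {
            return 0.0; // assumption: treat an unseen term as IDF 0
        }
        return Math.log10((double) docs.size() / numofDoc);
    }

    // The IDF is computed once per term; the TF is recomputed for every document,
    // so the result holds one TF-IDF value per document.
    static double[] tfIdfPerDocument(List<List<String>> docs, String term) {
        double idf = inverseDocumentFrequency(docs, term);
        double[] result = new double[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            result[i] = termFrequency(docs.get(i), term) * idf;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> docs = new ArrayList<>();
        docs.add(Arrays.asList("this is a test is".split(" ")));
        docs.add(Arrays.asList("nothing here".split(" ")));
        docs.add(Arrays.asList("it is".split(" ")));

        double[] tfIdf = tfIdfPerDocument(docs, "is");
        for (int i = 0; i < tfIdf.length; i++) {
            // Printed inside the loop over all documents, so there is one line per document.
            System.out.printf("doc %d: is --> TFIDF %.4f%n", i, tfIdf[i]);
        }
    }
}
```

The point of the design is that the per-document quantity (the term frequency) is computed inside the loop over documents, while the per-term quantity (the IDF) is computed once outside it. Reading the tokens from files with a Scanner, as in the question, would replace the in-memory lists.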