How to stop a Java spell checker from correcting repeated words
I implemented a program that does the following:

- Scans all the words on a web page into a string (using jsoup)
- Filters out all HTML tags and code
- Feeds those words into a spell checker and offers suggestions

The spell checker loads a dictionary.txt file into an array and compares the string input against the words in the dictionary.

My current problem is that when the input contains the same word several times, for example "teh program is teh worst", the code prints:

You entered 'teh', did you mean 'the'?
You entered 'teh', did you mean 'the'?
You entered 'teh', did you mean 'the'?
You entered 'teh', did you mean 'the'?
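Sometimes a website repeats several words over and over, and the output gets messy.

Ideally I would print each word together with the number of times it was misspelled, but limiting each word to being printed once would be enough.

My program has several methods and two classes, but the spell-checking method is as follows:

Note: the original code contained some "if" statements that removed punctuation, but I removed them for clarity.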
static boolean suggestWord;

public static String checkWord(String wordToCheck) {
    String wordCheck;
    String word = wordToCheck.toLowerCase();
    if ((wordCheck = (String) dictionary.get(word)) != null) {
        suggestWord = false; // no need to ask for suggestion for a correct word.
        return wordCheck;
    }
    // If after all of these checks a word could not be corrected, return as
    // a misspelled word.
    return word;
}
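The `suggestWord` flag is a static field that `checkWord` only ever sets to `false`, so every caller has to reset it to `true` before each call. A minimal sketch of an alternative, assuming a plain `Set<String>` dictionary, returns the answer directly instead. `normalize` is a hypothetical stand-in for the punctuation-stripping "if" statements that were removed above; it uses a regex and is not the original code.

```java
import java.util.Locale;
import java.util.Set;

public class WordCheck {
    // Hypothetical replacement for the removed punctuation "if" statements:
    // lower-cases the word and drops every character that is not a letter or apostrophe.
    static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}']", "");
    }

    // True when the normalized word is non-empty and absent from the dictionary.
    // No shared static flag is involved, so there is nothing to reset between calls.
    static boolean isMisspelled(Set<String> dictionary, String raw) {
        String word = normalize(raw);
        return !word.isEmpty() && !dictionary.contains(word);
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("the", "program", "is", "worst");
        System.out.println(isMisspelled(dictionary, "Teh,"));    // true
        System.out.println(isMisspelled(dictionary, "Program")); // false
    }
}
```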
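Temporary edit: as requested, the full code.

Class 1:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Hashtable;
import java.util.Scanner;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist; // renamed Safelist in newer jsoup releases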
public class ParseCleanCheck {
    static Hashtable<String, String> dictionary; // To store all the words of the dictionary
    static boolean suggestWord; // To indicate whether the word is spelled correctly or not.
    static Scanner urlInput = new Scanner(System.in);
    public static String cleanString;
    public static String url = "";
    public static boolean correct = true;

    /**
     * PARSER METHOD
     */
    public static void PageScanner() throws IOException {
        System.out.println("Pick an english website to scan.");
        // This do-while loop allows the user to try again after a mistake
        do {
            try {
                System.out.println("Enter a URL, starting with http://");
                url = urlInput.nextLine();
                // This creates a document out of the HTML on the web page
                Document doc = Jsoup.connect(url).get();
                // This converts the document into a string to be cleaned
                String htmlToClean = doc.toString();
                cleanString = Jsoup.clean(htmlToClean, Whitelist.none());
                correct = false;
            } catch (Exception e) {
                System.out.println("Incorrect format for a URL. Please try again.");
            }
        } while (correct);
    }

    /**
     * SPELL CHECKER METHOD
     */
    public static void SpellChecker() throws IOException {
        dictionary = new Hashtable<String, String>();
        System.out.println("Searching for spelling errors ... ");
        try {
            // Read and store the words of the dictionary
            BufferedReader dictReader = new BufferedReader(new FileReader("dictionary.txt"));
            while (dictReader.ready()) {
                String dictInput = dictReader.readLine();
                String[] dict = dictInput.split("\\s"); // create an array of dictionary words
                for (int i = 0; i < dict.length; i++) {
                    // key and value are identical
                    dictionary.put(dict[i], dict[i]);
                }
            }
            dictReader.close();
            String user_text = "";
            // Initializing a spelling suggestion object based on probability
            SuggestSpelling suggest = new SuggestSpelling("wordprobabilityDatabase.txt");
            // get user input for correction
            {
                user_text = cleanString;
                String[] words = user_text.split(" ");
                int error = 0;
                for (String word : words) {
                    if (!dictionary.contains(word)) {
                        checkWord(word);
                        dictionary.put(word, word);
                    }
                    suggestWord = true;
                    String outputWord = checkWord(word);
                    if (suggestWord) {
                        System.out.println("Suggestions for " + word + " are: " + suggest.correct(outputWord) + "\n");
                        error++;
                    }
                }
                if (error == 0) {
                    System.out.println("No mistakes found");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(-1);
        }
    }

    /**
     * METHOD TO SPELL CHECK THE WORDS IN A STRING. IS USED IN SPELL CHECKER
     * METHOD THROUGH THE "WORD" STRING
     */
    public static String checkWord(String wordToCheck) {
        String wordCheck;
        String word = wordToCheck.toLowerCase();
        if ((wordCheck = (String) dictionary.get(word)) != null) {
            suggestWord = false; // no need to ask for suggestion for a correct word.
            return wordCheck;
        }
        // If after all of these checks a word could not be corrected, return as
        // a misspelled word.
        return word;
    }
}
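Two details in SpellChecker affect which tokens reach the check. The usual idiom for reading a file line by line is to loop until readLine() returns null (ready() only reports whether the next read is guaranteed not to block), and split(" ") produces empty tokens for runs of spaces and does not split on newlines or tabs, whereas split("\\s+") does. Since every key in the Hashtable equals its value, a HashSet<String> holds the same information. A sketch under those assumptions; it reads from any Reader so it can be tried without a dictionary.txt file, and lower-casing the dictionary entries is an added assumption that mirrors checkWord.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DictionaryLoader {
    // Reads whitespace-separated words into a set; try-with-resources closes the reader.
    static Set<String> load(Reader source) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String w : line.trim().split("\\s+")) {
                    if (!w.isEmpty()) { // a blank line splits into a single empty token
                        words.add(w.toLowerCase(Locale.ROOT));
                    }
                }
            }
        }
        return words;
    }

    // Splits page text on any run of whitespace, ignoring leading and trailing blanks.
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) throws IOException {
        Set<String> dict = load(new StringReader("the program\n\nis  worst\n"));
        System.out.println(dict.size());                                   // 4
        System.out.println(tokenize("teh  program\nis teh worst").length); // 5
    }
}
```

In SpellChecker this would be called as load(new FileReader("dictionary.txt")).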
There is also a second class (SuggestSpelling.java) containing a probability calculator, but it isn't relevant right now unless you plan to run the code yourself.

Use a HashSet to detect duplicates:

Set<String> wordSet = new HashSet<>();
String[] words = user_text.split(" "); // split the input sentence into words
for (String word : words) {
    if (!wordSet.contains(word)) {
        checkWord(word);
        // do stuff
        wordSet.add(word);
    }
}

Edit:
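A runnable version of the idea above, assuming the words are already split into an array and using a plain dictionary lookup in place of checkWord. Set.add returns false when the element is already present, so it can do the membership test and the insertion in one call.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReportOnce {
    // Returns each misspelled word once, in the order it first appears.
    static List<String> misspelledOnce(String[] words, Set<String> dictionary) {
        Set<String> seen = new HashSet<>(); // only for duplicate detection, separate from the dictionary
        List<String> report = new ArrayList<>();
        for (String word : words) {
            // seen.add(word) is false for a repeat, so the dictionary check is skipped
            if (seen.add(word) && !dictionary.contains(word)) {
                report.add(word);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("the", "program", "is", "worst");
        String[] words = "teh program is teh worst".split(" ");
        for (String word : misspelledOnce(words, dictionary)) {
            System.out.println("'" + word + "' is not in the dictionary");
        }
        // prints a single line: 'teh' is not in the dictionary
    }
}
```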
// ....
{
    user_text = cleanString;
    String[] words = user_text.split(" ");
    Set<String> wordSet = new HashSet<>();
    int error = 0;
    for (String word : words) {
        // wordSet is another data structure. It's only for duplicate checking; don't mix it with dictionary
        if (!wordSet.contains(word)) {
            // put all your logic here
            wordSet.add(word);
        }
    }
    if (error == 0) {
        System.out.println("No mistakes found");
    }
}
// ....
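The question also asks for each word to be printed with the number of times it was misspelled. A sketch of that variant, assuming the same split words and a Set<String> dictionary (class and method names are illustrative): a LinkedHashMap keeps first-seen order, merge increments the count, and keying on the lower-cased word means "Teh" and "teh" are counted together, as checkWord also lower-cases. The wordSet above compares the raw strings, so it would report those two separately.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class MisspellingCounter {
    // Counts how often each misspelled (lower-cased) word occurs, keeping first-seen order.
    static Map<String, Integer> countMisspellings(String[] words, Set<String> dictionary) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String raw : words) {
            String word = raw.toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && !dictionary.contains(word)) {
                counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("the", "program", "is", "worst");
        String[] words = "Teh program is teh worst".split("\\s+");
        countMisspellings(words, dictionary).forEach((word, n) ->
                System.out.println("'" + word + "' misspelled " + n + " time(s)"));
        // prints: 'teh' misspelled 2 time(s)
    }
}
```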