Java Scanner hasNext(String) method sometimes does not match


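I am trying to use the Java Scanner hasNext method, but I get strange results. Maybe my problem is obvious, but why does the simple expression "[a-zA-Z']+" not work for words like these: "points.anything, supervisor"? I also tried "[\\w']+".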

public HashMap<String, Integer> getDocumentWordStructureFromPath(File file) {
    HashMap<String, Integer> dictionary = new HashMap<>();
    try {
        Scanner lineScanner = new Scanner(file);
        while (lineScanner.hasNextLine()) {
            Scanner scanner = new Scanner(lineScanner.nextLine());
            while (scanner.hasNext("[\\w']+")) {
                String word = scanner.next().toLowerCase();
                if (word.length() > 2) {
                    // Autoboxing replaces the deprecated new Integer(...) constructor.
                    int count = dictionary.containsKey(word) ? dictionary.get(word) + 1 : 1;
                    dictionary.put(word, count);
                }
            }
            scanner.close();
        }
        //scanner.useDelimiter(DELIMITER);
        lineScanner.close();

        return dictionary;

    } catch (FileNotFoundException e) { 
        e.printStackTrace();
        return null;
    }   
}

Your regular expression should be

[^a-zA-Z]+

used as the scanner's delimiter, because you need to split on everything that is not a letter:

// previous code...
Scanner scanner = new Scanner(lineScanner.nextLine()).useDelimiter("[^a-zA-Z]+");
while (scanner.hasNext()) {
    String word = scanner.next().toLowerCase();
    // ...your other code
}
// ... following code
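
To make the delimiter approach concrete, here is a minimal self-contained sketch. The class name DelimiterWordCount and the method countWords are hypothetical names chosen for illustration, and it reads from a String instead of a File so it can be run on its own; it is not the asker's full method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper class, for illustration only.
class DelimiterWordCount {

    // Counts words longer than two characters, treating every run of
    // non-letter characters as a delimiter between tokens.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("[^a-zA-Z]+")) {
            while (scanner.hasNext()) {
                String word = scanner.next().toLowerCase();
                if (word.length() > 2) {
                    // merge inserts 1 for a new word or adds 1 to the existing count
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Iteration order of a HashMap is unspecified, so the printed order may vary.
        System.out.println(countWords("points.anything, supervisor points"));
    }
}
```

Note that this delimiter also splits on apostrophes, so a word such as "don't" becomes "don" and "t". If apostrophes should stay inside words, as in the question's pattern, one option is the delimiter "[^a-zA-Z']+".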

EDIT: Why doesn't the hasNext(String) method work?

This line:

Scanner scanner = new Scanner(lineScanner.nextLine());

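What it really does is set up a whitespace pattern as the delimiter for you, so if you have a test line like this

"Hello World. A test, ok."

it will give you the following tokens:

Hello
World.
A
test,
ok.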

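Then, if you call

scanner.hasNext("[a-zA-Z]+")

you are asking the scanner whether the next token matches your pattern. In this example it returns true for the first token, Hello, because that token consists only of letters. The next token, World., does not match the pattern because of the trailing period, so scanner.hasNext("[a-zA-Z]+") returns false and the while loop ends there. hasNext(String) never skips a token that fails to match, so it does not work for any word that has a non-letter character attached to it.

Does that make sense now? Hope this helps.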
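
The behaviour described above can be checked with a small sketch. The class name HasNextPatternDemo and its two methods are hypothetical names made up for this illustration; only Scanner, hasNext(), hasNext(String) and next() are real API calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, for illustration only.
class HasNextPatternDemo {

    // Collects tokens for as long as the NEXT whitespace-delimited token
    // matches the pattern in full; stops at the first token that does not.
    static List<String> tokensWhileMatching(String line, String pattern) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            while (scanner.hasNext(pattern)) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    // Collects every token produced by the default whitespace delimiter.
    static List<String> allTokens(String line) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "Hello World. A test, ok.";
        System.out.println(allTokens(line));                        // [Hello, World., A, test,, ok.]
        System.out.println(tokensWhileMatching(line, "[a-zA-Z]+")); // [Hello]
    }
}
```

The second call stops after Hello because the token World. still carries its period, so the pattern loop never reaches A even though that token would match.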

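Thanks a lot @Angel Rodriguez, this is a great solution, but I don't understand why the hasNext(String) function doesn't work.

OK, I see what you mean. I have edited the post and explained why it doesn't work. Hope it helps.

Thanks a lot, I got it. Thank you very much for your help. +1 for the detailed explanation.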