Java Scanner cannot read foreign characters in a file

java character-encoding


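For a university project, I am currently building a tool that can extract and search the data stored on a smartwatch.

I have been able to pull a file called "Node.db" from my smartwatch; it contains the Bluetooth MAC address of the mobile phone the smartwatch is connected to. I am now trying to create a Scanner that scans this "node.db" file and prints out the MAC address.

This is the code I have so far: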

// Identify the location of the node.txt file
File file = new File("C:\\WatchData\\node.txt");
// Notify the user that Bluetooth extraction has initialized
Txt_Results.append("Pulling bluetooth data...");
Scanner in = null;
try {
    in = new Scanner(file);
    // Scan till the end of the file
    while (in.hasNext()) {
        String line = in.nextLine();
        // Scan the file for this string
        if (line.contains("settings.bluetooth"))
            // Print the MAC Address string out for the user
            System.out.println(line);
    }
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

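An earlier function converts the file to .txt. The code searches each line for "settings.bluetooth" and, if it is found, should print the line containing the MAC address. However, I think the format of the node.db file is preventing the Scanner from finding this string. I believe some of the data in the file is encoded. An example of how the data is displayed is shown below. I believe it is the black characters that it does not recognize:

When I run the code on the file, the program simply hangs and gives no error message. I have left it running for more than 20 minutes with no success.

The exact line I am trying to print from the file looks like this:

I have tested this code on a text file without these encoded characters and can confirm that it works there. So my question is:

Is there a way to make the Scanner skip the characters in the file that it does not recognize, so that it can carry on scanning?

Thanks in advance.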

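Since you have not provided the file here, I cannot write code and test it against your file. It looks like the file's encoding differs from the one Java is using to decode it.

So you need to try different encoding settings for the input stream.

In general, you can specify the encoding like this: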

String encoding = "UTF-8"; // try "UTF-8" first and also change to other encodings to see the results
Reader reader = new InputStreamReader(new FileInputStream("your_file_name"), encoding);
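Scanner also accepts a charset name as a second constructor argument, so the same idea can be tried without giving up Scanner. The following is only a sketch: the class and method names are made up for illustration, and it assumes ISO-8859-1 is an acceptable choice because that charset maps every byte to exactly one character, so decoding cannot fail. It uses hasNextLine() together with nextLine().

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerWithCharset {
    // Returns every line of the stream that contains the marker.
    // The stream is decoded as ISO-8859-1 (an assumption, see above).
    static List<String> findLines(InputStream in, String marker) {
        List<String> matches = new ArrayList<>();
        try (Scanner scanner = new Scanner(in, "ISO-8859-1")) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (line.contains(marker)) {
                    matches.add(line);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ScannerWithCharset <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            for (String line : findLines(in, "settings.bluetooth")) {
                System.out.println(line);
            }
        }
    }
}
```

The printed lines may still contain unprintable characters, since nothing is filtered out here; only the decoding step changes.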

This post also discusses how to write code that detects a file's encoding.

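By the way, the characters shown with a dark background in the file are ASCII control characters.

I also suggest changing the decoding used by your text viewer, to see whether the text displays correctly under a particular encoding.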
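One way to check this is to count the control bytes directly, without decoding anything. The sketch below is an illustration only; the class and method names are hypothetical, and it treats tab, line feed and carriage return as ordinary text rather than as control characters.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ControlCharCount {
    // Counts bytes in the ASCII control range (0x00-0x1F and 0x7F),
    // ignoring tab, line feed and carriage return.
    static int countControlBytes(byte[] data) {
        int count = 0;
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean control = v < 0x20 || v == 0x7F;
            boolean whitespace = v == '\t' || v == '\n' || v == '\r';
            if (control && !whitespace) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ControlCharCount <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(countControlBytes(data) + " control bytes out of " + data.length);
    }
}
```

A large count suggests the file is binary data rather than text in some other encoding.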

Update

It looks like Scanner does not work on this file, while other IO classes actually work fine:

StringBuilder sb = new StringBuilder();

try (BufferedReader reader = new BufferedReader(new FileReader("node.txt"))) {
    String line;
    // Read the whole file line by line into one buffer
    while ((line = reader.readLine()) != null) {
        sb.append(line);
    }
} catch (Exception e) {
    e.printStackTrace();
}

int index = sb.indexOf("settings.bluetooth");
if (index != -1) {
    // 18 is the length of "settings.bluetooth"; widen the range to print more of the line
    System.out.println(sb.substring(index, index + 18));
}
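Because "settings.bluetooth" is 18 characters long, substring(index, index + 18) prints only the search string itself. A possible variation, sketched here with hypothetical names, reads the whole file into one string and prints from the marker up to the next line break, so no fixed width has to be guessed. Decoding as ISO-8859-1 is an assumption made so that no byte is rejected.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MarkerLine {
    // Returns the text from the marker up to (not including) the next
    // line break, or null if the marker does not occur.
    static String fromMarkerToLineEnd(String text, String marker) {
        int start = text.indexOf(marker);
        if (start < 0) {
            return null;
        }
        int end = start;
        while (end < text.length() && text.charAt(end) != '\n' && text.charAt(end) != '\r') {
            end++;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java MarkerLine <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(fromMarkerToLineEnd(text, "settings.bluetooth"));
    }
}
```

The result can still contain unprintable characters; it only fixes where the printed range ends.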

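Update

It seems that one of Scanner's internal methods runs into an exception while reading only when the Scanner is created from a File. Using an input stream as below always works, and the stream can even be wrapped in a Scanner: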
try (Scanner s = new Scanner(new FileInputStream("node.txt"))) {
    while(s.hasNext()) {
        System.out.println(s.next());
    }
} catch (Exception e) {
    e.printStackTrace();
}
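A difference in how malformed bytes are handled during decoding would produce this kind of behaviour, although that explanation is an assumption here and has not been checked against the Scanner source. The sketch below does not use Scanner at all; it only illustrates, with hypothetical class and method names, the two decoder modes that java.nio.charset offers: reporting invalid input as an error, or replacing it with U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecoderModes {
    // Strict decoding: returns null if the bytes are not valid UTF-8.
    static String decodeStrict(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Lenient decoding: each invalid sequence becomes U+FFFD.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] sample = {'a', (byte) 0xFF, 'b'}; // 0xFF never occurs in valid UTF-8
        System.out.println("strict:  " + decodeStrict(sample));
        System.out.println("lenient: " + decodeLenient(sample));
    }
}
```

A reader built on the strict mode stops at the first invalid byte, while one built on the lenient mode keeps going, which matches the symptom of one way of opening the file working and the other not.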

Update

This solution simply strips all illegal characters from the file:
public static void main(String args[]) {
    String encoding = "UTF-8"; // try "UTF-8" first and also change to other encodings to see the results

    StringBuilder sb = new StringBuilder();
    try(Reader reader = new InputStreamReader(new FileInputStream("node.txt"), encoding)) {
        int c = -1;
        while ((c = reader.read()) != -1) {
            if (eligible(c)) {
                sb.append((char)c);
            }
        }
    } catch (Exception e){
        e.printStackTrace();
    }

    int index = sb.indexOf("settings.bluetooth");
    if (index >= 0) {
        System.out.println(sb.substring(index));
    }
}

public static boolean eligible(int c) {
    return (c >= 'a' && c <= 'z' || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '.');
}
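Note that eligible() also drops ':' characters, so a MAC address written with colons would lose its separators. If the address is stored in the usual textual form of six hex pairs separated by ':' or '-' (an assumption, since the exact line is not shown here), a regular expression can pull it out directly. The class name, method name and sample string below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MacFinder {
    // Six pairs of hex digits separated by ':' or '-'.
    private static final Pattern MAC =
            Pattern.compile("(?i)\\b[0-9a-f]{2}(?:[:-][0-9a-f]{2}){5}\\b");

    // Returns the first MAC-like substring at or after the marker,
    // or null if the marker or a MAC-like substring is missing.
    static String firstMacAfter(String text, String marker) {
        int start = text.indexOf(marker);
        if (start < 0) {
            return null;
        }
        Matcher m = MAC.matcher(text);
        return m.find(start) ? m.group() : null;
    }

    public static void main(String[] args) {
        // Made-up sample with control characters around the interesting part.
        String sample = "xx\u0001settings.bluetooth\u0002\u000300:1A:7D:DA:71:13\u0004yy";
        System.out.println(firstMacAfter(sample, "settings.bluetooth"));
    }
}
```

This works on the unfiltered text, so it can be combined with any of the reading approaches above as long as the file is decoded without errors.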

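Are there any errors?

@Hackerdarshi Hey there. The program just hangs and shows no error message. I have updated the question to include this information.

Could you include a few lines (including the line you want printed; not all of them)?

I have added the line from the file that shows the MAC address.

Use hasNextLine() instead of hasNext(), and add the encoding as in the answer, new Scanner(..., encoding), though that is unlikely to be the cause. Try "ISO-8859-1" too, as it should accept all values. Skipping characters is doable, but not without problems. Maybe System.out.println(line.replaceAll("\\P{ASCII}", "")) to remove non-ASCII.

I can only provide a link to a sample of the file, because the whole file contains about 38,000 lines and also holds some sensitive data. A link to the sample can be found here:

@JPM Scanner does not work with this file. I am still looking for the reason. However, using FileReader works fine, as in my updated answer.

Hi, thanks for your help. I just used your code and had to change "+18" to "+65" to make sure it showed the whole line. However, it displays some encoded data, as shown here: . Is there still a way to get rid of the squares? Thanks in advance.

@JPM If you are only interested in the MAC address, I suggest removing those characters. To do that, read the file one character at a time and, if the character is legal, say "a-z", "A-Z" or "0-9", append it to a StringBuilder, then look for settings.bluetooth in that string.

Hi. I see what you mean, but I am not entirely sure how to add that to the code. Could you update your answer with this? Thanks.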