Java problem while working with files


java string replace


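I'm having some trouble removing the subsequence \u000 from a string.

First, I read a byte[] from a file into a string with String str = new String(bytes, "UTF8").

Then I get str, which equals \u0004Word, meaning 4Word, where 4 is the length of the word Word. So now I need to convert it into a regular 4Words.

replaceAll("\u000", ""), replaceAll("\\\\u000", "u000") and so on don't work. How can I do this?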

void FillingStorage() throws Exception{
    Path path = Paths.get(System.getProperty("db.file"));//that's my file
    byte[] data = Files.readAllBytes(path);
    String str = new String(data, "UTF8");
    System.out.println(str);
    String res = str.replaceAll("I don't know what to write here cos nothing I've tried works");
    return;
}

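Update! First, I fill a HashMap with Key->Value and Key1->Value1. Then I write it to a file as bytes. When I try to convert it back to a string and print it, I see Key-Value-Key1-Value1 instead of 3Key-5Value-4Key1-6Value1. Surprisingly, though, if you inspect the string I print, you see something like \u0003Key\u0005Value and so on.

It looks like my string contains these numbers, but Java can't print them.

This is how I write the bytes to the file: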

DataOutputStream stream = new DataOutputStream(new FileOutputStream(System.getProperty("db.file"), true));
for (Map.Entry<String, String> entry : storage.entrySet()) {
    byte[] bytesKey = entry.getKey().getBytes(StandardCharsets.UTF_8);
    stream.write((int)bytesKey.length);//it disappears!
    stream.write(bytesKey);
    byte[] bytesVal = entry.getValue().getBytes(StandardCharsets.UTF_8);
    stream.write((Integer)bytesVal.length);//disappears too!
    stream.write(bytesVal);
}
stream.close();


First, your requirement doesn't need a regular expression, so you should use replace() rather than replaceAll().
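To illustrate the point about replace(), here is a minimal sketch. It assumes the string really starts with the single control character U+0004 rather than the six visible characters \u0004; the class name ControlCharDemo and the helper showLength are made up for this example.

```java
public class ControlCharDemo {
    // Hypothetical helper: turns a leading length character (e.g. U+0004)
    // into its visible decimal digits, so "\u0004Word" becomes "4Word".
    static String showLength(String s) {
        if (s.isEmpty()) {
            return s;
        }
        int len = s.charAt(0);          // numeric value of the first char
        return len + s.substring(1);    // int + String concatenates as text
    }

    public static void main(String[] args) {
        String str = "\u0004Word";      // one control char followed by "Word"
        System.out.println(str.length());               // 5
        System.out.println(str.replace("\u0004", ""));  // Word
        System.out.println(showLength(str));            // 4Word
    }
}
```

replace() treats its arguments as plain text, so no escaping of backslashes is needed. The control character is present in the string but has no visible glyph, which is why it seems to vanish when printed.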

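Second, \uxxxx is Java's character literal syntax, so it's unclear whether your string literally contains those six characters (a backslash, u, and four digits). It's more plausible that the byte array simply starts with a single byte equal to 4, which is the string length.

In that case, when converting to a String, just use the constructor that accepts offset and len parameters to skip the initial byte of the array.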
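As a rough sketch of that suggestion, assuming the array holds exactly one record laid out as one length byte followed by the UTF-8 payload (the class name SkipLengthByte and the method decode are invented for illustration):

```java
import java.nio.charset.StandardCharsets;

public class SkipLengthByte {
    // Decodes [1 length byte][UTF-8 payload], leaving the length byte
    // out of the resulting string.
    static String decode(byte[] data) {
        int len = data[0] & 0xFF;   // read the length as an unsigned value
        // String(byte[], offset, length, Charset): start at index 1
        return new String(data, 1, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = {4, 'W', 'o', 'r', 'd'};
        System.out.println(decode(data)); // Word
    }
}
```

Passing a Charset instead of the name "UTF8" also means no checked UnsupportedEncodingException has to be handled.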

If you do happen to have all of those characters literally in the string, just use substring to drop the first 6 characters.

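What do you see when you print str? I'm asking because I doubt it contains \u000, since you say replaceAll("\\\\u000") doesn't work. Or maybe you forgot to store the result of replaceAll in the str reference (strings are immutable, so replaceAll doesn't change the original string; it creates and returns a new one). Can you paste your replaceAll line?

I see 'Words' with a space before the word.

But you should use new String(data, StandardCharsets.UTF_8) to avoid the UnsupportedEncodingException, which can never actually occur for UTF-8.

Can the string length exceed 127 characters? If there is an extraneous character \u0080 or greater at the start of the string, it will cause problems when the data is interpreted as UTF-8. You need to strip the length before converting to a string.

That isn't easy, because the bytes hold something like 1A3AA2AA, so parsing it your way is very hard.

Then you should describe those difficulties in the question.

Updated, I have a new problem =(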
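For what it's worth, stripping the lengths before decoding does not have to be hard. Below is a sketch of one way to read the file back, assuming every key and value was written as a single length byte followed by that many UTF-8 bytes, as in the writing code in the question. The names StorageReader, readField and parse are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class StorageReader {
    // Reads one [length byte][UTF-8 bytes] field; returns null at end of input.
    static String readField(DataInputStream in) throws IOException {
        int len = in.read();            // 0..255, or -1 at end of stream
        if (len < 0) {
            return null;
        }
        byte[] buf = new byte[len];
        in.readFully(buf);              // throws EOFException if data is truncated
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Parses alternating key and value fields into a map, keeping file order.
    static Map<String, String> parse(byte[] data) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String key;
            while ((key = readField(in)) != null) {
                String value = readField(in);
                if (value == null) {
                    throw new IOException("Missing value for key " + key);
                }
                result.put(key, value);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {3, 'K', 'e', 'y', 5, 'V', 'a', 'l', 'u', 'e'};
        System.out.println(parse(data)); // {Key=Value}
    }
}
```

One caveat: DataOutputStream.write(int) stores only the low 8 bits of its argument, so this layout breaks for keys or values longer than 255 bytes. Writing the length with writeInt and reading it back with readInt would be one way around that.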