Converting a byte array to a String (Java)


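I am trying to read the contents of a file into any readable form. I use a FileInputStream to read the file into a byte array, and then try to convert that byte array to a String.

So far I have tried three different approaches: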

FileInputStream inputStream = new FileInputStream(file);
byte[] clearTextBytes = new byte[(int) file.length()];
inputStream.read(clearTextBytes);

String s = IOUtils.toString(inputStream); //first way

String str = new String(clearTextBytes, "UTF-8"); //second way

String string = Arrays.toString(clearTextBytes); //third way
String[] byteValue = string.substring(1, string.length() - 1).split(",");
byte[] bytes = new byte[byteValue.length];
for(int i=0, len=bytes.length; i<len; i++){
   bytes[i] = Byte.parseByte(byteValue[i].trim());
}
String newStr = new String(bytes);

As others have pointed out, the data does not appear to contain any text, so it is most likely binary data rather than text. Note that files starting with PK may be in PKZIP format, and the randomness of the data does suggest that it is compressed.
Try renaming the file so that its name ends in .ZIP, then see whether you can open it in the file explorer.
From the link above, a DOCX file begins as follows:
50 4B 03 04 14 00 06 00    PK......
DOCX, PPTX, XLSX
Microsoft Office Open XML Format (OOXML) Document

NOTE: There is no subheader for MS OOXML files as there is with
DOC, PPT, and XLS files. To better understand the format of these files,
rename any OOXML file to have a .ZIP extension and then unZIP the file;
look at the resultant file named [Content_Types].xml to see the content
types. In particular, look for the <Override PartName= tag, where you
will find word, ppt, or xl, respectively.

Trailer: Look for 50 4B 05 06 (PK..) followed by 18 additional bytes
at the end of the file.
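
The "rename to .ZIP and unzip" advice quoted above can also be tried from Java, because the standard library reads ZIP containers directly. The following is only a minimal sketch: the class name ZipEntryLister and both helper methods are made up for illustration, and it builds a small in-memory archive in place of a real .docx so that it runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryLister {

    /** Returns the entry names found in ZIP-formatted bytes (a .docx is such an archive). */
    static List<String> listEntries(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a tiny in-memory archive (an assumption for the demo, not a real .docx). */
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buffer)) {
            zout.putNextEntry(new ZipEntry("[Content_Types].xml"));
            zout.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("word/document.xml"));
            zout.write("<document/>".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Print one entry name per line.
        for (String name : listEntries(sampleArchive())) {
            System.out.println(name);
        }
    }
}
```

For a real file, the bytes read from disk could be passed to listEntries instead of sampleArchive(). Seeing an entry named [Content_Types].xml would be a strong hint that the file is an OOXML document.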

UTF-8 can represent 2,097,152 different characters; where there is no glyph for one, you see a question mark. Try a classic DOS code page instead:
new String(clearTextBytes, "IBM437"); // code page 437, the classic US DOS code page
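
Before trying one code page after another, it may help to check whether the bytes are valid in a given charset at all. The sketch below uses a CharsetDecoder configured to report errors instead of silently substituting replacement characters. The class name StrictDecode and the method decodeOrNull are hypothetical, and the sample bytes are invented for the demo.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {

    /** Decodes the bytes with the given charset, or returns null if they are not valid in it. */
    static String decodeOrNull(byte[] data, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString();
        } catch (CharacterCodingException e) {
            return null; // the data is not well-formed in this charset
        }
    }

    public static void main(String[] args) {
        byte[] text = "apples pears".getBytes(StandardCharsets.UTF_8);
        // Invented sample: a ZIP-style header followed by 0xFF, which never occurs in UTF-8.
        byte[] binary = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00, (byte) 0xFF};

        System.out.println(decodeOrNull(text, StandardCharsets.UTF_8));
        System.out.println(decodeOrNull(binary, StandardCharsets.UTF_8));
    }
}
```

A successful decode does not prove that the data is text: single-byte charsets such as ISO-8859-1 accept every byte, so this check is mainly useful for ruling out UTF-8.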

To get the text content of a Word file you need a library such as Apache POI:
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

[...]

   XWPFDocument docx = new XWPFDocument(new FileInputStream("file.docx"));       
   XWPFWordExtractor we = new XWPFWordExtractor(docx);
   System.out.println(we.getText());

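I wrote a very basic program that reads the contents of a file and prints each string on a new line in the console. Here are the contents of the file:

Here is the program I wrote: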
import java.io.*;
import java.util.*;

class Test {
    public static void main(String args[]) throws FileNotFoundException {
        File file = new File("File1.txt");
        Scanner input = new Scanner(file);

        while (input.hasNext()) {
            System.out.println(input.next());
        }

        input.close();

    } // main()
} // class Test

Here is the console output:
apples
pears
1
2
3
oranges
carrots
bananas
pineapples

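I suspect your byte array does not contain a string in the first place. From what you have posted, I think this is a Word document, not a txt file. To read the contents of a Word document you need a library such as Apache POI.

Are you sure the file is not a zip file? This usually happens when you try to read directly from a zip file without unzipping it.

I guess the "first way" does not print anything because you have already read everything from inputStream into clearTextBytes, so there are no more bytes left to read.

@stackflow… the file starts with PK ;) but it could be a decrypted zip or a decrypted docx

Great, I have renamed it to a zip and opened it. However, I am a bit confused about what you mean by checking what the encoding is and checking that nothing is corrupted when I output the string rather than on input. So your possibleCharsets function should return all the charsets whose output does not contain �, and then I use one of them to create a new string? Sorry, I am still fairly new to bytes, binary data, ASCII and the like. (Also, the Word document I was originally trying to read was a simple .docx.)

@KevinDonahoe The docx file format is not simple ;) You need a library dedicated to reading such documents to have any chance of reading it. Since it is a binary format, character encodings do not apply.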
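
The points raised in these comments can be combined into one small sketch: read the file exactly once, check for the PK signature (50 4B 03 04) before treating the bytes as text, and decode with an explicit charset only when the data does not look like a ZIP container. The class name ReadOnce, the methods looksLikeZip and describe, and the temporary sample file are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadOnce {

    /** True if the data starts with the ZIP local-file-header signature 50 4B 03 04 ("PK.."). */
    static boolean looksLikeZip(byte[] data) {
        return data.length >= 4
                && data[0] == 0x50 && data[1] == 0x4B
                && data[2] == 0x03 && data[3] == 0x04;
    }

    /** Decodes the bytes as UTF-8 unless they look like a ZIP container. */
    static String describe(byte[] data) {
        if (looksLikeZip(data)) {
            return "binary ZIP container (possibly .docx); use a ZIP or document library";
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temporary file standing in for the asker's file.
        Path file = Files.createTempFile("sample", ".txt");
        try {
            Files.write(file, "apples pears".getBytes(StandardCharsets.UTF_8));
            byte[] bytes = Files.readAllBytes(file); // read the whole file exactly once
            System.out.println(describe(bytes));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because the bytes are held in an array rather than re-read from a stream, they can be inspected and decoded as often as needed, which avoids the empty result of the "first way".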