How do I read and write non-English characters (such as Marathi, Tamil, or Hindi) in Java?
I read non-English text (Marathi, for example) from an Excel file and then write it to an XML file. When I read the Marathi text from Excel and inspect it in my Java code, it displays correctly, but when I write it to XML from the same code, I get stray symbols in place of the Marathi text. Please tell me how to handle this. The code is attached below.
public void excelToXML(String path) {
    FileWriter fostream;
    PrintWriter out = null;
    String strOutputPath = "C:\\Temp\\";
    try {
        File file = new File(path);
        InputStream inputStream = new FileInputStream(file);
        Workbook wb = WorkbookFactory.create(inputStream);
        List<String> sheetNames = new ArrayList<String>();
        for (int i = 0; i < wb.getNumberOfSheets(); i++) {
            sheetNames.add(wb.getSheetName(i));
        }
        fostream = new FileWriter(strOutputPath + "\\" + "iTicker" + ".xml");
        out = new PrintWriter(new BufferedWriter(fostream));
        // out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
        out.println("<root xmlns:xsi=\"http://www.w3.org/3921/XMLSchema-instance\">");
        for (String sheetName : sheetNames) {
            if(sheetName.equals("Sheet3")){
                System.out.println(sheetName);
                break;
            }
            Sheet sheet = wb.getSheet(sheetName);
            boolean firstRow = true;
            ArrayList<String> myStringArray = new ArrayList<String>();
            Iterator<Cell> cells = sheet.getRow(0).cellIterator();
            while (cells.hasNext()) {
                myStringArray.add(cells.next().toString());
            }
            for (Row row : sheet) {
                if (firstRow == true) {
                    firstRow = false;
                    continue;
                }
                if (!sheetName.equals("Sheet1")) {
                    out.println("\t<element>");
                }
                for (int i = 0; i < myStringArray.size(); i++) {
                    if (row.getCell(i) != null && !(row.getCell(i)).toString().isEmpty()
                            && row.getCell(i).toString().length() > 0) {
                        if(!(myStringArray.get(i) != null && myStringArray.get(i).toString().equals("Start_Epoch_Time") || myStringArray.get(i).toString().equals("End_Epoch_Time"))){
                            out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
                        } else{
                            long ePochValue=EpochConverter.getepochValue(row.getCell(i).toString());
                            out.println(formatElement("\t\t", myStringArray.get(i), String.valueOf(ePochValue)));
                        }
                    } else {
                        blankValues.add(sheetName +":" + "column header" +":" +myStringArray.get(i)+":"+"row no:"+row.getRowNum()+" " +"is blank.");
                    }
                }
                if (!sheetName.equals("Sheet1")) {
                    out.println("\t</element>");
                }
            }
        }
        out.write("</root>");
        out.flush();
        out.close();
        if(blankValues != null && blankValues.size() >0){
            FileUploadController.writeErrorLog(blankValues + "Please fill all the mandatory values.");
        }
    } catch (Exception e) {
        new DTHException(e.getMessage());
        e.printStackTrace();
    }
}

private static String formatCell(Cell cell)
{
    if (cell == null) {
        return "";
    }
    switch (cell.getCellType()) {
        case Cell.CELL_TYPE_BLANK:
            return "";
        case Cell.CELL_TYPE_BOOLEAN:
            return Boolean.toString(cell.getBooleanCellValue());
        case Cell.CELL_TYPE_ERROR:
            return "*error*";
        case Cell.CELL_TYPE_NUMERIC:
            return df.format(cell.getNumericCellValue());
        case Cell.CELL_TYPE_STRING:
            return cell.getStringCellValue();
        default:
            return "<unknown value>";
    }
}

private static String formatElement(String prefix, String tag, String value) {
    StringBuilder sb = new StringBuilder(prefix);
    sb.append("<");
    sb.append(tag);
    if (value != null && value.length() > 0) {
        sb.append(">");
        sb.append(value);
        sb.append("</");
        sb.append(tag);
        sb.append(">");
    } else {
        sb.append("/>");
    }
    return sb.toString();
}
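In the line below, when I inspect row.getCell(i) I see the exact Marathi value, but after writing that value I get different output:
out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));

Your code has two big problems.

1) You are evidently on Windows (path C:\\Temp), and, as Axel Richter stated in the comments, you are using the default encoding for the output file. Creating a FileWriter directly from a file name gives you the platform's default encoding, which on Windows is the Windows ANSI code page. That is not what you want, because you go on to write an XML declaration that names UTF-8 as the encoding.

Never rely on the platform's default encoding. Always create the PrintWriter with an explicit encoding, by wrapping a FileOutputStream in an OutputStreamWriter.

2) Writing XML by hand is bad practice. If you do it anyway, you have to take care of special characters such as <, > and &. It is always advisable to use a library that does the escaping automatically. The Java standard library includes an implementation of the StAX interface.

Here is a simple example of how to use it: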
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class WriteXml {
    public static void main(String[] args) {
        try {
            File outFile = new File("iTicker.xml");
            // Output stream for the XML document. The encoding is passed explicitly
            // so that the bytes written match the XML declaration below.
            OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile));
            XMLStreamWriter xmlWriter =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            xmlWriter.writeStartDocument("UTF-8", "1.0");
            xmlWriter.writeCharacters("\n");
            xmlWriter.writeStartElement("root");
            xmlWriter.writeNamespace("xsi", "http://www.w3.org/3921/XMLSchema-instance");
            xmlWriter.writeCharacters("\n ");
            xmlWriter.writeStartElement("element");
            // Some special characters and (I hope) some Marathi letters
            xmlWriter.writeCharacters("<>&\": मराठी वर्णमाला");
            xmlWriter.writeEndElement(); // element
            xmlWriter.writeCharacters("\n");
            xmlWriter.writeEndElement(); // root
            xmlWriter.writeEndDocument();
            xmlWriter.close(); // should be better in a finally block
            out.close(); // should be better handled automatically by try-with-resources
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
}
This creates the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/3921/XMLSchema-instance">
 <element>&lt;&gt;&amp;": मराठी वर्णमाला</element>
</root>
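If the hand-written formatElement approach from the question is kept anyway, the escaping mentioned in point 2 has to be done manually. The following is only a minimal sketch of that idea: the class name XmlEscapeSketch and the method escapeXml are hypothetical names introduced here, not part of the question's code or of any library.

```java
public class XmlEscapeSketch {

    // Hypothetical helper: replaces the characters that must not appear
    // literally in XML text or attribute values with predefined entities.
    static String escapeXml(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c); // Marathi and other letters pass through unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Note: the console may still garble the Marathi text if its own
        // encoding is not UTF-8; the returned String itself is correct.
        System.out.println(escapeXml("<>&\": मराठी वर्णमाला"));
    }
}
```

In the question's code, such a helper would be applied to the value before it is appended in formatElement. Escaping only makes the markup well-formed; it does not fix the encoding problem from point 1.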
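You need Unicode everywhere. Make sure every method that can take an encoding is told to use UTF-8. The FileWriter documentation says: the constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable. To specify these values yourself, construct an OutputStreamWriter on a FileOutputStream.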
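The advice above, to construct an OutputStreamWriter on a FileOutputStream with an explicit encoding, might look roughly like the sketch below. The class name Utf8WriterSketch, the helper openUtf8Writer, and the temporary output file are assumptions made for illustration; in the question's code, the returned writer would take the place of the FileWriter-based PrintWriter.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Utf8WriterSketch {

    // Hypothetical helper: opens a PrintWriter that encodes as UTF-8,
    // independent of the platform's default charset.
    static PrintWriter openUtf8Writer(File target) throws IOException {
        return new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(target), StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real output path (assumption).
        File target = File.createTempFile("iTicker", ".xml");
        try (PrintWriter out = openUtf8Writer(target)) {
            out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
            out.println("<root><element>मराठी</element></root>");
        }
        // Decoding the file as UTF-8 should give back the Marathi text unchanged.
        String written = new String(Files.readAllBytes(target.toPath()), StandardCharsets.UTF_8);
        System.out.println(written.contains("मराठी"));
    }
}
```

Using the StandardCharsets.UTF_8 constant rather than the string "UTF-8" avoids the checked UnsupportedEncodingException and rules out a misspelled charset name.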