Java: I need to convert an XML file to CSV but can only read through the file once, and the headers inside the record tag can change. Any ideas?
I have some very large XML files that I need to parse, extracting the relevant data into a CSV file; essentially this is a partial flattening of the XML document. Each XML file has a "records tag" under which all the records are stored. It looks something like this, for example:
<persons>
<person id="1">
<firstname>James</firstname>
<lastname>Smith</lastname>
<middlename></middlename>
<dob_year>1980</dob_year>
<dob_month>1</dob_month>
<gender>M</gender>
<salary currency="Euro">10000</salary>
</person>
<person id="2">
<firstname>Michael</firstname>
<lastname></lastname>
<middlename>Rose</middlename>
<dob_year>1990</dob_year>
<dob_month>6</dob_month>
<gender>M</gender>
<salary currency="Dollor">10000</salary>
</person>
</persons>
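This may not be exactly right (I typed it out quickly), but you get the idea.
Some constraints to keep in mind:
… (`person`, in this case). You can see …
But it also has some limitations. I know how to fix or work around most of them, but there are two problems I cannot solve without breaking the constraints.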
<person id="1">
<firstname>James</firstname>
<middlename></middlename>
<lastname>Smith</lastname>
</person>
<person id="2">
<firstname>Michael</firstname>
<lastname>Jordan</lastname>
<middlename>Rose</middlename>
<dob>1/10/11</dob>
</person>
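This is of course a big problem and has to be solved.
My solution
Before I get to my solution, I will very quickly summarize how the program works. The parser moves through the XML document, and every time it encounters a tag it hands that tag back to my program. My program has a "row tag" which, as I explained, it looks for. Once the row tag is encountered, my program starts looking at all the tags and values inside it and saves them in a StringBuilder. It dumps this information when it encounters the end row tag. During the first iteration it also saves all the headers it encounters, and when it reaches the end tag it dumps the headers first, before dumping the record's values.
Now... as I mentioned, this causes problems with preserving order and with any tags that are changed, removed, or added. I have a solution for the ordering problem which should also solve the problem of the headers not being updated, but I am not sure it will work for my use case (I will explain why later).
My idea is to have something like a hashmap that collects the tags and the order in which they are encountered over time. The key is the tag name, and the value is the position at which the tag first appeared.
As we collect records while moving through the program, we put each record in an array that is the same size as the hashmap, with every value at the position the hashmap gives for its tag. If a new tag is encountered, we simply resize the array and add the tag to the hashmap, using not the order in which it was just encountered (since that might overwrite something) but the value of the previous element + 1 (this would be an ordered hashmap, so I know what the previous element is).
Once we are finished with the program, we dump the collected headers into the first line of the file.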
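The streaming loop described above can be sketched with the standard StAX API (`javax.xml.stream`), which any StAX implementation such as Woodstox plugs into. This is only a rough illustration, not the asker's actual program: the class name `RecordStreamSketch`, the method `forEachRecord`, and the choice to hand each record to a callback as a tag-to-text map are all assumptions made for the example. Attributes such as `id` are ignored here.

```java
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: reads the document once and emits one map per row tag. */
final class RecordStreamSketch {

    /** Calls onRecord with tag -> text for the direct children of every rowTag element. */
    static void forEachRecord(Reader xml, String rowTag, Consumer<Map<String, String>> onRecord)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs needed for this data
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        Map<String, String> record = null;
        String field = null;
        StringBuilder text = new StringBuilder();
        int depth = 0; // 0 = outside a row, 1 = inside the row tag, 2 or more = inside a field
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (depth == 0) {
                            if (reader.getLocalName().equals(rowTag)) {
                                record = new LinkedHashMap<>(); // keeps encounter order
                                depth = 1;
                            }
                        } else if (++depth == 2) {
                            field = reader.getLocalName();
                            text.setLength(0);
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (depth >= 2) {
                            text.append(reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (depth == 2) {
                            record.put(field, text.toString()); // empty tags become ""
                        } else if (depth == 1) {
                            onRecord.accept(record); // end row tag: hand the record over
                        }
                        if (depth > 0) {
                            depth--;
                        }
                        break;
                    default:
                        break;
                }
            }
        } finally {
            reader.close();
        }
    }
}
```

Because each record is passed on as soon as its end tag is seen, only one record is held in memory at a time, which matters for very large files.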
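A minimal sketch of the header map idea, assuming each record arrives as a map from tag name to text. The names `HeaderIndexSketch`, `columnOf`, `toRow`, and `headers` are made up for this example. Since positions are handed out densely, "previous element + 1" is simply the current size of the map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: tag name -> column position, in order of first appearance. */
final class HeaderIndexSketch {
    private final Map<String, Integer> columns = new LinkedHashMap<>();

    /** Returns the column for a tag, assigning the next free position the first time it is seen. */
    int columnOf(String tag) {
        Integer existing = columns.get(tag);
        if (existing != null) {
            return existing;
        }
        int next = columns.size(); // positions are dense, so this is "previous + 1"
        columns.put(tag, next);
        return next;
    }

    /** Places a record's values at their header positions; columns the record lacks stay null. */
    String[] toRow(Map<String, String> record) {
        for (String tag : record.keySet()) {
            columnOf(tag); // register new tags first so the array is large enough
        }
        String[] row = new String[columns.size()];
        for (Map.Entry<String, String> entry : record.entrySet()) {
            row[columns.get(entry.getKey())] = entry.getValue();
        }
        return row;
    }

    /** Headers collected so far, in column order. */
    List<String> headers() {
        return new ArrayList<>(columns.keySet());
    }
}
```

Rows produced before a new tag appears are shorter than later rows, so the complete header line is only known after the last record has been processed.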
Let's look at the first example:
<person id="1">
<firstname>James</firstname>
<middlename></middlename>
<lastname>Smith</lastname>
</person>
<person id="2">
<firstname>Michael</firstname>
<lastname>Rose</lastname>
</person>
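During the second run, the second value of the array will be null (because middlename is not there); we just convert it to an empty string and add the comma after it as normal.
The interesting part is when something is added: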
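Turning such an array into a CSV line could look like the sketch below. The names `CsvLineSketch`, `toCsvLine`, and `escape` are hypothetical, and the `columnCount` parameter is an assumption added so that a short row can be padded with empty trailing fields when the number of columns is known. Quoting follows the usual CSV convention of wrapping fields that contain commas, quotes, or line breaks in double quotes and doubling embedded quotes.

```java
/** Hypothetical sketch: formats one row array as a CSV line, treating null as an empty field. */
final class CsvLineSketch {

    /** Joins the first columnCount fields with commas; missing or null fields become empty. */
    static String toCsvLine(String[] row, int columnCount) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < columnCount; i++) {
            if (i > 0) {
                line.append(',');
            }
            String value = i < row.length ? row[i] : null;
            if (value != null) {
                line.append(escape(value));
            }
        }
        return line.toString();
    }

    /** Quotes a field only when it contains a comma, a double quote, or a line break. */
    static String escape(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }
}
```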
<person id="1">
<firstname>James</firstname>
<middlename></middlename>
<lastname>Smith</lastname>
</person>
<person id="2">
<firstname>Michael</firstname>
<lastname>Rose</lastname>
<dob></dob>
</person>
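The first row does not have the extra `,` after `Smith`, but it looks like that is fine even though it is not valid CSV? Cool.
Anyway, I think this is where the real problem comes in. I am not actually using a BufferedReader/BufferedWriter in Java. We use the stream readers and writers that come with Azure, because all of these files are in the cloud, and behind the scenes these are basically just REST API calls. So I do not think I can dump the headers into the first line of the file. In any case, I am not even sure that is possible.
So. Do any geniuses out there have an idea?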
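One possible way around the "header must come first, but is only known last" problem, offered only as a sketch: spool the data rows to a local temporary file while parsing, then write the header followed by the spooled rows to the destination stream in a single forward pass. This assumes local scratch space is available and that the destination can be treated as a plain `java.io.OutputStream`; no Azure-specific API is used, and the class `HeaderLastSpoolSketch` with its methods is invented for the example. The XML is still read only once.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch: buffers rows on disk so the header can be written first at the end. */
final class HeaderLastSpoolSketch implements AutoCloseable {
    private final Path spool;
    private final BufferedWriter rows;

    HeaderLastSpoolSketch() throws IOException {
        spool = Files.createTempFile("rows", ".csv"); // assumes local scratch space exists
        rows = Files.newBufferedWriter(spool, StandardCharsets.UTF_8);
    }

    /** Appends one already formatted CSV line to the spool file. */
    void writeRow(String csvLine) throws IOException {
        rows.write(csvLine);
        rows.write('\n');
    }

    /** Writes the header line, then streams every spooled row to the destination. */
    void finish(List<String> headers, OutputStream destination) throws IOException {
        rows.close(); // flush everything to the spool before copying
        destination.write((String.join(",", headers) + "\n").getBytes(StandardCharsets.UTF_8));
        Files.copy(spool, destination);
        destination.flush();
    }

    @Override
    public void close() throws IOException {
        rows.close();
        Files.deleteIfExists(spool);
    }
}
```

The rows are copied as they were written, so rows produced before a new column appeared stay shorter than the header; padding them would need the final column count, which is not known until the end.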
xls to csv convert
/** One data row of the device library sheet; every value is kept as a String. */
public class DeviceLibraryModel {
    private String parameterName;
    private String dataType;
    private String noOfRegister;
    private String address;

    public String getParameterName() {
        return parameterName;
    }

    public String getDataType() {
        return dataType;
    }

    public String getNoOfRegister() {
        return noOfRegister;
    }

    public String getAddress() {
        return address;
    }

    public void setParameterName(String parameterName) {
        this.parameterName = parameterName;
    }

    public void setDataType(String dataType) {
        this.dataType = dataType;
    }

    public void setNoOfRegister(String noOfRegister) {
        this.noOfRegister = noOfRegister;
    }

    public void setAddress(String address) {
        this.address = address;
    }

    @Override
    public String toString() {
        return "DeviceLibraryModel{" + "ParameterName=" + parameterName + ", DataType=" + dataType + ", NoOfRegister=" + noOfRegister + ", Address=" + address + '}';
    }
}
/** Remembers which sheet column holds each known header. */
public class HeaderNameIndex {
    private int pratameterNameIndex;
    private int dataTypeIndex;
    private int noOfRegister;
    private int address;

    public HeaderNameIndex() {
    }

    public int getPratameterNameIndex() {
        return pratameterNameIndex;
    }

    public int getDataTypeIndex() {
        return dataTypeIndex;
    }

    public int getNoOfRegister() {
        return noOfRegister;
    }

    public int getAddress() {
        return address;
    }

    public void setPratameterNameIndex(int pratameterNameIndex) {
        this.pratameterNameIndex = pratameterNameIndex;
    }

    public void setDataTypeIndex(int dataTypeIndex) {
        this.dataTypeIndex = dataTypeIndex;
    }

    public void setNoOfRegister(int noOfRegister) {
        this.noOfRegister = noOfRegister;
    }

    public void setAddress(int address) {
        this.address = address;
    }
}
package model;

/** Header texts expected in the first row of the sheet. */
public interface HeaderNameInt {
    String PARAMETERNAME = "Parameter Name";
    String DATATYPE = "Data Type";
    String NOOFREGISTER = "No Of Register";
    String ADDRESS = "Address";
}
package services;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

// Assumption: DeviceLibraryModel and HeaderNameIndex live in package model next to HeaderNameInt.
import model.DeviceLibraryModel;
import model.HeaderNameIndex;
import model.HeaderNameInt;

public class ReadFromXls implements HeaderNameInt {

    /** Reads the first sheet of an .xls file and maps every data row to a DeviceLibraryModel. */
    public List<DeviceLibraryModel> xlsConvert(String xlsPath) throws IOException {
        List<DeviceLibraryModel> list = new ArrayList<>();
        HeaderNameIndex objHeaderNameIndex = new HeaderNameIndex();
        DataFormatter dataFormatter = new DataFormatter(); // renders every cell as a String
        // try-with-resources closes the workbook and the stream once, after all rows are read
        try (FileInputStream fi = new FileInputStream(xlsPath);
             Workbook hw = new HSSFWorkbook(fi)) {
            Sheet sheet = hw.getSheetAt(0);
            int rowNumber = 0;
            for (Row row : sheet) {
                DeviceLibraryModel dm = new DeviceLibraryModel();
                for (Cell cell : row) {
                    int iColumnIndex = cell.getColumnIndex();
                    String formatCellValue = dataFormatter.formatCellValue(cell);
                    if (rowNumber == 0) {
                        // header row: remember which column holds which field
                        switch (formatCellValue) {
                            case PARAMETERNAME:
                                objHeaderNameIndex.setPratameterNameIndex(iColumnIndex);
                                break;
                            case DATATYPE:
                                objHeaderNameIndex.setDataTypeIndex(iColumnIndex);
                                break;
                            case NOOFREGISTER:
                                objHeaderNameIndex.setNoOfRegister(iColumnIndex);
                                break;
                            case ADDRESS:
                                objHeaderNameIndex.setAddress(iColumnIndex);
                                break;
                            default:
                                System.err.println("nothing");
                        }
                    } else if (iColumnIndex == objHeaderNameIndex.getPratameterNameIndex()) {
                        dm.setParameterName(formatCellValue);
                    } else if (iColumnIndex == objHeaderNameIndex.getDataTypeIndex()) {
                        dm.setDataType(formatCellValue);
                    } else if (iColumnIndex == objHeaderNameIndex.getNoOfRegister()) {
                        dm.setNoOfRegister(formatCellValue);
                    } else if (iColumnIndex == objHeaderNameIndex.getAddress()) {
                        dm.setAddress(formatCellValue);
                    }
                }
                if (rowNumber > 0) {
                    list.add(dm); // only data rows become models
                }
                rowNumber++;
            }
        }
        return list;
    }
}
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

public class ConvertXlsToCsv {

    /** Writes every model read from the .xls file as one comma-separated row. */
    public void toCsv() throws IOException {
        ReadFromXls readXls = new ReadFromXls();
        String xlsPath = "C:\\Users\\admin\\Desktop\\Java Training\\Input file\\Device.xls";
        List<DeviceLibraryModel> list = readXls.xlsConvert(xlsPath);
        String sep = ",";
        String csvPath = "C:\\Users\\admin\\Desktop\\Java Training\\Input file\\XlsToCsv.csv";
        // append mode (second argument true) keeps rows written by earlier runs;
        // try-with-resources flushes and closes the writer
        try (FileWriter writeData = new FileWriter(csvPath, true)) {
            for (DeviceLibraryModel dm : list) {
                writeData.write(String.join(sep,
                        String.valueOf(dm.getParameterName()),
                        String.valueOf(dm.getDataType()),
                        String.valueOf(dm.getNoOfRegister()),
                        String.valueOf(dm.getAddress())));
                writeData.write('\n');
            }
        }
    }
}
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ReadXls {
    public static void main(String[] args) {
        ConvertXlsToCsv convert = new ConvertXlsToCsv();
        try {
            convert.toCsv();
        } catch (IOException ex) {
            // log the failure with its stack trace
            Logger.getLogger(ReadXls.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
firstname,middlename,lastname
James,,Smith,
Michael,Jordan,Rose, 1/10/11
<person id="1">
<firstname>James</firstname>
<middlename></middlename>
<lastname>Smith</lastname>
</person>
<person id="2">
<firstname>Michael</firstname>
<lastname>Rose</lastname>
</person>
firstname,middlename,lastname,dob
James,,Smith
Michael,,Rose,1/10/11