Parsing a text file in Java into a HashMap of fields
I'm trying to parse multiple files and split each one into a set of fields in a HashMap. Here is a sample file:
COCONUT OIL CONTRACT TO CHANGE - DUTCH TRADERS
ROTTERDAM, March 18 - Contract terms for trade in coconut
oil are to be changed from long tons to tonnes with effect from
the Aug/Sep contract onwards, Dutch vegetable oil traders said.
Operators have already started to take account of the
expected change and reported at least one trade in tonnes for
Aug/Sept shipment yesterday.
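I need the program to parse this document into the fields of a custom Document class, which has a key, file name, file title, place, date, author, content and category.
This is what I tried: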
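import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
public static Document parse(String filename) throws IOException {
    File f = new File(filename);
    if (f.isFile()) {
        // Default to the whole name so fileId is always initialized
        String fileId = filename;
        if (filename.indexOf(".") > 0) {
            // Strip the extension
            fileId = filename.substring(0, filename.lastIndexOf("."));
        }
        // The parent directory is used as the category
        String category = f.getParent();
        // try-with-resources closes the stream even if reading fails
        try (InputStream in = new FileInputStream(f)) {
            byte[] buf = new byte[1024];
            int len;
            // Read again on every pass; read() returns -1 at end of file
            while ((len = in.read(buf)) > 0) {
                ..........
            }
        }
    }
    return null;
}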
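The byte-buffer loop can be replaced by reading the whole file in one call, after which the text is easy to split into fields. A minimal sketch, assuming Java 7+ `java.nio.file` and UTF-8 text; the class `FileBasics` and its method names are hypothetical. Unlike the code above, it uses only the file name (not the full path) for the id, and takes the category to be the name of the immediate parent directory rather than the full path that `File.getParent()` returns:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original question
class FileBasics {
    // File name without its extension, e.g. ".../veg-oil/0001.txt" -> "0001"
    static String fileId(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Name of the immediate parent directory, or "" if there is none
    static String category(Path file) {
        Path parent = file.getParent();
        if (parent == null || parent.getFileName() == null) {
            return "";
        }
        return parent.getFileName().toString();
    }

    // Reads the whole file at once instead of looping over a byte buffer
    static String readAll(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```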
The following code might help you:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
// try-with-resources closes the reader (and the underlying file) automatically
try (BufferedReader br = new BufferedReader(new FileReader("myFile.txt"))) {
    StringBuilder contentBuffer = new StringBuilder();
    String line;
    boolean foundTitle = false;
    boolean foundPlaceAndDate = false;
    String date = "";
    while ((line = br.readLine()) != null) {
        if (line.matches("^[a-zA-Z0-9].*") && !foundTitle) {
            // If the line starts with a letter or digit and there is no title yet, it's the title
            System.out.println("Title: " + line);
            foundTitle = true;
        } else if (line.matches("^[ \t].*") && !foundPlaceAndDate) {
            // If the line starts with a space or tab and it's our first paragraph, it holds the place and date
            String trimmed = line.trim();
            System.out.println("Place: " + trimmed.substring(0, trimmed.indexOf(",")));
            date = trimmed.substring(trimmed.indexOf(",") + 1, trimmed.indexOf("-")).trim();
            System.out.println("Date: " + date);
            foundPlaceAndDate = true;
        }
        // Append a space so words at line breaks don't run together
        contentBuffer.append(line).append(' ');
    }
    String all = contentBuffer.toString();
    // The content starts after "<date> -", i.e. date.length() + 2 characters past the date
    String content = all.substring(all.indexOf(date) + date.length() + 2).trim();
    System.out.println("Content: " + content);
} catch (IOException e) {
    System.err.println("Oh no! I got the following error: " + e.getMessage());
}
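If the first paragraph always opens with a dateline of the form `PLACE, date - text`, as in the sample, one regular expression can replace the `indexOf`/`substring` chain, and it fails cleanly instead of throwing when a line doesn't fit. A sketch under that assumption; `Dateline` is a hypothetical helper, not part of any library:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits "ROTTERDAM, March 18 - Contract terms ..." into its parts
class Dateline {
    // group 1 = place (up to the first comma), group 2 = date (up to " - "), group 3 = rest
    private static final Pattern DATELINE =
            Pattern.compile("^\\s*([^,]+),\\s*([^-]+?)\\s+-\\s+(.*)$", Pattern.DOTALL);

    final String place;
    final String date;
    final String rest;

    private Dateline(String place, String date, String rest) {
        this.place = place;
        this.date = date;
        this.rest = rest;
    }

    // Returns null when the text does not start with a dateline
    static Dateline parse(String paragraph) {
        Matcher m = DATELINE.matcher(paragraph);
        if (!m.matches()) {
            return null;
        }
        return new Dateline(m.group(1).trim(), m.group(2).trim(), m.group(3).trim());
    }
}
```

`Dateline.parse` returns `null` for a title line such as `COCONUT OIL CONTRACT TO CHANGE - DUTCH TRADERS`, because it contains no comma.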
The output will be:
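Title: COCONUT OIL CONTRACT TO CHANGE - DUTCH TRADERS
Place: ROTTERDAM
Date: March 18
Content: Contract terms for trade in coconut oil are to be changed from long tons to tonnes with effect from the Aug/Sep contract onwards, Dutch vegetable oil traders said. Operators have already started to take account of the expected change and reported at least one trade in tonnes for Aug/Sept shipment yesterday.
Sorry, what are you trying to achieve here?
Orwell, this is a start, but it will be hard to keep going this way. If I were you, I'd stop writing code now and first work out which high-level steps are needed. Write the steps down on a sheet of paper:
1. Read the file completely into a string.
2. Extract the file title… etc.
Then you can start coding it step by step, testing the result after each step.
This really got me started, but I need to parse the file into a Document class like this:
public class Document {
    private HashMap<FieldNames, String[]> map;
    public Document() {
        map = new HashMap<>();
    }
    public void setField(FieldNames fn, String... o) {
        map.put(fn, o);
    }
    public String[] getField(FieldNames fn) {
        return map.get(fn);
    }
}
Now all I need to do is populate the fields of the Document class. For example: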
Document document = new Document();
document.setField(FieldNames.TITLE, title); // assumes FieldNames has a TITLE constant (the enum isn't shown)
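Putting the pieces together, here is a sketch that fills a Document from a file's text. The question never shows `FieldNames`, so the enum constants below are hypothetical, chosen to match the fields listed at the top; `ReutersParser` is also a made-up name, and this `Document` swaps the `HashMap` for an `EnumMap` (same `Map` API, keyed by enum constant). It assumes the first non-blank line is the title and the body opens with a `PLACE, date - ` dateline. AUTHOR is left unset because the sample has no author line, and paragraph breaks in the content are collapsed to single spaces:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical field names; the real FieldNames enum isn't shown in the question
enum FieldNames { KEY, FILENAME, TITLE, PLACE, DATE, AUTHOR, CONTENT, CATEGORY }

class Document {
    private final Map<FieldNames, String[]> map = new EnumMap<>(FieldNames.class);

    public void setField(FieldNames fn, String... o) { map.put(fn, o); }

    public String[] getField(FieldNames fn) { return map.get(fn); }
}

// Hypothetical parser class
class ReutersParser {
    // "PLACE, date - first sentence ..." at the start of the body
    private static final Pattern DATELINE = Pattern.compile("^([^,]+),\\s*([^-]+?)\\s+-\\s+(.*)$");

    static Document parse(String fileId, String fileName, String category, String text) {
        Document doc = new Document();
        doc.setField(FieldNames.KEY, fileId);
        doc.setField(FieldNames.FILENAME, fileName);
        doc.setField(FieldNames.CATEGORY, category);

        String[] lines = text.split("\\R"); // \R matches any line break (Java 8+)
        int i = 0;
        while (i < lines.length && lines[i].trim().isEmpty()) {
            i++; // skip leading blank lines
        }
        if (i == lines.length) {
            return doc; // empty file: only the file-level fields are set
        }
        doc.setField(FieldNames.TITLE, lines[i].trim());

        // Join the remaining non-blank lines with single spaces
        StringBuilder body = new StringBuilder();
        for (int j = i + 1; j < lines.length; j++) {
            String t = lines[j].trim();
            if (!t.isEmpty()) {
                if (body.length() > 0) {
                    body.append(' ');
                }
                body.append(t);
            }
        }

        Matcher m = DATELINE.matcher(body);
        if (m.matches()) {
            doc.setField(FieldNames.PLACE, m.group(1).trim());
            doc.setField(FieldNames.DATE, m.group(2).trim());
            doc.setField(FieldNames.CONTENT, m.group(3).trim());
        } else {
            doc.setField(FieldNames.CONTENT, body.toString()); // no dateline found
        }
        return doc;
    }
}
```

To parse a real file, read it into a String first (for example with `new String(Files.readAllBytes(path), StandardCharsets.UTF_8)`) and pass the id, file name and category alongside it.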