Java OpenNLP sentence training example
I am trying to train a new model using the example from the manual on the official OpenNLP website. Here is the example:
Charset charset = Charset.forName("UTF-8");
ObjectStream lineStream = new PlainTextByLineStream(new FileInputStream("en-sent.train"), charset);
ObjectStream sampleStream = new SentenceSampleStream(lineStream);
SentenceModel model;
try {
model = SentenceDetectorME.train("en", sampleStream, true, null, TrainingParameters.defaultParams());
} finally {
sampleStream.close();
}
OutputStream modelOut = null;
try {
modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
model.serialize(modelOut);
} finally {
if (modelOut != null)
modelOut.close();
}
The problem is on the second line:
ObjectStream lineStream = new PlainTextByLineStream(new FileInputStream("en-sent.train"), charset);
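The documentation tells me:
Deprecated. Use PlainTextByLineStream(InputStreamFactory, Charset) instead.
But I don't know how to use this constructor. I would like an example that uses this non-deprecated constructor with the same corpus file.
I wrote the following code based on the OpenNLP documentation. It shows both ways of calling the train method, the deprecated one and the one the documentation suggests: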
Charset charset = Charset.forName("UTF-8");
InputStreamFactory inputStreamFactory=null;
ObjectStream<String> lineStream=null;
ObjectStream<SentenceSample> sampleStream=null;
SentenceModel model=null;
OutputStream modelOut = null;
try{
inputStreamFactory=InputStreamFactory.class.newInstance();
lineStream=new PlainTextByLineStream(inputStreamFactory,charset);
sampleStream = new SentenceSampleStream(lineStream);
//The deprecated:
model = SentenceDetectorME.train("en", sampleStream, true, null, TrainingParameters.defaultParams());
//The suggested:
model = SentenceDetectorME.train("en", sampleStream, new SentenceDetectorFactory(), new TrainingParameters());
} catch (InstantiationException e2){
e2.printStackTrace();
} catch (IllegalAccessException e2){
e2.printStackTrace();
} catch (IOException e){
e.printStackTrace();
}finally {
try{
sampleStream.close();
} catch (IOException e){
e.printStackTrace();
}
}
try {
modelOut = new BufferedOutputStream(new FileOutputStream(new File("modelFile")));
model.serialize(modelOut);
} catch (FileNotFoundException e){
e.printStackTrace();
} catch (IOException e){
e.printStackTrace();
} finally {
if (modelOut != null) try{
modelOut.close();
} catch (IOException e){
e.printStackTrace();
}
}
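But in this new code, I don't know where to supply the corpus data file.
Any ideas?
InputStreamFactory is an interface, so InputStreamFactory.class.newInstance() cannot create a usable instance. You have to initialize inputStreamFactory with the data file you need, using: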
inputStreamFactory = new MarkableFileInputStreamFactory(
new File("en-sent.train"));
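Putting the pieces together, the whole training run might look like the sketch below. It assumes OpenNLP 1.6 or later on the classpath, where ObjectStream is AutoCloseable and the InputStreamFactory constructor is available. The class name SentenceTrainingSketch and the method trainAndSave are made up for illustration. The file names en-sent.train and modelFile are the ones used in the question.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import opennlp.tools.sentdetect.SentenceDetectorFactory;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.sentdetect.SentenceSample;
import opennlp.tools.sentdetect.SentenceSampleStream;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

// Hypothetical class name, used only for this sketch.
public class SentenceTrainingSketch {

    // Trains a sentence model from a one-sentence-per-line corpus and writes it to modelFile.
    static SentenceModel trainAndSave(File corpus, File modelFile) throws IOException {
        // The factory is bound to the corpus file; this is where the data comes from.
        InputStreamFactory inputStreamFactory = new MarkableFileInputStreamFactory(corpus);
        ObjectStream<String> lineStream =
                new PlainTextByLineStream(inputStreamFactory, StandardCharsets.UTF_8);

        SentenceModel model;
        // Closing sampleStream also closes the wrapped lineStream.
        try (ObjectStream<SentenceSample> sampleStream = new SentenceSampleStream(lineStream)) {
            // Same settings as the deprecated call: useTokenEnd = true, no abbreviation
            // dictionary, default end-of-sentence characters.
            model = SentenceDetectorME.train("en", sampleStream,
                    new SentenceDetectorFactory("en", true, null, null),
                    TrainingParameters.defaultParams());
        }

        try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelFile))) {
            model.serialize(modelOut);
        }
        return model;
    }

    public static void main(String[] args) throws IOException {
        trainAndSave(new File("en-sent.train"), new File("modelFile"));
    }
}
```

Only one train call is made here. A sample stream is consumed by training, so calling train twice on the same stream, as in the question's code, would give the second call an exhausted stream unless it is reset first.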