Training your own model in OpenNLP


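I am finding it hard to create my own OpenNLP model. Can anyone tell me how to build one? How should the training be done?

What should the input be, and where will the output model file be stored?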

Maybe this article will help you with this. It describes how to train the TokenNameFinder on data extracted from Wikipedia.

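This website is very useful. It shows, both in code and with the OpenNLP command-line application, how to train all the different types of models, such as entity extraction and part-of-speech tagging.
I could give you some code examples here, but the page is very clear.
In theory:
Basically, you create a file that lists what you want to train on.
For example:
Sport [whitespace] This is a page about football, rugby, etc.
Politics [whitespace] This is a page about Tony Blair serving as Prime Minister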

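The format is described on the page above (each model type needs a different format). Once you have created this file, run it through the API or the opennlp application (from the command line), and it will generate a .bin file. Once you have this .bin file, you can load it into a model and start using it (following the API on the site above).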
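As a rough illustration of the one-document-per-line layout described above (a category label, whitespace, then the text), the sketch below splits such a line into its label and whitespace-separated tokens. `CategorizedLine` is a hypothetical helper written for this example and is not an OpenNLP class; OpenNLP has its own reader for this format, so treat this only as a way to sanity-check a training file before running the trainer.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of OpenNLP): one "category<whitespace>text" training line. */
final class CategorizedLine {
    final String category;
    final List<String> tokens;

    private CategorizedLine(String category, List<String> tokens) {
        this.category = category;
        this.tokens = tokens;
    }

    /** Splits a line on whitespace; the first field is the category, the rest is the text. */
    static CategorizedLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected a category followed by text: " + line);
        }
        return new CategorizedLine(parts[0], Arrays.asList(parts).subList(1, parts.length));
    }

    public static void main(String[] args) {
        CategorizedLine sample = parse("Sport This is a page about football, rugby, etc.");
        System.out.println(sample.category + " -> " + sample.tokens);
    }
}
```

Because the first whitespace-delimited field is taken as the label, a category name in this sketch cannot itself contain spaces.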
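First, you need to annotate the training data with the entities you want.
Sentences should be separated by a newline character (\n). Tokens should be separated from the tags by a space character.
Suppose you want to create a model for medicine entities; the data should then look like this: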

<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic that contains two medicines - <START:medicine> amoxicillin trihydrate <END> and <START:medicine> potassium clavulanate <END> . They work together to kill certain types of bacteria and are used to treat certain types of bacterial infections.
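To make the tagging rules concrete, here is a minimal sketch of a pre-flight check for annotated lines. It walks the whitespace-separated tokens, collects each `<START:type> ... <END>` span, and rejects tags that are unbalanced or glued to neighbouring text. `AnnotationCheck` and its `entities` method are hypothetical names invented for this example; this is not OpenNLP's own parser and may be stricter or looser than it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-flight check for annotated lines; not OpenNLP's own parser. */
final class AnnotationCheck {
    private static final Pattern START = Pattern.compile("<START(:([^:>\\s]*))?>");
    private static final String END = "<END>";

    /** Returns each annotated span as "type=text"; throws on unbalanced or glued tags. */
    static List<String> entities(String line) {
        List<String> found = new ArrayList<>();
        String type = null;                       // non-null while inside a span
        StringBuilder text = new StringBuilder();
        for (String token : line.trim().split("\\s+")) {
            Matcher m = START.matcher(token);
            if (m.matches()) {
                if (type != null) throw new IllegalArgumentException("Nested <START> tag");
                type = m.group(2) == null ? "default" : m.group(2);
                text.setLength(0);
            } else if (token.equals(END)) {
                if (type == null) throw new IllegalArgumentException("<END> without <START>");
                found.add(type + "=" + text.toString().trim());
                type = null;
            } else if (token.contains("<START") || token.contains(END)) {
                throw new IllegalArgumentException("Tag not separated by spaces: " + token);
            } else if (type != null) {
                text.append(token).append(' ');
            }
        }
        if (type != null) throw new IllegalArgumentException("Missing <END> tag");
        return found;
    }

    public static void main(String[] args) {
        System.out.println(entities(
                "<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic ."));
    }
}
```

Running a check like this over every line of the training file before training can catch formatting slips, such as punctuation attached directly to a tag, that would otherwise only surface as trainer errors or silently missed entities.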

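For an example, you can refer to a sample. The training data should contain at least 15,000 sentences to get good results.
You can also use the OpenNLP TokenNameFinderTrainer. The output file will be in .bin format.
Here is an example:

For more details, see
Copy the data into data.txt and run the code below to get your own mymodel.bin.
For sample data, you can refer to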

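Which tool are you creating the model for?
Welcome to Stack Overflow! While this code may help solve the problem, it does not explain why and/or how it answers the question. Providing that additional context would significantly improve its long-term educational value. Please add an explanation to your answer, including the limitations and assumptions that apply.
Or you could RTFM to save yourself the typing. Let me point you to the latest documentation, at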
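import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

// Written against the older OpenNLP 1.5.x training API; later releases changed these signatures.
public class Training {
    // Where the trained model is written
    static String onlpModelPath = "mymodel.bin";
    // Training data set: one annotated sentence per line
    static String trainingDataFilePath = "data.txt";

    public static void main(String[] args) throws IOException {
        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream = new PlainTextByLineStream(
                new FileInputStream(trainingDataFilePath), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model = null;
        try {
            // model = NameFinderME.train("en", "drugs", sampleStream,
            //         Collections.<String, Object>emptyMap(), 100, 4);
            model = NameFinderME.train("en", "drugs", sampleStream,
                    Collections.<String, Object>emptyMap());
        } finally {
            sampleStream.close();
        }

        // Serialize the trained model to mymodel.bin
        BufferedOutputStream modelOut = null;
        try {
            modelOut = new BufferedOutputStream(new FileOutputStream(onlpModelPath));
            model.serialize(modelOut);
        } finally {
            if (modelOut != null) {
                modelOut.close();
            }
        }
    }
}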