Using the TermRaider plugin in GATE
Tags: bigdata, data-extraction, gate
I want to use the TermRaider functionality in GATE. Could someone post some sample code for loading and using the resource from a Java class? I tried the following, but it failed:
Gate.getCreoleRegister().registerDirectories(new URL("file:///D:/misc_workspace/gate-7.1-build4485-SRC/plugins/TermRaider"));
ProcessingResource termRaider = (ProcessingResource) Factory.
createResource("gate.termraider.TermRaiderEnglish",Factory.newFeatureMap());
Exception:
gate.termraider.TermRaiderEnglish cannot be cast to gate.ProcessingResource
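Can anyone suggest how I should proceed?

The TermRaider system is not a single PR (processing resource); it is a complete application (in fact a Groovy ScriptableController). The TermRaiderEnglish resource is just a hook that makes this application appear in the "Ready Made Applications" menu of the GATE Developer GUI, which is why it cannot be cast to ProcessingResource.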
In embedded code you can load the application with PersistenceManager:
File termRaiderPlugin = new File(Gate.getPluginsHome(), "TermRaider");
File gappFile = new File(new File(termRaiderPlugin, "applications"),
"termraider-eng.gapp");
CorpusController trApp = (CorpusController)PersistenceManager.loadObjectFromFile(
gappFile);
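When you run the application over a corpus, it creates new instances of three "termbank" LRs that hold information about the newly discovered terms. The vanilla application is really designed for the GUI rather than for embedded use, so it does not store references to these new LRs anywhere useful; you would have to query the CreoleRegister to find them. You may prefer to make your own copy of the application and adjust the control script so that it stores the termbank instances as (say) features on the corpus, by adding the following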
corpus.features.tfidfTermbank = termbank0
corpus.features.annotationTermbank = termbank1
corpus.features.hyponymyTermbank = termbank2
to the end of the control script. You can then access them from your Java code via corpus.getFeatures().get("tfidfTermbank") and so on.
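Since the Termbank classes are themselves part of the TermRaider plugin, you will probably want to put gate-TermRaider.jar on your main application classpath rather than loading it through the GateClassLoader.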
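The feature lookup described above fails quietly (it just returns null) if the control script was not modified, and it throws a ClassCastException if the termbank class was loaded by a different class loader than the one your code sees. A minimal sketch of a defensive lookup follows. The class name FeatureLookup and the method requireFeature are hypothetical helpers, not part of GATE; the sketch relies only on the fact that a GATE FeatureMap is a java.util.Map, and it uses a plain HashMap as a stand-in so that it runs without GATE on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of GATE: fetches a feature and checks its type.
class FeatureLookup {

    /**
     * Returns the value stored under key, cast to the expected type.
     * Throws IllegalStateException with a descriptive message if the
     * feature is missing or is not an instance of the expected type.
     */
    static <T> T requireFeature(Map<?, ?> features, String key, Class<T> type) {
        Object value = features.get(key);
        if (value == null) {
            throw new IllegalStateException("No feature '" + key
                    + "' found; was it stored by the control script?");
        }
        if (!type.isInstance(value)) {
            throw new IllegalStateException("Feature '" + key + "' is a "
                    + value.getClass().getName() + ", expected " + type.getName());
        }
        return type.cast(value);
    }

    public static void main(String[] args) {
        // Stand-in for corpus.getFeatures(); a String replaces the termbank
        // so the sketch has no GATE dependency.
        Map<Object, Object> features = new HashMap<>();
        features.put("tfidfTermbank", "placeholder termbank");

        String bank = requireFeature(features, "tfidfTermbank", String.class);
        System.out.println(bank);
    }
}
```

In real code the call would presumably look like requireFeature(corpus.getFeatures(), "tfidfTermbank", AbstractTermbank.class), with the key spelled exactly as in the control script.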
import gate.Corpus;
import gate.CorpusController;
import gate.Document;
import gate.Factory;
import gate.FeatureMap;
import gate.Gate;
import gate.termraider.bank.AbstractTermbank;
import gate.termraider.output.CsvGenerator;
import gate.util.GateException;
import gate.util.Out;
import gate.util.persistence.PersistenceManager;
import java.io.File;
import java.io.IOException;
import java.net.URL;

public class TermRaiderExample {
    public static void main(String[] args) throws IOException, GateException {
        // Initialise the GATE library
        Out.prln("Initialising GATE...");
        Gate.init();
        Out.prln("...GATE initialised");

        // Load the TermRaider application from the plugin directory
        File termRaiderPlugin = new File(Gate.getPluginsHome(), "TermRaider");
        File gappFile = new File(new File(termRaiderPlugin, "applications"),
                "termraider-eng.gapp");
        CorpusController trApp =
                (CorpusController) PersistenceManager.loadObjectFromFile(gappFile);
        System.out.println("TermRaider loaded successfully!!!");

        // Build a corpus from the text files in a folder
        Corpus corpus = (Corpus) Factory.createResource("gate.corpora.CorpusImpl");
        //String dirname = "Desktop/Gate_corpus/About Us/New Folder";
        String dirname = "Desktop/GermanHPFCompetition/termRaider";
        File[] files = new File(dirname).listFiles();
        if (files == null) {
            throw new IOException("Cannot list directory: " + dirname);
        }
        for (File file : files) {
            if (!file.isFile()) {
                continue; // skip subdirectories
            }
            // File.toURI() yields a well-formed, properly escaped file: URL
            URL u = file.toURI().toURL();
            FeatureMap params = Factory.newFeatureMap();
            params.put("sourceUrl", u);
            params.put("preserveOriginalContent", Boolean.TRUE);
            params.put("collectRepositioningInfo", Boolean.TRUE);
            //Out.prln("Creating doc for " + u);
            Document doc = (Document)
                    Factory.createResource("gate.corpora.DocumentImpl", params);
            corpus.add(doc);
        } // for each file in the folder

        // Run the TermRaider application over the corpus
        trApp.init();
        trApp.setCorpus(corpus);
        trApp.execute();
        Corpus outputCorpus = trApp.getCorpus();
        System.out.println("TermRaider executed successfully!!!");

        // Fetch the termbanks. These features exist only if the control script
        // stores them on the corpus; otherwise get() returns null.
        AbstractTermbank tb1 = (AbstractTermbank) outputCorpus.getFeatures().get("tfidfTermbank");
        AbstractTermbank tb2 = (AbstractTermbank) outputCorpus.getFeatures().get("hyponymyTermbank");
        AbstractTermbank tb3 = (AbstractTermbank) outputCorpus.getFeatures().get("annotationTermbank");
        System.out.println(tb1);
        System.out.println(tb2);
        System.out.println(tb3);

        // Write one CSV file per termbank
        CsvGenerator generator = new CsvGenerator();
        File outputFile1 = new File("Desktop/GermanHPFCompetition/termRaider/tfidfTermbank.csv");
        File outputFile2 = new File("Desktop/GermanHPFCompetition/termRaider/hyponymyTermbank.csv");
        File outputFile3 = new File("Desktop/GermanHPFCompetition/termRaider/annotationTermbank.csv");
        double threshold1 = 0;
        double threshold2 = 0;
        double threshold3 = 0;
        generator.generateAndSaveCsv(tb1, threshold1, outputFile1);
        generator.generateAndSaveCsv(tb2, threshold2, outputFile2);
        generator.generateAndSaveCsv(tb3, threshold3, outputFile3);
        System.out.println("CSV files created!!!");
    } // end of main
} // end of class
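One fragile step when building a corpus from a folder is turning each file path into the URL passed as sourceUrl: hand-assembled file: strings tend to break on backslashes and spaces. The sketch below shows one portable way to do it with the standard library alone, going through File.toURI(). The class name DocumentUrls and the method listFileUrls are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists the regular files in a folder as file: URLs.
class DocumentUrls {

    /**
     * Returns one URL per regular file directly inside dir, sorted by path.
     * Returns an empty list if dir does not exist or is not a directory.
     */
    static List<URL> listFileUrls(File dir) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        File[] files = dir.listFiles(File::isFile); // null if dir cannot be listed
        if (files == null) {
            return urls;
        }
        Arrays.sort(files); // File is Comparable, so the order is deterministic
        for (File f : files) {
            // toURI() escapes characters such as spaces before conversion
            urls.add(f.toURI().toURL());
        }
        return urls;
    }

    public static void main(String[] args) throws MalformedURLException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (URL u : listFileUrls(dir)) {
            System.out.println(u);
        }
    }
}
```

Each returned URL can then be put into the parameter map under "sourceUrl" when creating a document.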
Please, could someone at least provide a link to an example?