Java: How can we handle large GATE documents? (java, annotations, gate)

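When I try to execute my pipeline, I get the error java.lang.OutOfMemoryError: GC overhead limit exceeded if the GATE document I use is slightly large.

The code works fine if the GATE document is small.
My Java code looks like this:
TestGate class: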
    public void gateProcessor(Section section) throws Exception {
        Gate.init();
        Gate.getCreoleRegister().registerDirectories(....
        SerialAnalyserController pipeline .......
        pipeline.add(All the language analyzers)
        pipeline.add(My Jape File)
        Corpus corpus = Factory.newCorpus("Gate Corpus");
        Document doc = Factory.newDocument(section.getContent());
        corpus.add(doc);

        pipeline.setCorpus(corpus);
        pipeline.execute();
    }

The main class contains:
    StringBuilder body = new StringBuilder();
    int character;
    FileInputStream file = new FileInputStream(
            new File("filepath\\out.rtf")); // the document in question
    while (true) {
        character = file.read();
        if (character == -1) break;
        body.append((char) character);
    }

    // Create a Section whose content field is body.toString()
    Section section = new Section(body.toString());
    TestGate testgate = new TestGate();
    testgate.gateProcessor(section);

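Interestingly, this also fails in the GATE Developer tool: if the document exceeds a certain size, say more than one page, the tool basically hangs.
This suggests that my code is logically correct but my approach is wrong. How do we deal with large chunks of data in a GATE document?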
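A new pipeline object is created on every call, and each one takes up a lot of memory. That is why you should clean up the "Annie" pipeline every time you use it:
    pipeline.cleanup();
    pipeline = null;
You also need to call
    corpus.clear();
    Factory.deleteResource(doc);

after each document; otherwise, if you run it enough times, you will eventually run out of memory with documents of any size (although, judging by the way you initialize GATE inside the method, it looks like you only ever process a single document once).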
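Besides that, annotations and features usually take up a lot of memory. If you have an annotation-intensive pipeline, i.e. one that generates many annotations with many features and values, you may run out of memory. Make sure you do not have a processing resource that generates annotations exponentially, for instance a JAPE grammar or Groovy script that generates n to the power of W annotations, where W is the number of words in the document. Likewise, if you create a feature for each possible word combination in the document, that would generate on the order of W factorial strings.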
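To get a feel for how quickly those counts grow, here is a small back-of-the-envelope sketch in plain Java (no GATE classes involved). The class name `AnnotationGrowth`, its helper methods, and the choice of n = 2 are illustrative assumptions, not something taken from the pipeline in the question.

```java
import java.math.BigInteger;

public class AnnotationGrowth {

    // n^W: the exponential case mentioned above (n raised to the number of words W).
    static BigInteger power(int n, int w) {
        return BigInteger.valueOf(n).pow(w);
    }

    // W!: the factorial case mentioned above.
    static BigInteger factorial(int w) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= w; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Print both counts for a few document sizes (in words); n = 2 is an arbitrary choice.
        for (int w : new int[] {10, 20, 50}) {
            System.out.println("W=" + w + "  2^W=" + power(2, w) + "  W!=" + factorial(w));
        }
    }
}
```

Even at W = 20 the factorial is already about 2.4 × 10^18, so no realistic heap setting will rescue a rule that enumerates combinations over a document of a page or more.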
How big is your document/file (in MB), e.g. out.rtf, and what are your Java heap settings (are you using java -Xmx1g)? See also the related OutOfMemoryError questions.
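If you are not sure which heap limit your program actually runs with, a quick check along these lines may help. This is only a sketch: the class name `HeapInfo` and the helper `toMiB` are made up for illustration, and it uses nothing beyond the standard `Runtime` API.

```java
public class HeapInfo {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most memory the JVM will attempt to use (governed by -Xmx).
        System.out.println("max heap:   " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): memory currently reserved by the JVM; freeMemory(): the unused part of it.
        System.out.println("total heap: " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("used heap:  " + toMiB(rt.totalMemory() - rt.freeMemory()) + " MiB");
    }
}
```

Running it as `java -Xmx1g HeapInfo` and again without the flag shows whether the setting is being picked up; the reported maximum can be somewhat below the value passed to -Xmx.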