Java: dictionary example based on uimaFIT code


I am looking at uimaFIT and am having quite a bit of difficulty adding a dictionary.

This is my best attempt so far:

public class LocationAnnotator extends JCasAnnotator_ImplBase {

    public static final String RES_DICTIONARY = "dictionary";

    @ExternalResource(key = RES_DICTIONARY)
    private DataResource resource;
    private Dictionary dictionary;

    @Override
    public void initialize(UimaContext context) throws ResourceInitializationException {
        super.initialize(context);
        try {
            DictionaryBuilder dictBuilder = new HashMapDictionaryBuilder();
            // create dictionary file parser
            DictionaryFileParserImpl fileParser = new DictionaryFileParserImpl();
            fileParser.parseDictionaryFile(resource.getUri().getPath(), resource.getInputStream(), dictBuilder);
            dictionary = dictBuilder.getDictionary();
        } catch (IOException e) {
            // Keep the original IOException as the cause so the failure can be diagnosed
            throw new ResourceInitializationException(e);
        }
    }

    @Override
    public void process(JCas cas) throws AnalysisEngineProcessException {
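        String docText = cas.getDocumentText();
        // Resume each search after the previous token, so a repeated word is
        // annotated at its own position rather than at its first occurrence
        int searchFrom = 0;
        for (String line : docText.split("\n")) {
            for (String word : line.split(" ")) {
                if (word.isEmpty()) {
                    continue; // consecutive spaces produce empty tokens
                }
                int pos = docText.indexOf(word, searchFrom);
                searchFrom = pos + word.length();
                if (dictionary.contains(word)) {
                    Location annotation = new Location(cas, pos, pos + word.length());
                    annotation.addToIndexes();
                }
            }
        }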

    }
}

This is how I run the engine:

CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(CvReader.class, CvReader.PARAM_INPUT_FILE, "docs/simple-doc.txt");

AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(LocationAnnotator.class);
ExternalResourceFactory.bindResource(tokenizer, LocationAnnotator.RES_DICTIONARY, "META-INF/dictionaries/location.dict.xml");

for (JCas cas : SimplePipeline.iteratePipeline(reader, tokenizer)) {
    for (Location location : JCasUtil.select(cas, Location.class)) {
        System.out.println("Found location: " + location.getCoveredText());
    }
}

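Isn't there a more elegant way? I don't like the initialize method. I would prefer to initialize the dictionary with an annotation such as @ExternalResource.

I would be glad if someone could give me a simpler example. Thanks.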

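uimaFIT annotations such as @ExternalResource only work if a) you inherit from the uimaFIT version of JCasAnnotator_ImplBase and b) you call super.initialize(...) when overriding the initialize(...) method in a derived class.

Thanks! I adjusted b) in the code above. It looks better, but direct instantiation isn't possible, I guess?

What do you mean by "direct instantiation"?

I was hoping there would be an annotation similar to @ExternalResource for the dictionary as well, so that I would not have to initialize it myself in the initialize(UimaContext context) method.

That is a matter of how the external resource is implemented. You can implement a smarter external resource, for example a dictionary that is itself the external resource and contains the loading logic. This makes the code of the external resource more complex, but the component code becomes more compact.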
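
To make the last suggestion concrete, here is a minimal sketch of the pattern, written in plain Java so that it runs without UIMA on the classpath. Everything in it is a hypothetical stand-in: `SelfLoadingDictionary` plays the role of the smart external resource and `CompactLocationFinder` plays the role of the annotator. The one-entry-per-line format is also an assumption, not the XML dictionary format used in the question. In a real uimaFIT component, the resource class would typically implement UIMA's `SharedResourceObject`, whose `load(DataResource)` method is called by the framework, and the annotator field annotated with @ExternalResource would be declared with that resource type.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a "smart" external resource: it owns the loading logic.
// In uimaFIT this role would be played by a class implementing SharedResourceObject.
class SelfLoadingDictionary {
    private final Set<String> entries = new HashSet<>();

    // Assumed format: one entry per line, surrounding whitespace ignored, blank lines skipped.
    void load(InputStream in) {
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String entry = line.trim();
                if (!entry.isEmpty()) {
                    entries.add(entry);
                }
            }
        } catch (IOException e) {
            // Preserve the cause instead of swallowing it
            throw new UncheckedIOException(e);
        }
    }

    boolean contains(String word) {
        return entries.contains(word);
    }
}

// Hypothetical stand-in for the annotator: it only receives the ready-to-use
// resource and contains no file parsing code of its own.
class CompactLocationFinder {
    private static final Pattern TOKEN = Pattern.compile("\\S+");
    private final SelfLoadingDictionary dictionary;

    CompactLocationFinder(SelfLoadingDictionary dictionary) {
        this.dictionary = dictionary;
    }

    // Returns the [begin, end) offsets of every whitespace-separated token
    // that is present in the dictionary.
    List<int[]> find(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            if (dictionary.contains(matcher.group())) {
                spans.add(new int[] {matcher.start(), matcher.end()});
            }
        }
        return spans;
    }
}

class DictionaryResourceSketch {
    public static void main(String[] args) {
        SelfLoadingDictionary dictionary = new SelfLoadingDictionary();
        dictionary.load(new ByteArrayInputStream(
                "Berlin\nParis\n".getBytes(StandardCharsets.UTF_8)));

        String text = "From Berlin to Paris and back to Berlin";
        for (int[] span : new CompactLocationFinder(dictionary).find(text)) {
            System.out.println("Found location: " + text.substring(span[0], span[1]));
        }
    }
}
```

The trade-off is the one described in the comment above: the parsing code moves into the resource class, so the component shrinks to a field plus its process logic, and the same loaded dictionary can be shared by several components.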