Getting the full text of acronyms in TweetNLP

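TweetNLP provides a tokenizer and a part-of-speech tagger for tweets, which is very cool. Now I wonder whether I can go one step further and expand acronyms. For example, when I get a tweet containing "ikr", I would like to look it up and get "I know, right?". I suppose I could write my own dictionary, but surely one already exists?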

I'm not aware of any such corpus, but you can get the information you need from the following website:

What I ended up doing was using StanfordNLP together with the GATE Twitter model.

Sample tweet:

ikr smh he asked fir yo last name so he can add u on fb lololol

Results without gate-EN-twitter.model:

word: ikr :: pos: NN :: ne:O
word: smh :: pos: NN :: ne:O
word: he :: pos: PRP :: ne:O
word: asked :: pos: VBD :: ne:O
word: fir :: pos: NNP :: ne:O
word: yo :: pos: NNP :: ne:O
word: last :: pos: JJ :: ne:O
word: name :: pos: NN :: ne:O
word: so :: pos: IN :: ne:O
word: he :: pos: PRP :: ne:O
word: can :: pos: MD :: ne:O
word: add :: pos: VB :: ne:O
word: u :: pos: NN :: ne:O
word: on :: pos: IN :: ne:O
word: fb :: pos: NN :: ne:O
word: lololol :: pos: NN :: ne:O

Results with gate-EN-twitter.model:

word: ikr :: pos: UH :: ne:O
word: smh :: pos: UH :: ne:O
word: he :: pos: PRP :: ne:O
word: asked :: pos: VBD :: ne:O
word: fir :: pos: IN :: ne:O
word: yo :: pos: PRP$ :: ne:O
word: last :: pos: JJ :: ne:O
word: name :: pos: NN :: ne:O
word: so :: pos: IN :: ne:O
word: he :: pos: PRP :: ne:O
word: can :: pos: MD :: ne:O
word: add :: pos: VB :: ne:O
word: u :: pos: PRP :: ne:O
word: on :: pos: IN :: ne:O
word: fb :: pos: NNP :: ne:O
word: lololol :: pos: UH :: ne:O

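Now I can identify slang by looking at the tag (for example, UH for interjections) and then look the word up in my custom dictionary.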


I still don't understand why this isn't available out of the box, but it solves my problem for now.
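
The tag-then-lookup idea can be sketched roughly as follows. This is only an illustration: the class name `SlangExpander`, the dictionary entries, and the choice to expand only tokens tagged `UH` are assumptions of mine, not part of StanfordNLP or the GATE model. The words and tags are copied by hand from the output above rather than produced by the tagger.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: expands slang tokens that the tagger marked as interjections.
final class SlangExpander {

    // Hand-written dictionary; the entries are illustrative assumptions.
    static final Map<String, String> DICTIONARY = new HashMap<>();
    static {
        DICTIONARY.put("ikr", "I know, right?");
        DICTIONARY.put("smh", "shaking my head");
        DICTIONARY.put("lol", "laughing out loud");
    }

    /** Replaces every token tagged UH that has a dictionary entry; other tokens pass through. */
    static List<String> expand(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            String key = words[i].toLowerCase(Locale.ROOT);
            if ("UH".equals(tags[i]) && DICTIONARY.containsKey(key)) {
                out.add(DICTIONARY.get(key));
            } else {
                out.add(words[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens and tags taken from the gate-EN-twitter.model output above.
        String[] words = {"ikr", "smh", "he", "asked", "fir", "yo", "last", "name"};
        String[] tags = {"UH", "UH", "PRP", "VBD", "IN", "PRP$", "JJ", "NN"};
        System.out.println(String.join(" ", expand(words, tags)));
    }
}
```

Note that in the output above "u" is tagged PRP and "fb" is tagged NNP, so a filter on UH alone would skip them. Whether to also look up other tags is a design choice this sketch leaves open.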

Download StanfordNLP from their website or use it as a Maven dependency. I used version 3.3.1:

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.3.1</version>
        <classifier>models</classifier>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-parser</artifactId>
        <version>3.3.1</version>
        <classifier>models</classifier>
    </dependency>

Then run the POS tagger.

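Thanks Daniel. That's my fallback plan. I would keep a dictionary of such terms, and if TweetNLP reports with high confidence that it has just parsed an interjection, I would look the token up in my dictionary, e.g. LOL, and replace it with "laughing out loud". If it isn't found, I would log it, try to decipher it by hand, and add it to the dictionary. Maybe I could do this for Apache OpenNLP; if the feature isn't available there yet, I could contribute it as a plugin... See my full answer below.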
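
A minimal sketch of that fallback plan, assuming a simple in-memory dictionary. The class `SlangDictionary` and its method names are hypothetical, and the "log" here is just a set of unseen terms kept for manual review.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical dictionary that records terms it could not expand.
final class SlangDictionary {
    private final Map<String, String> entries = new HashMap<>();
    private final Set<String> unknown = new TreeSet<>();

    /** Adds or updates an entry; a term stops being "unknown" once it is defined. */
    void add(String term, String expansion) {
        String key = term.toLowerCase(Locale.ROOT);
        entries.put(key, expansion);
        unknown.remove(key);
    }

    /** Returns the expansion if known; otherwise records the term and returns it unchanged. */
    String expandOrLog(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        String expansion = entries.get(key);
        if (expansion == null) {
            unknown.add(key);
            return term;
        }
        return expansion;
    }

    /** Terms seen so far that still need to be deciphered by hand. */
    Set<String> unknownTerms() {
        return Collections.unmodifiableSet(unknown);
    }

    public static void main(String[] args) {
        SlangDictionary dictionary = new SlangDictionary();
        dictionary.add("LOL", "laughing out loud");
        System.out.println(dictionary.expandOrLog("lol"));
        System.out.println(dictionary.expandOrLog("smh"));
        System.out.println("To review: " + dictionary.unknownTerms());
    }
}
```

Keys are lower-cased so that "LOL" and "lol" share one entry. Elongated variants such as "lololol" would still land in the review set and need their own handling.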