CoreNLP API for N-grams?

nlp stanford-nlp

Does CoreNLP have an API for getting unigrams, bigrams, trigrams, etc.?

For example, I have the string "I have the best car". I would love to get:

I
I have
the
the best
car

based on the string I pass in.

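You can use CoreNLP for tokenization, but to get the n-grams, do it natively in whatever language you're working in. If, say, you're pulling the tokens into Python, you can split them up with list slicing and a list comprehension: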

>>> tokens
['I', 'have', 'the', 'best', 'car']
>>> unigrams = [tokens[i:i+1] for i in range(len(tokens))]
>>> bigrams = [tokens[i:i+2] for i in range(len(tokens) - 1)]
>>> trigrams = [tokens[i:i+3] for i in range(len(tokens) - 2)]
>>> unigrams
[['I'], ['have'], ['the'], ['best'], ['car']]
>>> bigrams
[['I', 'have'], ['have', 'the'], ['the', 'best'], ['best', 'car']]
>>> trigrams
[['I', 'have', 'the'], ['have', 'the', 'best'], ['the', 'best', 'car']]
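The three comprehensions differ only in the window size, so they can be folded into one helper. The following is a minimal sketch. The function name `ngrams` is my own choice and is not part of CoreNLP or any library:

```python
from typing import List


def ngrams(tokens: List[str], n: int) -> List[List[str]]:
    """Return every contiguous window of n tokens, in order.

    Returns an empty list when n is larger than the number of tokens.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L contains L - n + 1 windows of size n.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


tokens = ['I', 'have', 'the', 'best', 'car']
print(ngrams(tokens, 2))
# [['I', 'have'], ['have', 'the'], ['the', 'best'], ['best', 'car']]
```

Here `ngrams(tokens, 1)` and `ngrams(tokens, 3)` reproduce the unigram and trigram lists. If you would rather have tuples than lists, you can use `list(zip(*(tokens[i:] for i in range(n))))`.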

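CoreNLP is great for the NLP heavy lifting, such as dependency parsing, coreference, and part-of-speech tagging. If you just want to tokenize, though, it's overkill, like bringing a fire truck to a water gun fight. Something lighter-weight will serve your tokenization needs just as well.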
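For simple English text, a regular expression may be all you need for tokenization. The following is a rough sketch that assumes splitting on words and punctuation is acceptable. `simple_tokenize` is a hypothetical helper. It does not reproduce CoreNLP's tokenizer rules for contractions, abbreviations, URLs, and similar cases:

```python
import re
from typing import List

# A token is either a run of word characters (optionally with one internal
# apostrophe, e.g. "don't") or a single non-space punctuation character.
_TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def simple_tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens (naive, English-oriented)."""
    return _TOKEN_RE.findall(text)


print(simple_tokenize("I have the best car."))
# ['I', 'have', 'the', 'best', 'car', '.']
```

You can pass the resulting list straight to the slicing comprehensions shown earlier.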

If you're coding in Java, check out the getNgrams* functions in CoreNLP's StringUtils class.

You can also use CollectionUtils.getNgrams (which StringUtils uses internally).
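As I understand it, the Java `getNgrams` helpers take a minimum and a maximum n-gram size and return every n-gram in that range. Check the Javadoc for your CoreNLP version for the exact signatures and return types. The Python sketch below mimics that idea for anyone not on the JVM. `ngrams_in_range` is a hypothetical name, not a port of the Java code, and its output ordering may differ from CoreNLP's:

```python
from typing import List


def ngrams_in_range(tokens: List[str], min_n: int, max_n: int) -> List[str]:
    """Return all n-grams with min_n <= n <= max_n as space-joined strings.

    Ordered by start position, then by length (an arbitrary choice).
    """
    if min_n < 1 or max_n < min_n:
        raise ValueError("need 1 <= min_n <= max_n")
    result = []
    for start in range(len(tokens)):
        for n in range(min_n, max_n + 1):
            end = start + n
            if end > len(tokens):
                break  # longer windows starting here won't fit either
            result.append(" ".join(tokens[start:end]))
    return result


tokens = ['I', 'have', 'the', 'best', 'car']
print(ngrams_in_range(tokens, 1, 2))
# ['I', 'I have', 'have', 'have the', 'the', 'the best', 'best', 'best car', 'car']
```

Joining each n-gram into a single string gives entries like "I have", which matches the format in the question.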

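Thanks Sonal, that helps :) I tried using getNgrams* in the Stanford pipeline, but the hard part is getting the result from a Collection back into a List. I ended up getting this error: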

java.lang.String cannot be cast to edu.stanford.nlp.ling.CoreLabel

How did you get around this?

Check out CollectionUtils.getNgrams.
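The cast error suggests that plain `String`s and `CoreLabel` tokens were mixed somewhere along the way. One way around this kind of mismatch is to convert every token to its word text before building n-grams, and then work only with strings. In Java, that would mean calling `word()` on each `CoreLabel` first. The sketch below shows the same idea in Python. It assumes CoreNLP server JSON output, where each token is a dict with a `"word"` key. The `sample` document is hand-written and heavily trimmed for illustration. It is not real server output:

```python
import json
from typing import Dict, List

# Hand-written, trimmed example shaped like CoreNLP server JSON output
# (assumption: JSON output format with at least the tokenize/ssplit annotators).
sample = json.loads("""
{"sentences": [{"tokens": [
  {"index": 1, "word": "I"},
  {"index": 2, "word": "have"},
  {"index": 3, "word": "the"},
  {"index": 4, "word": "best"},
  {"index": 5, "word": "car"}
]}]}
""")


def sentence_words(doc: Dict) -> List[List[str]]:
    """Extract the plain word strings of each sentence from token dicts."""
    return [[tok["word"] for tok in sent["tokens"]] for sent in doc["sentences"]]


def bigrams(words: List[str]) -> List[List[str]]:
    """Contiguous pairs of words."""
    return [words[i:i + 2] for i in range(len(words) - 1)]


for words in sentence_words(sample):
    # Build n-grams per sentence so that none of them spans a sentence boundary.
    print(bigrams(words))
```

Working sentence by sentence is a design choice. If you want n-grams that cross sentence boundaries, flatten the word lists first.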