Python StanfordCoreNLP English UPOS annotation
I want to use CoreNLPClient to extract the dependency parse with UPOS annotations. Right now, my code is:
```python
from nltk.tree import ParentedTree
# Assumption: the stanza package; older releases exposed the same client as stanfordnlp.server
from stanza.server import CoreNLPClient


def query_NLP_server(my_text, to_print=False):
    '''
    Query the NLP server to tokenize and tag my_text, then post-process the answer into a token list
    :param my_text (string): The sentence from which we want to extract the tokens and tags
    :param to_print (boolean): Whether to print the tokens extracted by the NLP server
    :return: my_tokens (list of lists of tuples): The (word, tag) pairs extracted from my_text
    '''
    # 1- Send the query to the NLP server
    with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'parse'],
                       timeout=30000,
                       output_format="json",
                       properties={'tokenize.language': 'en'}
                       ) as client:
        ann = client.annotate(my_text)
    # 2- Turn the bracketed parse of the first sentence into a tree
    output = ann['sentences'][0]['parse']
    tree = ParentedTree.fromstring(output)
    # ROOT's children (usually a single S node) span the whole sentence;
    # .pos() returns their (word, tag) pairs in order
    my_tokens = [child.pos() for child in tree]
    if to_print:
        print('The tokens extracted from NLP Server are :\n', my_tokens, '\n')
    return my_tokens
```
The result I get is:
```
[[('I', 'PRP'), ('am', 'VBP'), ('looking', 'VBG'), ('for', 'IN'), ('children', 'NNS'), ('with', 'IN'), ('Gigivivitus', 'NN'), ('.', '.')]]
```
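Because the client is created with `output_format="json"`, each sentence in the response should already carry a `tokens` list whose entries include `word` and `pos` keys, so the tags can be read without rebuilding the parse tree, and for every sentence rather than only `ann['sentences'][0]`. A minimal sketch, assuming that JSON layout; the helper name `tokens_from_json` and the sample dictionary are hypothetical and hand-written for illustration, not real server output:

```python
def tokens_from_json(ann):
    """Return one list of (word, tag) pairs per sentence of a CoreNLP JSON response.

    Assumes each sentence dict has a 'tokens' list whose entries carry
    'word' and 'pos' keys, as in CoreNLP's JSON output.
    """
    return [[(tok['word'], tok['pos']) for tok in sentence['tokens']]
            for sentence in ann['sentences']]


# Hypothetical, hand-written response fragment (not real server output)
sample_ann = {
    'sentences': [
        {'tokens': [
            {'word': 'I', 'pos': 'PRP'},
            {'word': 'am', 'pos': 'VBP'},
            {'word': 'looking', 'pos': 'VBG'},
            {'word': '.', 'pos': '.'},
        ]}
    ]
}

print(tokens_from_json(sample_ann))
# [[('I', 'PRP'), ('am', 'VBP'), ('looking', 'VBG'), ('.', '.')]]
```

The tags are still Penn Treebank (XPOS) tags here; this only removes the tree-walking step.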
But I would like UPOS instead of XPOS. This seems to be possible, as with the pipeline explained here.
I have successfully used the French models with the following code:
```python
with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'parse'],
                   timeout=30000,
                   output_format="json",
                   properties={'tokenize.language': 'en',
                               'pos.model': 'edu/stanford/nlp/models/pos-tagger/french/french-ud.tagger',
                               'parse.model': 'edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz'}
                   ) as client:
    ann = client.annotate(my_text)
```
But I don't understand why the "basic" English model does not return UPOS...
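Is there a way to get UPOS for English through the StanfordCoreNLP client?

Not at the moment: we haven't trained an English part-of-speech tagger that uses that tag set for Java Stanford CoreNLP. I'll add it to the to-do list.

I can try to train a model today or soon and update the models jar. If you use the latest models jar, you will then have access to an English UD model. These models don't take long or much effort to train.

Thanks for your answer! It would be great if you could do that! Sorry, I'm not very familiar with Stanford NLP. What does "UD" mean in "English UD model"?

I mean UPOS from UD (Universal Dependencies) 2.0.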
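Until an English UPOS tagger is available for the Java pipeline, one possible fallback is to convert the Penn Treebank tags that the default English model returns into UPOS with a lookup table. A minimal sketch, assuming the `(word, tag)` pairs produced by the question's code; the `PTB_TO_UPOS` table, the `to_upos` helper, and the "forms of be become AUX" rule are illustrative approximations, not an official CoreNLP or UD mapping. Because the conversion ignores context, tags such as `IN` (ADP or SCONJ) or `VB*` (VERB or AUX) will sometimes come out wrong:

```python
# Approximate, context-free Penn Treebank (XPOS) -> UD UPOS table (illustrative).
# Some tags are ambiguous in UD (IN can be ADP or SCONJ, verb tags can be
# VERB or AUX, WDT can be PRON or DET), so this is only a rough fallback.
PTB_TO_UPOS = {
    'CC': 'CCONJ', 'CD': 'NUM', 'DT': 'DET', 'EX': 'PRON', 'FW': 'X',
    'IN': 'ADP', 'JJ': 'ADJ', 'JJR': 'ADJ', 'JJS': 'ADJ', 'LS': 'X',
    'MD': 'AUX', 'NN': 'NOUN', 'NNS': 'NOUN', 'NNP': 'PROPN', 'NNPS': 'PROPN',
    'PDT': 'DET', 'POS': 'PART', 'PRP': 'PRON', 'PRP$': 'PRON',
    'RB': 'ADV', 'RBR': 'ADV', 'RBS': 'ADV', 'RP': 'ADP', 'SYM': 'SYM',
    'TO': 'PART', 'UH': 'INTJ',
    'VB': 'VERB', 'VBD': 'VERB', 'VBG': 'VERB', 'VBN': 'VERB',
    'VBP': 'VERB', 'VBZ': 'VERB',
    'WDT': 'PRON', 'WP': 'PRON', 'WP$': 'PRON', 'WRB': 'ADV',
    '.': 'PUNCT', ',': 'PUNCT', ':': 'PUNCT', '``': 'PUNCT', "''": 'PUNCT',
    '-LRB-': 'PUNCT', '-RRB-': 'PUNCT', 'HYPH': 'PUNCT', 'NFP': 'PUNCT',
    '$': 'SYM', '#': 'SYM',
}

# Heuristic (assumption): forms of "be" are tagged AUX rather than VERB
BE_FORMS = {'be', 'am', 'is', 'are', 'was', 'were', 'been', 'being', "'m", "'re"}


def to_upos(tagged_sentence):
    """Convert a list of (word, ptb_tag) pairs into (word, upos) pairs."""
    result = []
    for word, tag in tagged_sentence:
        upos = PTB_TO_UPOS.get(tag, 'X')  # unknown tags fall back to X
        if upos == 'VERB' and word.lower() in BE_FORMS:
            upos = 'AUX'
        result.append((word, upos))
    return result


sentence = [('I', 'PRP'), ('am', 'VBP'), ('looking', 'VBG'), ('for', 'IN'),
            ('children', 'NNS'), ('with', 'IN'), ('Gigivivitus', 'NN'), ('.', '.')]
print(to_upos(sentence))
```

A trained UD tagger would resolve the ambiguous cases from context; a table like this cannot, so it is best treated as a stopgap.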