
Python: how to find the index of a named entity in Stanford NLP


I am using a Python wrapper for Stanford NLP. The code to find named entities is:

sentence = "Mr. Jhon was noted to have a cyst at his visit back in 2011."
result = nlp.ner(sentence)

for ne in result:
    if ne[1] == 'PERSON':
        print(ne)
The output is a list of tuples, e.g.: (u'Jhon', u'PERSON')

But unlike spaCy and other NLP tools, it does not give the index of the named entity in the text. spaCy, for example, returns its results together with indices:

>> namefinder = NameFinder.getNameFinder("spaCy")
>> entities = namefinder.find(sentences)
List(List((PERSON,0,13), (DURATION,15,27), (DATE,76,83)),
  List((PERSON,4,10),  (LOCATION,77,86), (ORGANIZATION,26,39)),
  List((PERSON,0,13), (DURATION,16,28), (ORGANIZATION,52,80)))
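
For comparison, here is a minimal sketch of how spaCy itself exposes these offsets (assuming the en_core_web_sm model is installed; any English model would do): ent.start_char and ent.end_char are character offsets into the original string.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Mr. Jhon was noted to have a cyst at his visit back in 2011.")
for ent in doc.ents:
    # start_char/end_char index directly into the original sentence
    print(ent.text, ent.label_, ent.start_char, ent.end_char)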

I am using nltk for this, adapting an existing answer. The key point is to also call the method span_tokenize(), which generates a separate list (I call it spans) that holds the character span of each token.

from nltk.tag import StanfordNERTagger
from nltk.tokenize import WordPunctTokenizer

# Initialize Stanford NLP with the path to the model and the NER .jar
st = StanfordNERTagger(r"C:\stanford-corenlp\stanford-ner\classifiers\english.all.3class.distsim.crf.ser.gz",
       r"C:\stanford-corenlp\stanford-ner\stanford-ner.jar",
       encoding='utf-8')

sentence = "Mr. Jhon was noted to have a cyst at his visit back in 2011."

tokens = WordPunctTokenizer().tokenize(sentence)

# We have to compute the token spans in a separate list
# Notice that span_tokenize(sentence) returns a generator 
spans = list(WordPunctTokenizer().span_tokenize(sentence))

# enumerate will help us keep track of the token index in the token lists 
# enumerate will help us keep track of the token index in the token lists
for i, ner in enumerate(st.tag(tokens)):
    if ner[1] == "PERSON":
        print(spans[i], ner)
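
Running this should print something like (4, 8) ('Jhon', 'PERSON'): the tuple (4, 8) is the character span of 'Jhon' in the original sentence.

If an entity covers several consecutive tokens (e.g. a first and last name), you can merge adjacent tokens that carry the same tag into one character span. Here is a minimal sketch, where group_entities is a hypothetical helper (not part of nltk) built on the tagged tokens and the spans list above:

def group_entities(tagged, spans, label="PERSON"):
    # Merge runs of consecutive tokens tagged with `label` into
    # (text, start_char, end_char) triples.
    entities = []
    current = None
    for (token, tag), (start, end) in zip(tagged, spans):
        if tag == label:
            if current is None:
                current = [token, start, end]
            else:
                current[0] += " " + token  # extend the running entity
                current[2] = end
        elif current is not None:
            entities.append(tuple(current))
            current = None
    if current is not None:
        entities.append(tuple(current))
    return entities

print(group_entities(st.tag(tokens), spans))
# e.g. [('Jhon', 4, 8)]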