How do I get the time and date, or specific product names, using NLTK?


When I process the text and pull out the chunks, I only get:

[('Andrew','PERSON'),('Chinese','GPE'),('American','GPE'),('Baidu','ORGANIZATION'),("company's Artificial Intelligence Group",'ORGANIZATION'),('Stanford University','ORGANIZATION'),('Coursera','ORGANIZATION'),('Andrew','PERSON'),('UK','ORGANIZATION'),('HongKong','GPE')]

I also need to get the time and date. Any suggestions?
Thanks.

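You need a more sophisticated tagger, such as the Stanford Named Entity Recognizer (NER), which you can call from NLTK once it is installed and configured. For comparison, here is the plain NLTK pipeline from the question, which only finds PERSON, GPE and ORGANIZATION chunks: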

import nltk

doc = '''Andrew Yan-Tak Ng is a Chinese American computer scientist.He is the former chief scientist at Baidu, where he led the company's
Artificial Intelligence Group. He is an adjunct professor (formerly associate professor) at Stanford University. Ng is also the co-founder
and chairman at Coursera, an online education platform. Andrew was born in the UK on 27th Sep 2.30pm 1976. His parents were both from Hong Kong.'''

# tokenize doc
tokenized_doc = nltk.word_tokenize(doc)

# POS-tag the tokens, then run NLTK's named-entity chunker on them
tagged_sentences = nltk.pos_tag(tokenized_doc)
ne_chunked_sents = nltk.ne_chunk(tagged_sentences)
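The list in the question looks like the result of walking the tree that `nltk.ne_chunk` returns. Below is a minimal sketch of that step, assuming NLTK's `Tree` interface (iterating a tree yields its children, entity subtrees have `.label()` and `.leaves()`, plain tokens are `(word, POS)` tuples). The helper name `tree_to_entities` is hypothetical, not part of NLTK.

```python
def tree_to_entities(chunked):
    """Collect (entity text, label) pairs from an nltk.ne_chunk() result.

    Assumes the NLTK Tree interface: iterating the tree yields its children,
    named-entity subtrees expose .label() and .leaves(), and ordinary tokens
    are plain (word, POS) tuples.
    """
    entities = []
    for node in chunked:
        # Entity subtrees (PERSON, GPE, ORGANIZATION, ...) have a .label()
        # method; plain (word, POS) tuples do not, so they are skipped.
        if hasattr(node, "label"):
            words = [word for word, _pos in node.leaves()]
            entities.append((" ".join(words), node.label()))
    return entities
```

With NLTK and its data installed, `tree_to_entities(ne_chunked_sents)` gives pairs similar to the question's list. It cannot return dates or times, because NLTK's default chunker only uses labels such as PERSON, ORGANIZATION, GPE, LOCATION, FACILITY and GSP; that is why a different tagger is needed.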
The Stanford tagger version looks like this:

from nltk.tag import StanfordNERTagger
from nltk.tokenize import word_tokenize

# Paths to the downloaded Stanford NER classifier and jar (adjust to your install)
stanfordClassifier = '/path/to/classifier/classifiers/english.muc.7class.distsim.crf.ser.gz'
stanfordNerPath = '/path/to/jar/stanford-ner/stanford-ner.jar'

st = StanfordNERTagger(stanfordClassifier, stanfordNerPath, encoding='utf8')

doc = '''Andrew Yan-Tak Ng is a Chinese American computer scientist.He is the former chief scientist at Baidu, where he led the company's Artificial Intelligence Group. He is an adjunct professor (formerly associate professor) at Stanford University. Ng is also the co-founder and chairman at Coursera, an online education platform. Andrew was born in the UK on 27th Sep 2.30pm 1976. His parents were both from Hong Kong.'''

# st.tag() returns one (token, label) pair per token; non-entity tokens get 'O'
result = st.tag(word_tokenize(doc))

# Keep DATE and ORGANIZATION tokens (the 7-class MUC model also has a TIME label)
date_word_tags = [wt for wt in result if wt[1] in ('DATE', 'ORGANIZATION')]

print(date_word_tags)
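The output (produced under Python 2, hence the u'' prefixes) is:

[(u'Artificial', u'ORGANIZATION'), (u'Intelligence', u'ORGANIZATION'), (u'Group', u'ORGANIZATION'), (u'Stanford', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'Coursera', u'ORGANIZATION'), (u'27th', u'DATE'), (u'Sep', u'DATE'), (u'2.30pm', u'DATE'), (u'1976', u'DATE')]

You may run into some issues installing and setting everything up (Stanford NER needs Java, plus correct paths to the jar and the classifier), but I think it is worth the hassle.

Let me know if it helps.

Sorry for the delay, I was out of the office. As suggested above, I tried StanfordNERTagger with your code snippet. Unlike plain NLTK, this solution works well. Thank you, and have a nice day!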
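StanfordNERTagger labels each token separately, so a multi-word entity such as "27th Sep 2.30pm 1976" comes back as four separate DATE pairs. Below is a minimal sketch for merging runs of adjacent tokens that share a label, using `itertools.groupby`; `merge_entities` and its `wanted` parameter are hypothetical names. Apply it to the full `st.tag()` result rather than the filtered list, because the 'O' tokens are what separate neighbouring entities.

```python
from itertools import groupby


def merge_entities(tagged_tokens, wanted=("DATE", "TIME", "ORGANIZATION")):
    """Merge runs of adjacent tokens that share a label into one entity.

    tagged_tokens: list of (token, label) pairs, e.g. the result of st.tag().
    Runs whose label is not in `wanted` (including 'O') are dropped.
    """
    merged = []
    # groupby only groups *consecutive* items with the same key, which is
    # exactly a run of tokens carrying the same entity label.
    for label, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label in wanted:
            merged.append((" ".join(token for token, _ in run), label))
    return merged
```

A design caveat: with per-token labels like these, two same-type entities that directly touch each other would be merged into one.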