Python: Why can't the NLTK lemmatizer lemmatize some plural words?


I'm trying to lemmatize words from the Holy Quran, but some words can't be lemmatized.

Here is my sentence:

sentence = "Then bring ten surahs like it that have been invented and call upon for assistance whomever you can besides Allah if you should be truthful"
This sentence is part of my txt dataset. As you can see, it contains "surahs", the plural of "surah". Here is the code I tried:

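from nltk.stem import WordNetLemmatizer

def lemmatize(self, ayat):
    # ayat is a list of tokens (a method of the asker's class, hence self)
    wordnet_lemmatizer = WordNetLemmatizer()
    result = []

    for word in ayat:
        # pos='v': each token is looked up as a verb, so it is only changed
        # if a matching verb form exists in WordNet
        result.append(wordnet_lemmatizer.lemmatize(word, 'v'))
    return result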
When I run it and print the result, I get:

['bring', 'ten', 'surahs', 'like', u'invent', 'call', 'upon', 'assistance', 'whomever', 'besides', 'Allah', 'truthful']
"surahs" was not changed to "surah".

Does anyone know why? Thanks.


For most non-standard English words, the WordNet lemmatizer won't help much in finding the correct lemma. Try a stemmer instead:

>>> from nltk.stem import PorterStemmer
>>> porter = PorterStemmer()
>>> porter.stem('surahs')
u'surah'
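The stemmer succeeds because it strips suffixes by rule without checking whether the result is a dictionary word. A minimal sketch of only the plural rules of Porter's step 1a (SSES to SS, IES to I, SS to SS, S removed) follows; it is a simplification for illustration, not NLTK's `PorterStemmer`, which runs several further steps.

```python
def porter_step1a(word: str) -> str:
    """Apply only the plural-suffix rules of Porter's step 1a."""
    # Rules are checked longest suffix first; the first match wins.
    rules = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["surahs", "caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step1a(w))
```

The `ponies` to `poni` case shows the trade-off: stems need not be real words, which is why a stemmer is a fallback rather than a true replacement for a lemmatizer.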

Also, try lemmatize_sent in earthy (an nltk wrapper, "shameless plug"):


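There is nothing wrong with the WordNetLemmatizer itself, but it doesn't handle irregular words well. You can try this "hack".

I tried that hack too, but got no result [].

Wow, thanks, this is cool. But what is the "earthy" module, and where can I get it? I can't call "earthy"; the module name is not defined.

pip install -U earthy

Wow, cool, thanks, I've installed it. Are there any books or tutorials on the earthy library?

There are, but if you want a more serious tool, try
spacy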
>>> from earthy.nltk_wrappers import lemmatize_sent
>>> sentence = "Then bring ten surahs like it that have been invented and call upon for assistance whomever you can besides Allah if you should be truthful"
>>> lemmatize_sent(sentence)
[('Then', 'Then', 'RB'), ('bring', 'bring', 'VBG'), ('ten', 'ten', 'RP'), ('surahs', 'surahs', 'NNS'), ('like', 'like', 'IN'), ('it', 'it', 'PRP'), ('that', 'that', 'WDT'), ('have', 'have', 'VBP'), ('been', u'be', 'VBN'), ('invented', u'invent', 'VBN'), ('and', 'and', 'CC'), ('call', 'call', 'VB'), ('upon', 'upon', 'NN'), ('for', 'for', 'IN'), ('assistance', 'assistance', 'NN'), ('whomever', 'whomever', 'NN'), ('you', 'you', 'PRP'), ('can', 'can', 'MD'), ('besides', 'besides', 'VB'), ('Allah', 'Allah', 'NNP'), ('if', 'if', 'IN'), ('you', 'you', 'PRP'), ('should', 'should', 'MD'), ('be', 'be', 'VB'), ('truthful', 'truthful', 'JJ')]

>>> words, lemmas, tags = zip(*lemmatize_sent(sentence))
>>> lemmas
('Then', 'bring', 'ten', 'surahs', 'like', 'it', 'that', 'have', u'be', u'invent', 'and', 'call', 'upon', 'for', 'assistance', 'whomever', 'you', 'can', 'besides', 'Allah', 'if', 'you', 'should', 'be', 'truthful')
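The third element of each triple is a Penn Treebank tag (`surahs` is tagged `NNS`, a plural noun). Tagger-driven lemmatization generally maps such tags onto WordNet's part-of-speech letters before calling the lemmatizer, instead of fixing `'v'` for every token as the question's code does. A sketch of that common mapping as a standalone helper; the name `penn_to_wordnet` and the noun default are assumptions, not earthy's code.

```python
from typing import Optional

# WordNet POS letters, equal to nltk.corpus.wordnet.NOUN, VERB, ADJ and ADV.
NOUN, VERB, ADJ, ADV = "n", "v", "a", "r"


def penn_to_wordnet(tag: str, default: Optional[str] = NOUN) -> Optional[str]:
    """Map a Penn Treebank tag (e.g. 'NNS', 'VBN') to a WordNet POS letter."""
    if tag.startswith("NN"):
        return NOUN
    if tag.startswith("VB"):
        return VERB
    if tag.startswith("JJ"):
        return ADJ
    if tag.startswith("RB"):
        return ADV
    # Tags such as IN, PRP or MD have no WordNet counterpart.
    return default


tagged = [("surahs", "NNS"), ("invented", "VBN"), ("truthful", "JJ"), ("Then", "RB"), ("if", "IN")]
print([(word, penn_to_wordnet(tag)) for word, tag in tagged])
```

With NLTK installed, the result can be passed as `WordNetLemmatizer().lemmatize(word, pos=penn_to_wordnet(tag))`. This only changes which part of speech WordNet is asked about; as the output above shows, `surahs` can still come back unchanged.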

>>> from earthy.nltk_wrappers import pywsd_lemmatize
>>> pywsd_lemmatize('surahs')
'surahs'

>>> from earthy.nltk_wrappers import porter_stem
>>> porter_stem('surahs')
u'surah'
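Since WordNet returns unknown words such as `surahs` unchanged while the Porter stemmer reduces them, one option is to fall back to the stemmer only when the lemmatizer changes nothing. A sketch with both steps passed in as functions; the name `lemmatize_or_stem` is hypothetical, this is not the "hack" mentioned in the comments, and the toy stand-ins exist only so the example runs without NLTK.

```python
from typing import Callable


def lemmatize_or_stem(
    word: str,
    lemmatize: Callable[[str], str],
    stem: Callable[[str], str],
) -> str:
    """Return the lemma if the lemmatizer changed the word, else its stem."""
    lemma = lemmatize(word)
    if lemma != word:
        return lemma
    return stem(word)


# Toy stand-ins: a tiny lemma dictionary and a plural-stripping "stemmer".
toy_lemmas = {"invented": "invent", "been": "be"}


def toy_lemmatize(word: str) -> str:
    return toy_lemmas.get(word, word)


def toy_stem(word: str) -> str:
    return word[:-1] if word.endswith("s") else word


for w in ["invented", "surahs", "Allah"]:
    print(w, "->", lemmatize_or_stem(w, toy_lemmatize, toy_stem))
```

With NLTK, the real functions would be `WordNetLemmatizer().lemmatize` and `PorterStemmer().stem`. One trade-off: a word that is already its own lemma also falls through to the stemmer, so a real stemmer may still alter it.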