Python: detecting POS tag patterns together with specified words

I need to identify certain POS tags before or after certain specific words. For example, the following tagged sentence:

[('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]

can be abstracted to the form "would be" + adjective.

Similarly:

[('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

has the form "am able to" + verb.

How can I check for these kinds of patterns in a sentence? I am using NLTK.

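Assuming you want to check literally for "would" followed by "be" followed by some adjective, you can do this:

def would_be(tagged):
    # Slide a three-token window over the sentence: word, word, tag.
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))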

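The input is a POS-tagged sentence (a list of tuples, as produced by NLTK).

The function checks whether the list contains three consecutive elements in which "would" is immediately followed by "be", and "be" is immediately followed by a word tagged as an adjective ('JJ'). As soon as this "pattern" is matched, it returns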

True

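For the second kind of sentence, you can do something very similar:

def am_able_to(tagged):
    # Slide a four-token window over the sentence: word, word, word, tag.
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))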

Here is a driver for the program:

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

def would_be(tagged):
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))

def am_able_to(tagged):
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))

sent1 = ' '.join(s[0] for s in s1)
sent2 = ' '.join(s[0] for s in s2)

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s1), sent1))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s1), sent1))

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s2), sent2))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s2), sent2))

This correctly outputs:

Is 'This feature would be nice to have' of type 'would be' + adj? True
Is 'This feature would be nice to have' of type 'am able to' + verb? False
Is 'I am able to delete the group functionality' of type 'would be' + adj? False
Is 'I am able to delete the group functionality' of type 'am able to' + verb? True

If you want to generalize this, you can vary, position by position, whether you are checking the word or its POS tag.
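One way to sketch that generalization is to describe a pattern as a list of (kind, value) pairs, where kind says whether the word or the tag is compared at that position. The names matches_at, contains_pattern, WOULD_BE_ADJ and AM_ABLE_TO_VERB are made up for this illustration and are not part of NLTK. Unlike the functions above, this sketch lowercases each word before comparing, which is an assumption about what you want.

```python
def matches_at(tagged, start, pattern):
    """Return True if `pattern` matches `tagged` beginning at index `start`."""
    for offset, (kind, expected) in enumerate(pattern):
        word, tag = tagged[start + offset]
        # Compare the lowercased word or the POS tag, depending on `kind`.
        actual = word.lower() if kind == "word" else tag
        if actual != expected:
            return False
    return True


def contains_pattern(tagged, pattern):
    """Return True if `pattern` matches any run of consecutive tokens."""
    # range() is empty when the sentence is shorter than the pattern,
    # so short inputs give False rather than an IndexError.
    last_start = len(tagged) - len(pattern)
    return any(matches_at(tagged, i, pattern) for i in range(last_start + 1))


# Hypothetical pattern definitions for the two cases in the question.
WOULD_BE_ADJ = [("word", "would"), ("word", "be"), ("tag", "JJ")]
AM_ABLE_TO_VERB = [("word", "am"), ("word", "able"), ("word", "to"), ("tag", "VB")]

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'),
      ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'),
      ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

print(contains_pattern(s1, WOULD_BE_ADJ))     # True
print(contains_pattern(s1, AM_ABLE_TO_VERB))  # False
print(contains_pattern(s2, WOULD_BE_ADJ))     # False
print(contains_pattern(s2, AM_ABLE_TO_VERB))  # True
```

Adding a new pattern then only means writing another list, not another function.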

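What do you mean by "check"?

I mean: how do I detect whether a sentence contains a pattern of the form "am able to" + verb? Or, for example, whether a sentence contains something like "would be" + comparative adjective.

So if the pattern is present, do you want to print True, or what?

Yes. I have seen examples that match on POS tags only, but in my case I need to match both words and POS tags, if that makes sense...

Also note that 'JJ' is not a comparative adjective; it is just an adjective.

When I call something general like am_able_to(s1), I get a "list index out of range" error. Other than that, it works. Thanks.

Corrected the function.

Thanks Erip. I tested another sentence, "I am able to delete the group functionality", and I still get a "list index out of range" error.

@newdev14 What is the POS tag list for that sentence? I don't have nltk on this machine.

[('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]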
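Two points from the comments can be sketched together: accepting comparative adjectives, and avoiding index errors on short input. In the Penn Treebank tag set that NLTK's default tagger uses, 'JJR' marks a comparative adjective and 'JJS' a superlative one. The function name would_be_adj, the ADJECTIVE_TAGS set and the hand-tagged example sentence below are assumptions made for this illustration, not output from NLTK.

```python
# Penn Treebank adjective tags: plain, comparative, superlative.
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}


def would_be_adj(tagged, adj_tags=ADJECTIVE_TAGS):
    """Return True if 'would be' is followed by a word whose tag is in adj_tags."""
    # zip() stops at the shortest input, so no index can run off the end,
    # even for sentences with fewer than three tokens.
    triples = zip(tagged, tagged[1:], tagged[2:])
    return any(
        first[0] == "would" and second[0] == "be" and third[1] in adj_tags
        for first, second, third in triples
    )


# Hand-tagged example (hypothetical, not produced by a tagger).
comparative = [('This', 'DT'), ('would', 'MD'), ('be', 'VB'), ('better', 'JJR')]

print(would_be_adj(comparative))                      # True
print(would_be_adj(comparative, adj_tags={"JJ"}))     # False
print(would_be_adj([('would', 'MD'), ('be', 'VB')]))  # False
```

Passing adj_tags={"JJR"} restricts the match to comparative adjectives only.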