Python: detecting POS tag patterns with specified words


I need to identify certain POS tags before/after certain specific words. For example, the following tagged sentence:

[('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
can be abstracted as the form "would be" + adjective.

Similarly:

[('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]
is of the form "am able to" + verb.


How can I check sentences for these kinds of patterns? I am using NLTK.

Assuming you want to check literally for "would" followed by "be", followed by some adjective, you can do this:

def would_be(tagged):
    # Compare every run of three tokens: word, word, then POS tag.
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))
The input is a POS-tagged sentence (a list of tuples, as produced by NLTK).

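It checks whether the list contains three consecutive elements such that "would" is immediately followed by "be", and "be" is immediately followed by a word tagged as an adjective ('JJ'). As soon as this "pattern" is matched, it returns True.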
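To make the scan concrete, here is a small sketch that lists every three-token window the check compares against for the first sample sentence. The helper name `would_be_windows` is made up for this illustration and is not part of the answer's code.

```python
# First sample sentence from the question, already POS-tagged.
s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'),
      ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]

def would_be_windows(tagged):
    """Return each [word, word, tag] triple that would_be compares (hypothetical helper)."""
    return [
        [tagged[i][0], tagged[i + 1][0], tagged[i + 2][1]]
        for i in range(len(tagged) - 2)
    ]

windows = would_be_windows(s1)
for i, window in enumerate(windows):
    # The last column shows whether this window equals the target pattern.
    print(i, window, window == ['would', 'be', 'JJ'])
```

For this sentence the window starting at index 2 is the one that equals ['would', 'be', 'JJ'], which is why the check succeeds.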

For the second kind of sentence, you can do something very similar:

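def am_able_to(tagged):
    # Three literal words followed by a POS tag, so the window is four tokens wide.
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))
Here is a driver for the program: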

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

def would_be(tagged):
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))

def am_able_to(tagged):
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))

sent1 = ' '.join(s[0] for s in s1)
sent2 = ' '.join(s[0] for s in s2)

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s1), sent1))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s1), sent1))

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s2), sent2))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s2), sent2))
This correctly outputs:

Is 'This feature would be nice to have' of type 'would be' + adj? True
Is 'This feature would be nice to have' of type 'am able to' + verb? False
Is 'I am able to delete the group functionality' of type 'would be' + adj? False
Is 'I am able to delete the group functionality' of type 'am able to' + verb? True

If you want to generalize this, you can vary, for each position in the pattern, whether you check the literal word or its POS tag.
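One possible way to sketch that generalization is a single matcher that takes the pattern as data. The function name `matches_pattern` and the `('word', ...)` / `('tag', ...)` convention are assumptions made up for this example, not something NLTK provides.

```python
def matches_pattern(tagged, pattern):
    """Return True if `pattern` matches some run of consecutive tokens.

    `tagged` is a list of (word, tag) tuples.
    `pattern` is a list of (kind, value) pairs, where kind is 'word' or 'tag'
    (a hypothetical convention for this sketch).
    """
    field = {'word': 0, 'tag': 1}  # which tuple slot each kind inspects
    n = len(pattern)
    # range() is empty when the sentence is shorter than the pattern,
    # so the slice below never needs an out-of-range index.
    for start in range(len(tagged) - n + 1):
        window = tagged[start:start + n]
        if all(token[field[kind]] == value
               for token, (kind, value) in zip(window, pattern)):
            return True
    return False

WOULD_BE_ADJ = [('word', 'would'), ('word', 'be'), ('tag', 'JJ')]
AM_ABLE_TO_VERB = [('word', 'am'), ('word', 'able'), ('word', 'to'), ('tag', 'VB')]

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'),
      ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'),
      ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

print(matches_pattern(s1, WOULD_BE_ADJ))
print(matches_pattern(s2, AM_ABLE_TO_VERB))
```

Keeping the patterns as plain lists means a new construction can be added without writing another function. Matching is exact and case-sensitive here, just as in the two hand-written functions.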

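What do you mean by "check"?

I mean how to detect whether a pattern of the form "able to" + verb exists in a sentence. Or, for example, whether something like "would be" + comparative adjective exists in a sentence.

So if it exists, do you want to print True, or?

Yes. I have seen examples that match only on POS tags, but in my case I need to match both words and POS tags, if that makes sense...

Also note that 'JJ' is not a comparative adjective, it is just an adjective.

When I do something general, like am_able_to(s1), I get a list index out of range error. Other than that, it works. Thanks.

Corrected the function.

Thanks Erip. I tested another sentence, "I am able to delete the group functionality", but still get a list index out of range error.

@newdev14 What is the POS tag list for that sentence? I don't have nltk on this machine.

[('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]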
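Regarding the index-out-of-range error discussed above: the thread does not show what the earlier version of the function looked like, but one way to rule out this class of error is to avoid index arithmetic entirely. The sketch below uses `zip` over shifted slices; the name `am_able_to_zip` is hypothetical.

```python
def am_able_to_zip(tagged):
    """Same check as am_able_to, written without explicit indexing (sketch)."""
    words = [word for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    # zip stops at the shortest input, so the lookahead can never run
    # past the end of the sentence, even for very short inputs.
    return any(
        (w1, w2, w3, t4) == ('am', 'able', 'to', 'VB')
        for w1, w2, w3, t4 in zip(words, words[1:], words[2:], tags[3:])
    )

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'),
      ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'),
      ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]
short = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ')]

print(am_able_to_zip(s1), am_able_to_zip(s2), am_able_to_zip(short))
```

Slicing past the end of a list yields an empty list rather than raising, so a sentence shorter than the pattern simply produces no windows.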