Generating a PCFG in Python NLTK

I am trying to learn a PCFG from a file that contains parse trees, for example:

(noun phrase phrase phrase … (noun) n (flight) n (flight) n (prep) IN (prep) pt_PREP_IN (noun noun noun phrase (Charlotte Charlotte))

Here is my relevant code:

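from nltk import tree

def loadData(path):
    # Read the file and keep one bracketed parse tree per non-empty line
    with open(path, 'r') as f:
        data = [line for line in f.read().split('\n') if line.strip()]
    return data

def getTreeData(data):
    # Parse each bracketed string into an nltk Tree; return a list because
    # map() in Python 3 is a one-shot iterator
    return [tree.Tree.fromstring(s) for s in data]

# Main script
print("loading data..")
data = loadData('C:\\Users\\Rayyan\\Desktop\\MSc Data\\NLP\\parseTrees.txt')
print("generating trees..")
treeData = getTreeData(data)
print("done!")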
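A self-contained sketch of the loading step, assuming the file holds one bracketed tree per line. So that it runs without the original file, it parses an in-memory string; the sample trees and the helper name parse_tree_lines are hypothetical. Blank lines are skipped because Tree.fromstring raises ValueError on an empty string, and the result is a list so it can be iterated more than once.

```python
from nltk.tree import Tree

# Hypothetical sample: one tree per line, with a blank line in between
SAMPLE = (
    "(S (NP (NNS flights)) (PP (IN from) (NP (NNP Charlotte))))\n"
    "\n"
    "(S (NP (NNS flights)) (PP (IN to) (NP (NNP Denver))))\n"
)

def parse_tree_lines(text):
    # Skip empty lines: Tree.fromstring('') raises ValueError
    return [Tree.fromstring(line) for line in text.splitlines() if line.strip()]

trees = parse_tree_lines(SAMPLE)
print(len(trees))        # 2
print(trees[0].label())  # S
```

For the real file, the same helper can be fed the file contents, e.g. with open(path) as f: trees = parse_tree_lines(f.read()).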

Since then I have tried many things I found online, such as:

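from nltk import Nonterminal, induce_pcfg
from nltk.corpus import treebank  # requires nltk.download('treebank')

productions = []
for item in treebank.fileids()[:2]:
    for tree in treebank.parsed_sents(item):
        productions += tree.productions()

S = Nonterminal('S')
grammar = induce_pcfg(S, productions)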

But in these examples, productions always comes from something built in: tree.productions() applied to trees from NLTK's built-in treebank corpus.

In my case, I tried replacing productions with treeData, but it did not work. What am I missing or doing wrong?

Start from the trees you have built:

from nltk import tree
treeData_rules = []

# Extract the CFG rules (productions) from every tree
for item in treeData:
    for production in item.productions():
        treeData_rules.append(production)
treeData_rules  # in a notebook, this displays the list of rules
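The same extraction can be written as one list comprehension. This sketch uses a single hypothetical sample tree in place of treeData. Tree.productions() returns one Production per internal node, including lexical rules such as NNS -> 'flights'; induce_pcfg expects these Production objects rather than Tree objects, which is why passing treeData to it directly does not work.

```python
from nltk.tree import Tree

# Hypothetical sample tree standing in for the contents of treeData
treeData = [Tree.fromstring("(S (NP (NNS flights)) (PP (IN from) (NP (NNP Charlotte))))")]

# Flatten the productions of every tree into one list of Production objects
treeData_rules = [production for item in treeData for production in item.productions()]

for rule in treeData_rules:
    print(rule)
```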

Then you can induce a probabilistic CFG (PCFG) like this:

from nltk import Nonterminal, induce_pcfg

# The start symbol should normally match the root label of your trees
S = Nonterminal('S')
grammar_PCFG = induce_pcfg(S, treeData_rules)
print(grammar_PCFG)
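induce_pcfg estimates each rule's probability by relative frequency: count(A -> β) divided by the number of rules with left-hand side A. The sketch below checks this on two hypothetical sample trees. NP expands to NNS twice and to NNP twice, so each NP rule should get probability 0.5. The start symbol S is an assumption about these sample trees; if your trees are rooted at a different label, pass that label instead.

```python
from nltk.grammar import Nonterminal, induce_pcfg
from nltk.tree import Tree

# Two hypothetical sample trees
trees = [
    Tree.fromstring("(S (NP (NNS flights)) (PP (IN from) (NP (NNP Charlotte))))"),
    Tree.fromstring("(S (NP (NNS flights)) (PP (IN to) (NP (NNP Denver))))"),
]

productions = [p for t in trees for p in t.productions()]
grammar = induce_pcfg(Nonterminal('S'), productions)

# P(NP -> X) = count(NP -> X) / count(NP -> anything), keyed by the right-hand side
np_probs = {" ".join(map(str, p.rhs())): p.prob()
            for p in grammar.productions(lhs=Nonterminal('NP'))}
print(np_probs)
```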

Thanks for the solution, Kate, I only saw it a bit late! I hope the course is going well for you.