使用python和语法列表解析文本文件

使用python和语法列表解析文本文件,python,python-3.x,parsing,nlp,text-parsing,Python,Python 3.x,Parsing,Nlp,Text Parsing,我必须做一个语法分析:目标是创建一个语法规则,该规则将应用于语料库中。我有一个问题:是否可以在语法中列出一个列表 例如: 1) Open the text to be analyzed 2) Write the grammatical rules (just an example): grammar(""" S -> NP VP NP -> DET N VP -> V N DET -> list_det.txt N -> list

我必须做一个语法分析:目标是创建一个语法规则,该规则将应用于语料库中。我有一个问题:是否可以在语法中列出一个列表

例如:

1) Open the text to be analyzed
2) Write the grammatical rules (just an example):
   grammar("""
   S -> NP VP
   NP -> DET N
   VP -> V N
   DET -> list_det.txt
   N -> list_n.txt
   V -> list.txt""")
3) Print the result with the entries that obey this grammar

有可能吗?

这是一个快速的语法概念原型,使用pyparsing。我无法从你的问题中判断出
N
V
DET
列表的内容是什么,所以我只是随意选择了由“N”和“V”以及字面“DET”组成的单词。您可以更换
,这应该是可能的。你在怀疑什么?我不知道如何在语法中调用外部列表。我还怀疑这是否可能,因为我们正在讨论词典…是“list_det.txt”、“list_n.txt”和“list.txt”文件名,其内容应包括在其名称当前出现的语法中?det是限定词,一个包含文章、指示词和量词(以及其他内容)的词类。太棒了!!问题是:V DET和N是包含许多条目的列表,所以很难放入程序中。我需要调用这些列表,我不知道这是否可能…这是pyparsing合理的范围-你将很快开始处理动词时态、不规则复数等,然后它变得非常丑陋。在这一点上,你真的在做NLTK的东西,通过谷歌我相信你会找到一些Python NL库。我希望这至少表明你的语法是可行的。
import pyparsing as pp

v = pp.Forward()
n = pp.Forward()
det = pp.Forward()

V = pp.Group(pp.OneOrMore(v))
N = pp.Group(pp.OneOrMore(n))
DET = pp.Group(pp.OneOrMore(det))

VP = pp.Group(V + N)
NP = pp.Group(DET + N)
S = NP + VP

# replace these with something meaningful
v <<= pp.Word('v')
n <<= pp.Word('n')
det <<= pp.Literal('det')

sample = 'det det nn nn nn nn vv vv vv nn nn nn nn'

parsed = S.parseString(sample)
print(parsed.asList())
[[['det', 'det'], ['nn', 'nn', 'nn', 'nn']], 
 [['vv', 'vv', 'vv'], ['nn', 'nn', 'nn', 'nn']]]
import pyparsing as pp

v = pp.Forward()
n = pp.Forward()
det = pp.Forward()

def collectionOf(expr):
    '''
    Compose a collection expression for a base expression that matches
        expr
        expr and expr
        expr, expr, expr, and expr
    '''
    AND = pp.Literal('and')
    OR = pp.Literal('or')
    COMMA = pp.Suppress(',')
    return expr + pp.Optional(
            pp.Optional(pp.OneOrMore(COMMA + expr) + COMMA) + (AND | OR) + expr)

V = pp.Group(collectionOf(v))('V')
N = pp.Group(collectionOf(n))('N')
DET = pp.Group(pp.OneOrMore(det))('DET')

VP = pp.Group(V + N)('VP')
NP = pp.Group(DET + N)('NP')
S = pp.Group(NP + VP)('S')

# replace these with something meaningful
v <<= pp.Combine(pp.oneOf('chase love hate like eat drink') + pp.Optional(pp.Literal('s')))
n <<= pp.Optional(pp.oneOf('the a my your our his her their')) + pp.oneOf("dog cat horse rabbit squirrel food water")
det <<= pp.Optional(pp.oneOf('why how when where')) +pp.oneOf( 'do does did')

samples = '''
    when does the dog eat the food
    does the dog like the cat
    do the horse, cat, and dog like or hate their food
    do the horse and dog love the cat
    why did the dog chase the squirrel
'''
S.runTests(samples)
when does the dog eat the food
[[[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]]
- S: [[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]
  - NP: [['when', 'does'], ['the', 'dog']]
    - DET: ['when', 'does']
    - N: ['the', 'dog']
  - VP: [['eat'], ['the', 'food']]
    - N: ['the', 'food']
    - V: ['eat']


does the dog like the cat
[[[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]]
- S: [[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]
  - NP: [['does'], ['the', 'dog']]
    - DET: ['does']
    - N: ['the', 'dog']
  - VP: [['like'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['like']


do the horse, cat, and dog like or hate their food
[[[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]]
- S: [[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]
  - NP: [['do'], ['the', 'horse', 'cat', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'cat', 'and', 'dog']
  - VP: [['like', 'or', 'hate'], ['their', 'food']]
    - N: ['their', 'food']
    - V: ['like', 'or', 'hate']


do the horse and dog love the cat
[[[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]]
- S: [[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]
  - NP: [['do'], ['the', 'horse', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'and', 'dog']
  - VP: [['love'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['love']


why did the dog chase the squirrel
[[[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]]
- S: [[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]
  - NP: [['why', 'did'], ['the', 'dog']]
    - DET: ['why', 'did']
    - N: ['the', 'dog']
  - VP: [['chase'], ['the', 'squirrel']]
    - N: ['the', 'squirrel']
    - V: ['chase']