Python: parsing an enumeration with a CFG


I have a string that was generated from a rich-text enumeration, for example:

"(1) Let X denote one of the following: (a) weight (b) height (c) depth (2) Y denote (a) color, except (i) white (ii) blue (b) pressure"

I want to reconstruct the original structure, for example:

{"Let X denote one of the following:" : {"weight":{}, "height":{}, "depth":{}} , 
"Y denote": {"color, except": {"white":{}, "blue":{}}, "pressure":{}} }

This is clearly a context-free grammar, but I am having trouble implementing it with pyparsing.

Edit: I am not an expert on CFGs, so I hope this BNF formulation is correct:

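Assume:

<w> is equivalent to any run of word characters (re.compile("\w*"))
<l> is equivalent to re.compile("[a-z]")
<d> is equivalent to re.compile("\d+")
<r> is equivalent to a roman numeral (i, ii, iii, ...)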

Then (hopefully) the BNF should look something like this:

<E1>::= "(" <d> ")" | <E1> " "
<E2>::= "(" <l> ")" | <E2> " "
<E3>::= "(" <r> ")" | <E3> " "
<L0>::= <w> | <w> <E1> <L1> <L0>
<L1>::= <w> | <w> <E2> <L2> <L1>
<L2>::= <w> | <w> <E3> <L2>


Here is the first part of a parser for this, using pyparsing expressions:

import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
COMMA, COLON = map(pp.Literal, ",:")
wd = pp.Word(pp.alphas)
letter = pp.oneOf(list(pp.alphas.lower())) 
integer = pp.pyparsing_common.integer
roman = pp.Word('ivx')

e1 = LPAR + integer + RPAR
e2 = LPAR + letter + RPAR
e3 = LPAR + roman + RPAR
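
As a quick check of the marker expressions above, the following sketch parses one marker of each kind (the sample strings are made up for illustration). It also shows a wrinkle worth keeping in mind: "(i)" is a valid letter marker and a valid roman marker, so the surrounding grammar has to decide by context which level it belongs to.

```python
import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
letter = pp.oneOf(list(pp.alphas.lower()))
integer = pp.pyparsing_common.integer
roman = pp.Word("ivx")

e1 = LPAR + integer + RPAR
e2 = LPAR + letter + RPAR
e3 = LPAR + roman + RPAR

# The parentheses are suppressed, so only the marker value is kept.
num = e1.parseString("(12)").asList()   # pyparsing_common.integer converts to int
let = e2.parseString("(b)").asList()
rom = e3.parseString("(iii)").asList()
print(num, let, rom)

# "(i)" is ambiguous: it parses both as a letter and as a roman numeral.
amb_letter = e2.parseString("(i)").asList()
amb_roman = e3.parseString("(i)").asList()
print(amb_letter, amb_roman)
```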

Following your BNF, the next part might look like this:

# predefine levels using Forwards, since they are recursive
lev0 = pp.Forward()
lev1 = pp.Forward()
lev2 = pp.Forward()

lev0 <<= wd | wd + e1 + lev1 + lev0
lev1 <<= wd | wd + e2 + lev2 + lev1
lev2 <<= wd | wd + e3 + lev2
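
One caveat with this literal transcription, offered as a small sketch rather than a verdict on the BNF: pyparsing's | operator builds a MatchFirst, which stops at the first alternative that matches. Because the bare wd alternative is listed first, the longer alternative is never tried. The ^ operator (Or, longest match) behaves differently. The names first_match and longest and the test string are made up for this illustration.

```python
import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
wd = pp.Word(pp.alphas)
roman = pp.Word("ivx")
e3 = LPAR + roman + RPAR

# Literal transcription of <L2> ::= <w> | <w> <E3> <L2> using | (MatchFirst)
first_match = pp.Forward()
first_match <<= wd | wd + e3 + first_match

# Same alternatives combined with ^ (Or), which keeps the longest match
longest = pp.Forward()
longest <<= wd ^ wd + e3 + longest

text = "color (i) white"
first_result = first_match.parseString(text).asList()
longest_result = longest.parseString(text).asList()
print(first_result)    # only the first word is consumed
print(longest_result)  # the whole string is consumed
```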

In pyparsing, you can express this kind of repetition more directly, without recursion:

wd = pp.Word(pp.alphas)
list_of_wd = pp.OneOrMore(wd)
# or using tuple multiplication short-hand
list_of_wd = wd * (1,)
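
To make the connection with the BNF concrete, here is a small sketch comparing a recursive, BNF-style definition of "one or more words" with the two repetition forms above. The name recursive_words is made up for this example; all three parsers should produce the same flat token list.

```python
import pyparsing as pp

wd = pp.Word(pp.alphas)

# BNF-style recursion: <words> ::= <w> <words> | <w>
recursive_words = pp.Forward()
recursive_words <<= wd + recursive_words | wd

# pyparsing repetition
one_or_more = pp.OneOrMore(wd)
times_form = wd * (1,)   # (1,) means "at least one, no upper limit"

text = "Let X denote"
results = [parser.parseString(text).asList()
           for parser in (recursive_words, one_or_more, times_form)]
print(results)
```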

Based on your example, I rewrote your BNF levels as:

wds = pp.Group(wd*(1,))
lev0 <<= e1 + wds + lev1*(0,)
lev1 <<= e2 + wds + lev2*(0,)
lev2 <<= e3 + wds
expr = lev0()*(1,)
expr.ignore(COMMA | COLON)

This gives:

(1) Y denote (a) color (b) pressure
[1, ['Y', 'denote'], 'a', ['color'], 'b', ['pressure']]

(1) Let X denote one of the following: (a) weight (b) height (c) depth (2) Y denote (a) color, except (i) white (ii) blue (b) pressure
[1,
 ['Let', 'X', 'denote', 'one', 'of', 'the', 'following'],
 'a',
 ['weight'],
 'b',
 ['height'],
 'c',
 ['depth'],
 2,
 ['Y', 'denote'],
 'a',
 ['color', 'except'],
 'i',
 ['white'],
 'ii',
 ['blue'],
 'b',
 ['pressure']]

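So it parsed, in the sense that it got through the whole input string, but all we have done is basic tokenizing. Nothing in the result represents the structure implied by the nested integer/alpha/roman lists.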
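
For anyone who wants to reproduce the flat output, the pieces above can be assembled into a single script roughly as follows. This is a sketch that assumes the definitions shown so far and calls parseString on the shorter test string; how the output above was originally printed is not shown here. The variable name flat is made up for the example.

```python
import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
COMMA, COLON = map(pp.Literal, ",:")
wd = pp.Word(pp.alphas)
letter = pp.oneOf(list(pp.alphas.lower()))
integer = pp.pyparsing_common.integer
roman = pp.Word("ivx")

e1 = LPAR + integer + RPAR   # (1), (2), ...
e2 = LPAR + letter + RPAR    # (a), (b), ...
e3 = LPAR + roman + RPAR     # (i), (ii), ...

lev0, lev1, lev2 = pp.Forward(), pp.Forward(), pp.Forward()

wds = pp.Group(wd * (1,))
lev0 <<= e1 + wds + lev1 * (0,)
lev1 <<= e2 + wds + lev2 * (0,)
lev2 <<= e3 + wds
expr = lev0() * (1,)
expr.ignore(COMMA | COLON)   # skip commas and colons wherever they appear

test = "(1) Y denote (a) color (b) pressure"
flat = expr.parseString(test, parseAll=True).asList()
print(flat)
```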

Pyparsing includes the Group class for structuring the results:

G = pp.Group
wds = G(wd*(1,))
lev0 <<= G(e1 + G(wds + lev1*(0,)))
lev1 <<= G(e2 + G(wds + lev2*(0,)))
lev2 <<= G(e3 + wds)
expr = lev0()*(1,)
expr.ignore(COMMA | COLON)

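Which gives:

(1) Y denote (a) color (b) pressure
[[1, [['Y', 'denote'], ['a', [['color']]], ['b', [['pressure']]]]]]

(1) Let X denote one of the following: (a) weight (b) height (c) depth (2) Y denote (a) color, except (i) white (ii) blue (b) pressure
[[1,
  [['Let', 'X', 'denote', 'one', 'of', 'the', 'following'],
   ['a', [['weight']]],
   ['b', [['height']]],
   ['c', [['depth']]]]],
 [2,
  [['Y', 'denote'],
   ['a', [['color', 'except'], ['i', ['white']], ['ii', ['blue']]]],
   ['b', [['pressure']]]]]]

A complete parser would also understand the difference between "one of the following" and "all of the following", as well as the notions of including and excluding elements, but that is beyond the scope of this question.

Comments:

Before you start implementing anything, whether in pyparsing, PLY, or any other parsing library, write a BNF for your expression. Then write a series of test strings, from simple to complex. (It may be simpler to start with the test strings and then write the BNF based on what you learn from writing them.) Test the BNF against them and adjust as needed. Only then start implementing the BNF in pyparsing or anything else. Edit your question to include the BNF and the point where you are struggling with the pyparsing implementation.

Hi @PaulMcG, I have edited the question with my best understanding of the BNF. Please let me know whether this works with pyparsing.

Which expression is supposed to match your overall input? None of the Ln starts with E1, which is how the test string begins. Also, leave out the whitespace: pyparsing skips whitespace by default, so just focus on the printable content. The test string also includes a colon and a comma, neither of which appears in the BNF. Finally, please list a series of sample test strings, from simple to complex, and test them by hand against your BNF.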
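
To get from the grouped parse results to the nested dict sketched in the question, a small recursive walk over the list structure is probably enough. The sketch below is pure Python and works on the grouped output for the longer test string, pasted in as a literal. The helper name to_nested_dict is made up for this example. Because commas and colons were ignored while parsing, the keys lose their punctuation ("color except" rather than "color, except"), and the enumeration markers are dropped.

```python
def to_nested_dict(groups):
    """Convert grouped parse output into nested dicts keyed by item text."""
    result = {}
    for _marker, body in groups:
        if isinstance(body[0], str):
            # Innermost level: the body is the word list itself
            words, children = body, []
        else:
            # Outer levels: first element is the word list, the rest are sub-items
            words, children = body[0], body[1:]
        result[" ".join(words)] = to_nested_dict(children)
    return result


# Grouped output for the longer test string, copied as a literal
parsed = [
    [1,
     [['Let', 'X', 'denote', 'one', 'of', 'the', 'following'],
      ['a', [['weight']]],
      ['b', [['height']]],
      ['c', [['depth']]]]],
    [2,
     [['Y', 'denote'],
      ['a', [['color', 'except'], ['i', ['white']], ['ii', ['blue']]]],
      ['b', [['pressure']]]]],
]

structure = to_nested_dict(parsed)
print(structure)
```

With a live parser, the same helper could presumably be applied to the asList() form of the parse results instead of a pasted literal.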