Python: Using pyparsing to parse a word escape-split over multiple lines

I am trying to parse words that can be split across multiple lines by a backslash-newline combination ("\\n"). Here is what I have done:

from pyparsing import *

continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
split_word = word + Suppress(continued_ending)
multi_line_word = Forward()
multi_line_word << (word | (split_word + multi_line_word))

print(multi_line_word.parseString(
'''super\\
cali\\
fragi\\
listic'''))

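After poking around a bit more, I came across a discussion that made this notable point:

I often see inefficient grammars when someone implements a pyparsing grammar directly from a BNF definition. BNF has no concept of "one or more", "zero or more", or "optional".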

That gave me the idea to rewrite these two lines using pyparsing's repetition classes instead of recursion:

multi_line_word = Forward()
multi_line_word << (word | (split_word + multi_line_word))
With that change, it produced the output I wanted:
['super', 'cali', 'fragi', 'listic']
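The replacement line itself did not survive on this page. As a minimal sketch, assuming the rewrite was built from ZeroOrMore (this exact form is an assumption, not the author's confirmed code), an iterative definition that produces the token list above could look like this:

```python
from pyparsing import Literal, Suppress, Word, ZeroOrMore, alphas, lineEnd

# Same building blocks as in the question.
continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
split_word = word + Suppress(continued_ending)

# Assumed iterative form: any number of continued pieces, then a final piece.
multi_line_word = ZeroOrMore(split_word) + word

text = 'super\\\ncali\\\nfragi\\\nlistic'
tokens = multi_line_word.parseString(text).asList()
print(tokens)  # ['super', 'cali', 'fragi', 'listic']
```

The repetition replaces the recursive Forward, so the grammar no longer depends on the order of alternatives.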

Next, I added a parse action that joins these tokens together:

multi_line_word.setParseAction(lambda t: ''.join(t))
This gives the final output:
['supercalifragilistic']

The take-home message I learned is that one does not simply...

Just kidding.

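The take-home message is that you cannot simply implement a one-to-one translation of a BNF with pyparsing. You should call on some of the tricks that use the iterative types.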

Edit 2009-11-25: To handle more strenuous test cases, I modified the code to the following:

no_space = NotAny(White(' \t\r'))
# make sure that the EOL immediately follows the escape backslash
continued_ending = Literal('\\') + no_space + lineEnd
word = Word(alphas)
# make sure that the escape backslash immediately follows the word
split_word = word + NotAny(White()) + Suppress(continued_ending)
multi_line_word = OneOrMore(split_word + NotAny(White())) + Optional(word)
multi_line_word.setParseAction(lambda t: ''.join(t))

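This has the benefit of ensuring that no whitespace comes between any of the elements (except for the newline after the escaping backslash).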
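A short check of that claim, as a sketch: the grammar is copied from the edit above, while the two test strings and the variable names joined and rejected are chosen here for illustration.

```python
from pyparsing import (Literal, NotAny, OneOrMore, Optional, ParseException,
                       Suppress, White, Word, alphas, lineEnd)

no_space = NotAny(White(' \t\r'))
# The EOL must immediately follow the escape backslash.
continued_ending = Literal('\\') + no_space + lineEnd
word = Word(alphas)
# The escape backslash must immediately follow the word.
split_word = word + NotAny(White()) + Suppress(continued_ending)
multi_line_word = OneOrMore(split_word + NotAny(White())) + Optional(word)
multi_line_word.setParseAction(lambda t: ''.join(t))

# A properly continued word is joined into a single token.
joined = multi_line_word.parseString('super\\\ncali\\\nfragi\\\nlistic').asList()
print(joined)  # ['supercalifragilistic']

# A space after the newline means the word is not really continued.
try:
    multi_line_word.parseString('sh\\\n iny')
    rejected = False
except ParseException:
    rejected = True
print(rejected)  # True
```

Note that this grammar requires at least one continuation, so a plain unsplit word would have to be matched by a separate expression.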

Your code is very close. Any of these mods would work:

# '|' means MatchFirst, so you had a left-recursive expression
# reversing the order of the alternatives makes this work
multi_line_word << ((split_word + multi_line_word) | word)

# '^' means Or/MatchLongest, but beware using this inside a Forward
multi_line_word << (word ^ (split_word + multi_line_word))

# an unusual use of delimitedList, but it works
multi_line_word = delimitedList(word, continued_ending)

# in place of your parse action, you can wrap in a Combine
multi_line_word = Combine(delimitedList(word, continued_ending))
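To compare these variants side by side, here is a small sketch that runs the question's sample input through the original ordering and three of the suggested fixes. The names word_first, split_first, as_list, combined and results are made up for this illustration, and the '^' variant is left out because of the caution noted above.

```python
from pyparsing import (Combine, Forward, Literal, Suppress, Word, alphas,
                       delimitedList, lineEnd)

continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
split_word = word + Suppress(continued_ending)
text = 'super\\\ncali\\\nfragi\\\nlistic'

# Original ordering: MatchFirst accepts the bare word and stops there.
word_first = Forward()
word_first <<= word | (split_word + word_first)

# Reversed ordering: the longer alternative gets tried first.
split_first = Forward()
split_first <<= (split_word + split_first) | word

# delimitedList treats the backslash-newline as a suppressed delimiter.
as_list = delimitedList(word, continued_ending)

# Combine joins the adjacent pieces into one token.
combined = Combine(delimitedList(word, continued_ending))

results = {
    'word_first': word_first.parseString(text).asList(),
    'split_first': split_first.parseString(text).asList(),
    'as_list': as_list.parseString(text).asList(),
    'combined': combined.parseString(text).asList(),
}
for name, tokens in results.items():
    print(name, tokens)
```

Only the first entry stops after 'super'; the others consume the whole word.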
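Using Combine for multi_line_word does not enforce the absence of intervening whitespace either. Interesting. I tried multi_line_word = Combine(Combine(OneOrMore(split_word)) + Optional(word)), but it breaks on the 'sh\\\n iny' case: instead of raising an exception, it returns ['sh']. Am I missing something?

Well, your word does not just span letters across a backslash-newline; there is also a space before the letter 'i', which counts as a word break, so Combine stops after 'sh'. You can modify Combine with the adjacent=False constructor argument, but beware: you might end up consuming your whole file as a single word! Alternatively, if you also want to collapse any leading whitespace, you can redefine continued_ending to include any whitespace after the lineEnd.

I would prefer that multi_line_word.parseString('sh\\\n iny') raise a ParseException rather than identify 'sh' as its token. In this case, 'sh' and 'iny' are two words, not parts of one broken word, because the 'iny' part does not continue immediately after the EOL. So multi_line_word should not recognize it. It should throw up its hands and say: "This is not a valid broken word!"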
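The behaviours discussed in these comments can be reproduced with a short sketch. The names strict and loose are made up here, and the parseAll=True call is an addition that does not come from the thread: it is one way, among others, to turn a partial match into a ParseException.

```python
from pyparsing import (Combine, Literal, ParseException, Word, alphas,
                       delimitedList, lineEnd)

continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
text = 'sh\\\n iny'  # note the space before 'iny'

# Default Combine requires adjacent pieces, so it stops after 'sh'.
strict = Combine(delimitedList(word, continued_ending))
partial = strict.parseString(text).asList()
print(partial)  # ['sh']

# adjacent=False lets whitespace be skipped between the pieces.
loose = Combine(delimitedList(word, continued_ending), adjacent=False)
merged = loose.parseString(text).asList()
print(merged)  # ['shiny']

# Assumption, not from the thread: parseAll=True demands that the whole
# input be consumed, so the leftover text triggers an exception.
try:
    strict.parseString(text, parseAll=True)
    raised = False
except ParseException:
    raised = True
print(raised)  # True
```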