Extracting a parenthesized Python expression from a string

python parsing

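I've been wondering how hard it would be to write some Python code that searches a string for the index of a substring of the form ${expr}, for example, where expr is meant to be a Python expression or something resembling one. Given such a thing, one could easily imagine going on to check the expression's syntax with compile(), evaluate it against a particular scope with eval(), and perhaps even substitute the result into the original string. People must do very similar things all the time.

I could imagine solving such a problem with a third-party parser generator [oof], by hand-coding some sort of state machine [eek], or by convincing Python's own parser to do the heavy lifting somehow [hmm]. Maybe there is a third-party templating library that can be made to do exactly this. Maybe restricting the syntax of expr would be a worthwhile compromise in terms of simplicity, execution time, or fewer external dependencies: for example, maybe all I really need is something that matches any expr with balanced curly braces.

What's your sense?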
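To make the balanced-curly-brace restriction concrete, here is a minimal Python 3 sketch of a hand-written scanner. The function name find_balanced and its parameters are made up for illustration. It only counts braces, so a brace inside a string literal will confuse it; that is exactly the kind of case that makes the general problem harder.

```python
def find_balanced(s, start=0, begin="${", open_ch="{", close_ch="}"):
    """Return (i0, i1) so that s[i0:i1] is the text between begin and its
    matching close_ch, or None if there is no complete match."""
    i = s.find(begin, start)
    if i < 0:
        return None
    i0 = i + len(begin)
    depth = 1  # the begin marker itself opened one level
    for j in range(i0, len(s)):
        if s[j] == open_ch:
            depth += 1
        elif s[j] == close_ch:
            depth -= 1
            if depth == 0:
                return i0, j
    return None  # ran out of input before the braces balanced


s = 'foo ${ {"a": 1}["a"] } bar'
i0, i1 = find_balanced(s)
print(s[i0:i1].strip())  # {"a": 1}["a"]
```

Because it never calls the Python parser, this has no dependencies and runs in a single pass, at the price of rejecting or mis-splitting expressions whose braces do not balance textually.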

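Update: Thanks very much for the responses so far! Looking back at what I wrote yesterday, I'm not sure I was clear enough about what I'm asking. Template substitution is indeed an interesting problem, and probably useful to many more people than the expression-extraction subproblem I'm wondering about, but I brought it up only as a simple example of how an answer to my question might be useful in real life. Other potential applications include passing the extracted expressions to a syntax highlighter; passing the result to a real Python parser and inspecting or tampering with the parse tree; or using the sequence of extracted expressions to build a larger Python program, perhaps combined with information taken from the surrounding text.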
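As an illustration of the "pass it to a real Python parser" application, the following Python 3 sketch feeds an already-extracted expression to the standard ast module and lists the names it refers to. The helper name names_used is hypothetical; only ast.parse and ast.walk come from the standard library.

```python
import ast


def names_used(expr):
    """Return the sorted variable names referenced by a Python expression."""
    tree = ast.parse(expr, mode="eval")  # raises SyntaxError if expr is invalid
    return sorted({node.id for node in ast.walk(tree)
                   if isinstance(node, ast.Name)})


print(names_used("a*b + c*d"))  # ['a', 'b', 'c', 'd']
```

The same tree could be rewritten before compiling it, which is one way to "tamper with" an expression without ever evaluating the original text.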

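The ${expr} syntax I mentioned is also just an example. In fact, I wonder whether I should have used $(expr) as my example instead, because it makes the potential drawbacks of the obvious approach, something like re.finditer(r'\$\{([^}]+)\}', s), easier to see. Python expressions can (and often do) contain the ) (or }) character. It may be that handling such cases is more trouble than it's worth, but I'm not convinced of that yet. Please feel free to try to make that case!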
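The drawback is easy to reproduce. In this small Python 3 sketch (the variable names are made up for illustration), the pattern works for a simple expression but cuts off a dict literal at its first closing brace:

```python
import re

# The obvious pattern: "${", then anything except "}", then "}".
NAIVE = re.compile(r"\$\{([^}]+)\}")

simple = "foo ${a*b + c*d} bar"
nested = 'foo ${{"a": 1}["a"]} bar'

print(NAIVE.search(simple).group(1))  # a*b + c*d
print(NAIVE.search(nested).group(1))  # {"a": 1
```

The second capture is not a valid expression, so anything built on top of this pattern has to either forbid the closing character inside expr or look further ahead.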

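Before posting this question, I spent quite a bit of time looking at Python template engines, hoping one of them might expose the sort of low-level functionality I'm asking about: something that can find expressions in a variety of contexts and tell me where they are, rather than being limited to finding expressions embedded with a single hard-coded syntax, always evaluating them, and always substituting the results back into the original string. I haven't figured out how to use any of them to solve my problem yet, but I very much appreciate the suggestions for more to look at (can't believe I missed that wonderful list on the wiki!). The API documentation for these things tends to be fairly high-level, and I'm not very familiar with the internals of any of them, so I could certainly use help looking into them and working out how to make them do this sort of thing.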

Thanks for your patience!

I think your best bet is to match all curly-braced entries and then check each one against Python itself to see whether it is valid Python.
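A rough Python 3 sketch of that idea, assuming the simple ${...} syntax with no nested braces; the function name check_candidates and the '<template>' filename are placeholders, not part of any library:

```python
import re


def check_candidates(text):
    """Yield (expr, ok) for every ${...} candidate found by a simple regex."""
    for m in re.finditer(r"\$\{([^}]+)\}", text):
        expr = m.group(1).strip()
        try:
            compile(expr, "<template>", "eval")  # syntax check only, no evaluation
            yield expr, True
        except SyntaxError:
            yield expr, False


print(list(check_candidates("${1 + 2} and ${any:thing} and ${x}")))
# [('1 + 2', True), ('any:thing', False), ('x', True)]
```

Note that compile() in 'eval' mode accepts expressions only; statements would need 'exec' mode instead.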

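I think what you're asking about is being able to insert Python code into text files to be evaluated. Several modules already provide this kind of functionality. You can check Python.org for a comprehensive list.

Some Google searching also turned up a few other modules you might be interested in:

(part of the py-templates project)

If you really want to write this yourself, for whatever reason, you can also dig into this Python Cookbook solution:

"Templating" (copying an input file to output, inserting Python expressions and statements on the fly) is a frequent need, and YAPTU is a small but complete Python module for it; expressions and statements are identified by arbitrary user-chosen regular expressions.

EDIT: Just for the heck of it, I whipped up a very simplistic code sample for this. I'm sure it has bugs, but it at least illustrates a simplified version of the concept: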

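#!/usr/bin/env python
# Python 2 script: exec and print are used as statements.

import sys
import re

FILE = sys.argv[1]

# Read the whole input file into memory.
with open(FILE) as handle:
    fcontent = handle.read()

# Find every ${...} span (no nested braces) and try to execute its body.
for myexpr in re.finditer(r'\$\{([^}]+)\}', fcontent, re.M|re.S):
    text = myexpr.group(1)
    try:
        exec text
    except SyntaxError:
        print "ERROR: unable to compile expression '%s'" % (text)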

Tested against the following text:

This is some random text, with embedded python like 
${print "foo"} and some bogus python like

${any:thing}.

And a multiline statement, just for kicks: 

${
def multiline_stmt(foo):
  print foo

multiline_stmt("ahem")
}

More text here.

Output:

[user@host]$ ./exec_embedded_python.py test.txt
foo
ERROR: unable to compile expression 'any:thing'
ahem

# The doctests below use Python 2 syntax and output (print statement,
# byte-string co_code).
def findExpr(s, i0=0, begin='${', end='}', compArgs=('<string>', 'eval')):
  '''
  Search s for a (possibly invalid) Python expression bracketed by begin
  and end, which default to '${' and '}'.  Return a 4-tuple.

  >>> s = 'foo ${a*b + c*d} bar'
  >>> i0, i1, code, errMsg = findExpr(s)
  >>> i0, i1, s[i0:i1], errMsg
  (6, 15, 'a*b + c*d', None)
  >>> ' '.join('%02x' % ord(byte) for byte in code.co_code)
  '65 00 00 65 01 00 14 65 02 00 65 03 00 14 17 53'
  >>> code.co_names
  ('a', 'b', 'c', 'd')
  >>> eval(code, {'a': 1, 'b': 2, 'c': 3, 'd': 4})
  14
  >>> eval(code, {'a': 'a', 'b': 2, 'c': 'c', 'd': 4})
  'aacccc'
  >>> eval(code, {'a': None})
  Traceback (most recent call last):
    ...
  NameError: name 'b' is not defined

  Expressions containing begin and/or end are allowed.

  >>> s = '{foo ${{"}": "${"}["}"]} bar}'
  >>> i0, i1, code, errMsg = findExpr(s)
  >>> i0, i1, s[i0:i1], errMsg
  (7, 23, '{"}": "${"}["}"]', None)

  If the first match is syntactically invalid Python, i0 points to the
  start of the match, i1 points to the parse error, code is None and
  errMsg contains a message from the compiler.

  >>> s = '{foo ${qwerty asdf zxcvbnm!!!} ${7} bar}'
  >>> i0, i1, code, errMsg = findExpr(s)
  >>> i0, i1, s[i0:i1], errMsg
  (7, 18, 'qwerty asdf', 'invalid syntax')
  >>> print code
  None

  If a second argument is given, start searching there.

  >>> i0, i1, code, errMsg = findExpr(s, i1)
  >>> i0, i1, s[i0:i1], errMsg
  (33, 34, '7', None)

  Raise ValueError if there are no further matches.

  >>> i0, i1, code, errMsg = findExpr(s, i1)
  Traceback (most recent call last):
    ...
  ValueError: substring not found

  In ambiguous cases, match the shortest valid expression.  This is not
  always ideal behavior.

  >>> s = '{foo ${x or {} # return {} instead of None} bar}'
  >>> i0, i1, code, errMsg = findExpr(s)
  >>> i0, i1, s[i0:i1], errMsg
  (7, 25, 'x or {} # return {', None)

  This implementation must not be used with multi-line strings.  It does
  not adjust line number information in the returned code object, and it
  does not take the line number into account when computing the offset
  of a parse error.
  '''
  assert '\n' not in s, 'line numbers not implemented'
  # Skip past the opening marker, then find the first closing marker.
  i0 = s.index(begin, i0) + len(begin)
  i1 = s.index(end, i0)
  code = errMsg = None
  while code is None and errMsg is None:
    expr = s[i0:i1]
    try:
      code = compile(expr, *compArgs)
    except SyntaxError as e:
      # Not valid yet: extend the candidate to the next closing marker.
      i1 = s.find(end, i1 + 1)
      if i1 < 0:
        # No markers left: report the last compile error and its position.
        errMsg, i1 = e.msg, i0 + e.offset
  return i0, i1, code, errMsg
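For the template-substitution use case mentioned in the question, the same "extend until it compiles" idea can be wrapped in a loop. The following is only a Python 3 sketch under a few assumptions: the names find_expr and substitute are invented here, errors are reported by raising SyntaxError rather than through a 4-tuple, candidates are stripped of surrounding whitespace before compiling, and eval() is applied to trusted input only, since it runs arbitrary code.

```python
def find_expr(s, start=0, begin="${", end="}"):
    """Return (i0, i1, code) for the shortest compilable expression after
    begin, or None if no begin marker is found at or after start."""
    i = s.find(begin, start)
    if i < 0:
        return None
    i0 = i + len(begin)
    i1 = s.find(end, i0)
    while i1 >= 0:
        try:
            return i0, i1, compile(s[i0:i1].strip(), "<string>", "eval")
        except SyntaxError:
            i1 = s.find(end, i1 + 1)  # extend to the next end marker
    raise SyntaxError("no valid expression after offset %d" % i0)


def substitute(s, scope, begin="${", end="}"):
    """Replace every bracketed expression in s with str(eval(expr, scope))."""
    out, pos = [], 0
    while True:
        found = find_expr(s, pos, begin, end)
        if found is None:
            out.append(s[pos:])  # no more markers: keep the remaining text
            return "".join(out)
        i0, i1, code = found
        out.append(s[pos:i0 - len(begin)])  # literal text before the marker
        out.append(str(eval(code, dict(scope))))  # copy so scope is not mutated
        pos = i1 + len(end)


template = 'foo ${a*b + c*d} bar ${ {"}": 1}["}"] }'
print(substitute(template, {"a": 1, "b": 2, "c": 3, "d": 4}))  # foo 14 bar 1
```

Like findExpr, this picks the shortest candidate that compiles, so the ambiguous comment case from the doctests would still be split in the same non-ideal place.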