如何检查字符串是否是有效的python标识符？包括关键字检查吗？_Python_Keyword_Identifier_Reserved

如何检查字符串是否是有效的python标识符？包括关键字检查吗？

python

如何检查字符串是否是有效的python标识符？包括关键字检查吗？,python,keyword,identifier,reserved,Python,Keyword,Identifier,Reserved,是否有人知道是否有任何内置python方法可以检查某个东西是否是有效的python变量名，包括对保留关键字的检查？（因此，例如，像“in”或“for”之类的东西会失败……）如果做不到这一点，是否有人知道我可以从哪里获得保留关键字列表（即，动态地，从python内部，而不是从在线文档中复制和粘贴某些内容）？或者，有没有另一种好的方式来写你自己的支票令人惊讶的是，在try/except中封装setattr的测试不起作用，如下所示： setattr(myObj, 'My Sweet Name!',

是否有人知道是否有任何内置python方法可以检查某个东西是否是有效的python变量名，包括对保留关键字的检查？（因此，例如，像“in”或“for”之类的东西会失败……）

如果做不到这一点，是否有人知道我可以从哪里获得保留关键字列表（即，动态地，从python内部，而不是从在线文档中复制和粘贴某些内容）？或者，有没有另一种好的方式来写你自己的支票

令人惊讶的是，在try/except中封装setattr的测试不起作用，如下所示：

setattr(myObj, 'My Sweet Name!', 23)

import keyword
import tokenize

def isidentifier_py3(ident):
    """Determines if string is valid Python identifier."""

    # Smoke test — if it's not string, then it's not identifier, but we don't
    # want to just silence exception. It's better to fail fast.
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    # Quick test — if string is in keyword list, it's definitely not an ident.
    if keyword.iskeyword(ident):
        return False

    readline = lambda g=(lambda: (yield ident.encode('utf-8-sig')))(): next(g)
    tokens = list(tokenize.tokenize(readline))

    # You should get exactly 3 tokens
    if len(tokens) != 3:
        return False

    # If using Python 3, first one is ENCODING, it's always utf-8 because 
    # we explicitly passed in UTF-8 BOM with ident.
    if tokens[0].type != tokenize.ENCODING:
        return False

    # Second is NAME, identifier.
    if tokens[1].type != tokenize.NAME:
        return False

    # Name should span all the string, so there would be no whitespace.
    if ident != tokens[1].string:
        return False

    # Third is ENDMARKER, ending stream
    if tokens[2].type != tokenize.ENDMARKER:
        return False

    return True

…真的管用！（…甚至可以用getattr检索！）

python关键字列表很短，因此您只需使用简单的正则表达式检查语法，并在相对较小的关键字列表中检查成员身份

import keyword #thanks asmeurer
import re
my_var = "$testBadVar"
print re.match("[_A-Za-z][_a-zA-Z0-9]*",my_var) and not keyword.iskeyword(my_var)

一个更短但更危险的选择是

my_bad_var="%#ASD"
try:exec("{0}=1".format(my_bad_var))
except SyntaxError: #this maynot be right error
   print "Invalid variable name!"

最后是稍微安全一点的变种

my_bad_var="%#ASD"

try:
  cc = compile("{0}=1".format(my_bad_var),"asd","single")
  eval(cc)
  print "VALID"
 except SyntaxError: #maybe different error
  print "INVALID!"

关键字

模块包含所有保留关键字的列表：

>>> import keyword
>>> keyword.iskeyword("in")
True
>>> keyword.kwlist
['and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'exec', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'not', 'or', 'pass', 'print', 'raise', 'return', 'try', 'while', 'with', 'yield']

请注意，此列表将根据您使用的Python的主要版本而有所不同，因为关键字列表会发生变化（尤其是在Python 2和Python 3之间）

如果您还需要所有内置名称，请使用

\uuuuuuuuuuuu

>>> dir(__builtins__)
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False', 'FileExistsError', 'FileNotFoundError', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'None', 'NotADirectoryError', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError', 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '_', '__build_class__', '__debug__', '__doc__', '__import__', '__name__', '__package__', 'abs', 'all', 'any', 'ascii', 'bin', 'bool', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'exec', 'exit', 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']

请注意，其中一些（如

版权

）实际上并没有那么大的问题需要覆盖

还有一个警告：请注意，在Python 2中，

True

、

False

和

None

不被视为关键字。但是，分配给

None

是一个语法错误。允许分配到

True

或

False

，但不建议这样做（与任何其他内置项相同）。在Python 3中，它们是关键字，所以这不是问题。

John:作为一个小小的改进，我在re中添加了$，否则，测试不会检测到空格：

import keyword 
import re
my_var = "$testBadVar"
print re.match("[_A-Za-z][_a-zA-Z0-9]*$",my_var) and not keyword.iskeyword(my_var)

Python 3 Python3现在有了

'foo'.isidentifier（）

，因此这似乎是最新Python版本的最佳解决方案（谢谢）runciter@freenode建议）。但是，与直觉相反，它不会对照关键字列表进行检查，因此必须使用两者的组合：

import keyword

def isidentifier(ident: str) -> bool:
    """Determines if string is valid Python identifier."""

    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    if not ident.isidentifier():
        return False

    if keyword.iskeyword(ident):
        return False

    return True

Python 2 对于Python2，检查给定字符串是否为有效的Python标识符的最简单方法是让Python自己解析它

有两种可能的方法。最快的方法是使用

ast

，并检查单个表达式的ast是否具有所需的形状：

import ast

def isidentifier(ident):
    """Determines, if string is valid Python identifier."""

    # Smoke test — if it's not string, then it's not identifier, but we don't
    # want to just silence exception. It's better to fail fast.
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    # Resulting AST of simple identifier is <Module [<Expr <Name "foo">>]>
    try:
        root = ast.parse(ident)
    except SyntaxError:
        return False

    if not isinstance(root, ast.Module):
        return False

    if len(root.body) != 1:
        return False

    if not isinstance(root.body[0], ast.Expr):
        return False

    if not isinstance(root.body[0].value, ast.Name):
        return False

    if root.body[0].value.id != ident:
        return False

    return True

相同的函数，但与Python 3兼容，如下所示：

setattr(myObj, 'My Sweet Name!', 23)

import keyword
import tokenize

def isidentifier_py3(ident):
    """Determines if string is valid Python identifier."""

    # Smoke test — if it's not string, then it's not identifier, but we don't
    # want to just silence exception. It's better to fail fast.
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    # Quick test — if string is in keyword list, it's definitely not an ident.
    if keyword.iskeyword(ident):
        return False

    readline = lambda g=(lambda: (yield ident.encode('utf-8-sig')))(): next(g)
    tokens = list(tokenize.tokenize(readline))

    # You should get exactly 3 tokens
    if len(tokens) != 3:
        return False

    # If using Python 3, first one is ENCODING, it's always utf-8 because 
    # we explicitly passed in UTF-8 BOM with ident.
    if tokens[0].type != tokenize.ENCODING:
        return False

    # Second is NAME, identifier.
    if tokens[1].type != tokenize.NAME:
        return False

    # Name should span all the string, so there would be no whitespace.
    if ident != tokens[1].string:
        return False

    # Third is ENDMARKER, ending stream
    if tokens[2].type != tokenize.ENDMARKER:
        return False

    return True

但是，请注意Python3

tokenize

实现中的错误，这些错误会拒绝一些完全有效的标识符，如

℘᧚，ﮯ和贈ᩭ<代码>ast
工作正常。通常，我建议不要在实际检查中使用基于标记化的实现
<>也有人认为重型机械如AST解析器是一种TAD过度杀伤。这个简单的实现是自包含的，保证可以在任何Python 2上工作：
import keyword
import string

def isidentifier(ident):
    """Determines if string is valid Python identifier."""

    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))

    if not ident:
        return False

    if keyword.iskeyword(ident):
        return False

    first = '_' + string.lowercase + string.uppercase
    if ident[0] not in first:
        return False

    other = first + string.digits
    for ch in ident[1:]:
        if ch not in other:
            return False

    return True

以下是检查这些所有工作的一些测试：
assert(isidentifier('foo'))
assert(isidentifier('foo1_23'))
assert(not isidentifier('pass'))    # syntactically correct keyword
assert(not isidentifier('foo '))    # trailing whitespace
assert(not isidentifier(' foo'))    # leading whitespace
assert(not isidentifier('1234'))    # number
assert(not isidentifier('1234abc')) # number and letters
assert(not isidentifier('I needed to check for Python 3 identifiers from Python 2 code.  I used a regex based on the docs:
import keyword
import regex


def is_py3_identifier(ident):
    """Checks that ident is a valid Python 3 identifier according to
    https://docs.python.org/3/reference/lexical_analysis.html#identifiers
    """
    return bool(
        ID_REGEX.match(unicodedata.normalize('NFKC', ident)) and
        not PY3_KEYWORDS.contains(ident))

# See https://docs.python.org/3/reference/lexical_analysis.html#identifiers
ID_START_REGEX = (
    r'\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}'
    r'_\u1885-\u1886\u2118\u212E\u309B-\u309C')
ID_CONTINUE_REGEX = ID_START_REGEX + (
    r'\p{Mn}\p{Mc}\p{Nd}\p{Pc}'
    r'\u00B7\u0387\u1369-\u1371\u19DA')
ID_REGEX = regex.compile(
    "[%s][%s]*$" % (ID_START_REGEX, ID_CONTINUE_REGEX), regex.UNICODE)


PY3_KEYWORDS = frozenset('False', 'None', 'True']).union(keyword.kwlist)

assert（isidentifier（'foo'））
断言（isidentifier（'foo1_23'））
assert（非isidentifier（'pass'））#语法正确的关键字
断言（非isidentifier（'foo'））#尾随空格
断言（不是isidentifier（'foo'））#前导空格
断言（非isidentifier（'1234'））#编号
断言（非isidentifier（'1234abc'））#数字和字母
assert（不是isidentifier（“我需要从Python 2代码中检查Python 3标识符。我使用了一个基于以下内容的正则表达式：
import关键字
导入正则表达式
def是_py3_标识符（ident）：
“”“根据检查标识符是否为有效的Python 3标识符
https://docs.python.org/3/reference/lexical_analysis.html#identifiers
"""
返回布尔(
ID_REGEX.match（unicodedata.normalize（'NFKC'，ident'）和
非PY3_关键字。包含（标识））
#看https://docs.python.org/3/reference/lexical_analysis.html#identifiers
ID\u开始\u正则表达式=(
r'\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}'
r“u1885-\u1886\u2118\u212E\u309B-\u309C”）
ID\u CONTINUE\u REGEX=ID\u START\u REGEX+(
r'\p{Mn}\p{Mc}\p{Nd}\p{Pc}'
r'\u00B7\u0387\u1369-\u1371\u19DA'）
ID_REGEX=REGEX.compile(
“[%s][%s]*$”%（ID\u START\u REGEX，ID\u CONTINUE\u REGEX），REGEX.UNICODE）
PY3_KEYWORDS=frozenset（'False'，'None'，'True']）。union（keyword.kwlist）

注意：这将使用regex
包，而不是内置的re
包来匹配unicode类别。另外：这将拒绝非本地的
，这是Python 2中的一个关键字，而不是Python 3。
不要使用预先确定的关键字集。这在Python版本之间是不可移植的，因为关键字列表更改。有一些替代方法不…但您的方法更好：P（现在已修复）正则表达式不应该在第二部分中也包含\uuu
吗？此外，您有0-0
而不是0-9
。感谢您两方面的支持：L…已修复（顺便说一句，我想如果你问谷歌，它会为你从python规范中找到一个确切的正则表达式，或者至少是一个定义）正则表达式的末尾应该包括$
。否则，只要它以有效标识符开头，它就会匹配。这是因为setattr
和getattr
只是从对象的\uu dict\uuuuuuuuu
中删除。上面的列表不包括内置对象/类型的名称，因此它不会捕获其他常见错误（比如将一个文本变量命名为“str”或一个列表命名为“list”）。除了使用帮助（内置）之外，我不知道如何通过编程检索这些变量的列表在交互式python命令行中。我添加了如何获取所有内置名称。您可以将其用作变量名，但对内置函数或变量类型进行阴影处理通常不是一个好主意；它可能会干扰合法使用。（）我还在Python2中添加了一个关于None
的警告。它不被认为是一个关键字，但分配给它的是一个SyntaxError