Recognizing implicit string literal concatenation in Python


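Some Python programmers consider implicit string literal concatenation harmful. I am therefore trying to identify logical lines that contain such a concatenation.

My first (and only) attempt used shlex: I thought of splitting a logical line with posix=False, so that I could identify the parts enclosed in quotes; if two such parts are next to each other, the line would be treated as a "literal concatenation".

However, this fails on multi-line strings, as the following example shows: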

shlex.split('""" Some docstring """', posix=False)
# Returns ['""', '" Some docstring "', '""'], which would be flagged as harmful, but it is not

I could tweak this in some odd ad-hoc way, but I wonder whether you can think of a simple solution. My intention is to add it to my already-extended pep8 verifier.
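The shlex behaviour described above can be reproduced with a short sketch. It only uses shlex.split from the standard library; the variable names are illustrative. In non-POSIX mode the quote characters stay in each token, and the closing quote of an empty "" pair ends the token, which is why a triple-quoted literal falls apart into three pieces.

```python
import shlex

# Non-POSIX mode keeps the quote characters inside each token.
adjacent = shlex.split('"a" "b"', posix=False)
docstring = shlex.split('""" Some docstring """', posix=False)

print(adjacent)   # two quoted parts next to each other: a real concatenation
print(docstring)  # three quoted parts, although the source is one literal
```

Both results look the same to a check that merely counts neighbouring quoted parts, so shlex alone cannot tell the two cases apart.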

Interesting question. I just had to play with it, and since there is no answer yet, I'm posting my solution to the problem:

#!/usr/bin/env python3

import tokenize
import token
import sys

with open(sys.argv[1]) as f:
    toks = list(tokenize.generate_tokens(f.readline))
    for i in range(len(toks) - 1):
        tok = toks[i]
        # print(tok)  # uncomment to inspect the token stream
        tok2 = toks[i + 1]
        if tok[0] == token.STRING and tok[0] == tok2[0]:  # two adjacent STRING tokens
            print("implicit concatenation in line " \
                "{} between {} and {}".format(tok[2][0], tok[1], tok2[1]))

You can feed the program to itself; the result should be

implicit concatenation in line 14 between "implicit concatenation in line " and "{} between {} and {}"
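One caveat with comparing only directly adjacent tokens: when the literals are split across lines inside brackets, tokenize emits NL (and possibly COMMENT) tokens between them, so that pair goes unnoticed. A possible variation, assuming Python 3, skips those tokens. The name find_implicit_concatenations is made up for this sketch. It also does not cover f-strings on Python 3.12 and later, where they are no longer reported as plain STRING tokens.

```python
import io
import tokenize

# Tokens that may sit between two string literals without separating them:
# non-logical newlines (inside brackets) and comments.
SKIPPED = {tokenize.NL, tokenize.COMMENT}


def find_implicit_concatenations(source):
    """Yield the (row, col) start of each string literal that follows another."""
    previous_was_string = False
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        if tok.type == tokenize.STRING:
            if previous_was_string:
                yield tok.start
            previous_was_string = True
        else:
            previous_was_string = False


sample = 'x = ("a"  # comment\n     "b")\ny = "c" + "d"\n'
hits = list(find_implicit_concatenations(sample))
print(hits)
```

The NEWLINE token that ends a logical line is deliberately not skipped, so a docstring followed by another string statement on the next line is not reported.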

I decided to use user2357112's suggestion and extend it a bit, which led to the following solution. I describe it here as an extension to the pep8 module:

import io
import tokenize

def python_illegal_concetenation(logical_line):
    """
    A language design mistake from the early days of Python.
    https://mail.python.org/pipermail/python-ideas/2013-May/020527.html

    Okay: val = "a" + "b"
    W610: val = "a" "b"
    """
    w = "W610 implicit string literal concatenation considered harmful"
    sio = io.StringIO(logical_line)
    tgen = tokenize.generate_tokens(sio.readline)
    state = None
    # Each token is (type, string, (start_row, start_col), end, line).
    for token_type, _, (_, pos), _, _ in tgen:
        if token_type == tokenize.STRING:
            if state == tokenize.STRING:
                # A string directly after a string: report its column offset.
                yield pos, w
            else:
                state = tokenize.STRING
        else:
            state = None
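As a quick sanity check that a token-based approach copes with the docstring case from the question, here is a small sketch. It assumes Python 3, and the helper string_tokens is a hypothetical name, not part of pep8 or tokenize.

```python
import io
import tokenize


def string_tokens(line):
    """Return the source text of every STRING token in one logical line."""
    readline = io.StringIO(line).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.STRING]


print(string_tokens('""" Some docstring """'))  # a single STRING token
print(string_tokens('val = "a" "b"'))           # two STRING tokens, adjacent
```

Unlike shlex, the tokenizer reports the triple-quoted literal as one token, so it is never mistaken for a concatenation.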

One way to cope with this better, when you have a list, is to put a space (or two) after each closing quote:

aList = [
   'one'  ,
   'two'  ,
   'three'
   'four'  ,
]

Now it is more obvious that 'three' is missing its trailing comma.
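To make the effect of the missing comma concrete, here is a minimal sketch of what Python does with such a list. The name a_list and the tighter formatting are only for illustration.

```python
a_list = [
    'one',
    'two',
    'three'   # no comma here, so the next literal is glued on
    'four',
]

print(len(a_list))
print(a_list[2])
```

The list silently ends up with three elements, the last being 'threefour', and no error or warning is raised.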

Proposal: I suggest that Python provide a pragma indicating that string literal concatenation is forbidden in a given region:

@nostringliteralconcat
a = "this" "and" "that"   # Would cause a compiler failure
@stringliteralconcat
a = "this" "and" "that"   # Successfully Compiles

Concatenation would remain allowed by default (to preserve compatibility).

There is also a thread on this:

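Tokenize the source and examine the token stream?

+1 for the interesting thread. For instance, I use implicit string concatenation all the time to continue lines, and I'd hate to throw in a bunch of ugly ... but if it comes to that, I guess I'll deal with it.

Tokenizing is a good idea.

I like my solution, but this one is certainly valid too. I'm accepting it and leaving mine for others' reference.

An extension to pep8? Great \o/ :)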