Python: indices of all commas except those inside parentheses

python regex


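How can I get the indices of all commas while excluding the commas inside short parentheses (fewer than 20 characters)?

Get index of this comma, but (not this , comma). Get other commas like , or ,or, 1,1 2 ,2.
(not this ,) BUT (get index of this comma, if more than 20 characters are inside the parentheses)

Expected output, the indices of all the commas: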

[23,71,76,79,82,87,132]

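You can use a for loop with a few if statements. It is not ideal code, but it gets the answer. Here is an example:

# Sample text from the question; the two parts are joined by a line break
textString = ('Get index of this comma, but (not this , comma). Get other commas like , or ,or, 1,1 2 ,2.\n'
              '(not this ,) BUT (get index of this comma, if more than 20 characters are inside the parentheses)')
parFlag = False  # True while inside parentheses, for up to 20 characters after '('
commas = []      # indices of the commas to keep
lastPar = 0      # index of the most recent '('
for i, ch in enumerate(textString):
    if ch == '(':
        parFlag = True
        lastPar = i
    # Stop ignoring commas at ')' or once 20 characters have passed since '('
    if ch == ')' or i - lastPar >= 20:
        parFlag = False
    if ch == ',' and not parFlag:
        commas.append(i)
print(commas)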
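
The loop above only stops ignoring commas once 20 characters have passed since the opening parenthesis, so a comma near the start of a long parenthesised span would still be skipped. Below is a minimal sketch of a variant that checks the length of the span up front, assuming parentheses are never nested. The function name `comma_indices` and the `max_len` parameter are illustrative and not part of the original answer.

```python
def comma_indices(text, max_len=20):
    """Return indices of commas, skipping those inside (...) spans of at most max_len characters."""
    indices = []
    skip_until = -1  # index of the ')' that closes the short span currently being skipped
    for i, ch in enumerate(text):
        if i <= skip_until:
            continue  # still inside a short parenthesised span
        if ch == '(':
            close = text.find(')', i + 1)
            # Skip the whole span only when its content is at most max_len characters long
            if close != -1 and close - i - 1 <= max_len:
                skip_until = close
        elif ch == ',':
            indices.append(i)
    return indices


sample = ("Get index of this comma, but (not this , comma). Get other commas like , or ,or, 1,1 2 ,2.\n"
          "(not this ,) BUT (get index of this comma, if more than 20 characters are inside the parentheses)")
print(comma_indices(sample))
# [23, 71, 76, 79, 82, 87, 132]
```

Passing `max_len=19` would treat a span of exactly 20 characters as long, which matches the "fewer than 20" wording of the question.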


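Regex pattern:

(,)|(\([^()]{0,20}\))

The intuition behind this pattern:

```
(,)
```
finds all commas. These are stored in capture group 1.
```
(\([^()]{0,20}\))
```
finds all parentheses with at most 20 characters between them. These are stored in capture group 2.

We can then take only the matches in group 1, which leaves out exactly the commas inside parentheses that contain at most 20 characters.

To find the indices of these matches, use re.finditer() together with m.start(1) to get the start index of each group 1 match: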

import re

string = """Get index of this comma, but (not this , comma). Get other commas like , or ,or, 1,1 2 ,2.
(not this ,) BUT (get index of this comma, if more than 20 characters are inside the parentheses)"""

indices = [m.start(1) for m in re.finditer(r'(,)|(\([^()]{0,20}\))', string) if m.group(1)]

print(indices)
# > [23, 71, 76, 79, 82, 87, 132]
print([string[index] for index in indices])
# > [',', ',', ',', ',', ',', ',', ',']

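m.start(1) returns the start index of the group 1 match. Since re.finditer() also yields the matches that come from group 2, the if m.group(1) condition keeps only the comma matches.

Edit: this ignores parentheses that contain 20 or fewer characters, which is inconsistent with the first statement but consistent with what the example explains. If you want fewer than 20, just use {0,19}.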
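
To make the boundary mentioned above concrete, here is a small sketch that builds the same two-group pattern with a configurable upper bound and tries it on a span of exactly 20 characters. The helper name `comma_indices_re` and the `max_len` parameter are hypothetical and only serve the illustration.

```python
import re


def comma_indices_re(text, max_len=20):
    # Group 1 captures a comma; group 2 consumes a (...) span of at most max_len characters.
    pattern = re.compile(r'(,)|(\([^()]{0,%d}\))' % max_len)
    # Keep only the matches in which group 1 (the comma) took part.
    return [m.start(1) for m in pattern.finditer(text) if m.group(1)]


# Exactly 20 characters inside the parentheses, with the comma at index 10
text = "(" + "a" * 9 + "," + "a" * 10 + ")"

print(comma_indices_re(text, max_len=20))  # [] : the span counts as short, its comma is skipped
print(comma_indices_re(text, max_len=19))  # [10] : the span is too long to be consumed by group 2
```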
Use PyPi regex:

,(?![^()]*\))|(?<=\((?=[^()]{20})[^()]*),

Explanation
--------------------------------------------------------------------------------
  ,                        ','
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    [^()]*                   any character except: '(', ')' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    \)                       ')'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    \(                       '('
--------------------------------------------------------------------------------
    (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
      [^()]{20}                any character except: '(', ')' (20
                               times)
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    [^()]*                   any character except: '(', ')' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  ,                        ','


You can also use regex with (*SKIP)(*FAIL) to match the characters you do not want and exclude them from the match result.
In this case, you can match 1-20 characters between the parentheses in which the commas should not match.
\([^()]{1,20}\)(*SKIP)(*FAIL)|,

  \(                       match (
  [^()]{1,20}              match any character except ( or ), 1 to 20 times
  \)                       match )
  (*SKIP)(*FAIL)           exclude the matched characters from the match result
  |                        or
  ,                        match a comma

Example code
import regex

s = """Get index of this comma, but (not this , comma). Get other commas like , or ,or, 1,1 2 ,2.
(not this ,) BUT (get index of this comma, if more than 20 characters are inside the parentheses)"""
pattern = r"\([^()]{1,20}\)(*SKIP)(*FAIL)|,"
indices = [m.start(0) for m in regex.finditer(pattern, s)]
print(indices)

Output
[23, 71, 76, 79, 82, 87, 132]
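
The (*SKIP)(*FAIL) verbs are not available in the standard-library `re` module. One way to get a similar effect with `re` alone, sketched here under the assumption that parentheses are not nested, is to blank out the short parenthesised spans first and then collect the comma positions from the masked text. The function name `comma_indices_masked` and the `max_len` parameter are illustrative, not part of the answer above.

```python
import re


def comma_indices_masked(text, max_len=20):
    # Replace each short (...) span with spaces of the same length so indices stay aligned.
    short_span = re.compile(r'\([^()]{1,%d}\)' % max_len)
    masked = short_span.sub(lambda m: ' ' * len(m.group()), text)
    # Every comma left in the masked text is one that should be reported.
    return [i for i, ch in enumerate(masked) if ch == ',']


s = """Get index of this comma, but (not this , comma). Get other commas like , or ,or, 1,1 2 ,2.
(not this ,) BUT (get index of this comma, if more than 20 characters are inside the parentheses)"""
print(comma_indices_masked(s))
# [23, 71, 76, 79, 82, 87, 132]
```

Because the replacement keeps the string length unchanged, the indices found in the masked copy are valid for the original string as well.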


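Could you please clarify your question? Specifically, could you provide a sample input together with the output you expect?

I updated the question. Thanks for your help.

The grammar you are asking for is not formally a regular language, which is why all the answers use extensions, such as backreferences, that are not part of the "real" regular expression language. There is a large body of academic literature on what true regular expressions can and cannot match (see a high-level introduction). Once you get into the extensions, the classic (very fast) algorithms may no longer work, which means matching may involve backtracking and can therefore become slower or use more memory.

Always use a minimum threshold inside limiting quantifiers. It is best practice, and it helps avoid problems if the regex is used with a regex library that cannot parse such quantifiers when the minimum is missing.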