Python: create two lists from a string, excluding and including the strings between parentheses


Suppose we have a string such as:

s = u'apple banana lemmon (hahaha) dog cat whale (hehehe) red blue black'
I want to create the following lists:

including = ['hahaha', 'hehehe']
excluding = ['apple banana lemmon (', ') dog cat whale (', ') red blue black']
The first list is straightforward with a regular expression:

import re
including = re.findall(r'\((.*?)\)', s)
But I can't work out how to get the other list. Can you help? Thanks in advance.

excluding = re.split('|'.join(including), s)
This works for the simple case where you know the included text will not contain special characters or regular-expression syntax.

If you are not sure that will always be the case:

re.split('|'.join(map(re.escape, including)), s)

This escapes special regex characters that would otherwise make re.split misbehave.
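To make the escaping point concrete, here is a minimal sketch. The sample string `s2` and the names `including2`, `naive` and `safe` are made up for illustration; the string borrows the `haha\d+haha` case mentioned in the comments further down.

```python
import re

# Hypothetical input: the parenthesized text contains regex metacharacters.
s2 = r'one (haha\d+haha) two'
including2 = re.findall(r'\((.*?)\)', s2)

# Unescaped: \d+ is read as "one or more digits", so the pattern never
# matches the literal text and the string comes back unsplit.
naive = re.split('|'.join(including2), s2)

# Escaped: every metacharacter is matched literally.
safe = re.split('|'.join(map(re.escape, including2)), s2)

print(naive)  # ['one (haha\\d+haha) two']
print(safe)   # ['one (', ') two']
```

Without `re.escape` the call does not raise an error here; it simply fails to split, which is easy to miss.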

You can use a positive lookbehind and a positive lookahead to split on the text between the parentheses:

>>> re.split(r'(?<=\().*?(?=\))', s)
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
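One practical difference from splitting on the joined `including` list: the lookaround pattern only matches text that sits directly between `(` and `)`, so a word that also occurs outside the parentheses is left alone. A small sketch, assuming a made-up string `s3` that is not from the question:

```python
import re

# Hypothetical input: 'red' appears both outside and inside the parentheses.
s3 = 'red (red) blue'
including3 = re.findall(r'\((.*?)\)', s3)

# Splitting on the joined list also cuts the 'red' outside the parentheses.
by_join = re.split('|'.join(map(re.escape, including3)), s3)

# The lookarounds anchor the match to the span between '(' and ')'.
by_lookaround = re.split(r'(?<=\().*?(?=\))', s3)

print(by_join)        # ['', ' (', ') blue']
print(by_lookaround)  # ['red (', ') blue']
```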
Using a regular expression, re.split(r'(?) Note the empty string.
Without a regular expression.
The same idea, but without overwriting s.

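Split the string using the including list? re.split('|'.join(including), s)
Better to use map(re.escape, including). Otherwise, if you have something like (haha\d+haha) in the string, the regex will interpret \d+ as one or more digits rather than as the literal text \d+.
That's true, but I don't think it applies to the scenario the asker will be using it in (I think), since he seems to be extracting parenthesized information from real sentences.
Then again, I could be wrong, and Q&A should be useful to more than just the original asker, so people with a similar problem may need to call re.escape.
Does he have including beforehand?
Yes, it comes from the simple regex including = re.findall(r'\((.*?)\)', s)
This is a better and more concise answer than mine, and it should be the accepted one.
A small clarification, if I may: your regex assumes a known number of brackets. Wouldn't it be better to split into the two lists with something like a small parser?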
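# Matches alternate between text outside and inside the parentheses; the pattern also matches '' at the end of s
a = re.findall(r'\)?[^()]*\(?', s)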
excluded = a[::2]
included = a[1::2]
print(included, excluded, sep='\n')

['hahaha', 'hehehe', '']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
# Same pattern, with the empty strings filtered out of both lists
a = re.findall(r'\)?[^()]*\(?', s)
excluded = [*filter(bool, a[::2])]
included = [*filter(bool, a[1::2])]
print(included, excluded, sep='\n')

['hahaha', 'hehehe']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
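As an aside, `re.split` keeps the text of capturing groups in its result, so both lists can also be read off a single split without a trailing empty string. This is a sketch that combines the lookaround pattern from the earlier answer with the slicing used here; it is not taken from the original answers, and the name `parts` is my own.

```python
import re

s = 'apple banana lemmon (hahaha) dog cat whale (hehehe) red blue black'

# The capturing group makes re.split return the separators as well, so the
# result alternates: outside, inside, outside, inside, ..., outside.
parts = re.split(r'(?<=\()(.*?)(?=\))', s)

excluded = parts[::2]   # even positions: text outside the parentheses
included = parts[1::2]  # odd positions: text captured between them

print(included, excluded, sep='\n')
```

On the sample string this prints the same two lists as the filtered version above.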
from itertools import cycle

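def f(s):
  # Alternate between searching for '(' and for ')'
  c = cycle('()')
  # Offset added to the match index: 1 keeps '(' in the outside chunk,
  # 0 leaves ')' at the start of the next outside chunk
  a = {'(': 1, ')': 0}
  while s:
    d = next(c)
    i = s.find(d)
    if i > -1:
      j = a[d]
      yield d, s[:i + j]
      s = s[i + j:]  # drop the part that was just yielded
    else:
      # No more delimiters: yield whatever is left and stop
      yield d, s
      break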

included = []
excluded = []

for k, v in f(s):
  if k == '(':
    excluded.append(v)
  else:
    included.append(v)

print(included, excluded, sep='\n')

['hahaha', 'hehehe']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
from itertools import cycle

def f(s):
  c = cycle('()')
  a = {'(': 1, ')': 0}
  j = 0  # start index of the next chunk; s itself is never modified
  while True:
    d = next(c)
    i = s.find(d, j)
    if i > -1:
      k = a[d]
      yield d, s[j:i + k]
      j = i + k
    else:
      yield d, s[j:]
      break

included = []
excluded = []

for k, v in f(s):
  if k == '(':
    excluded.append(v)
  else:
    included.append(v)

print(included, excluded, sep='\n')

['hahaha', 'hehehe']
['apple banana lemmon (', ') dog cat whale (', ') red blue black']
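For readers who would rather avoid both regular expressions and generators, the same left-to-right scan can be written with `str.partition`. This is a hedged sketch rather than part of the original answer: `split_parens` is a hypothetical helper name, and it assumes the parentheses are not nested.

```python
def split_parens(s):
    """Return (included, excluded) for non-nested parentheses in s."""
    included, excluded = [], []
    prefix = ''  # the ')' that starts the next outside chunk
    rest = s
    while True:
        before, opener, after = rest.partition('(')
        # Outside chunk: previous ')' + text + the '(' if one was found
        excluded.append(prefix + before + opener)
        if not opener:  # no more '(' in the string
            break
        inside, closer, rest = after.partition(')')
        included.append(inside)
        if not closer:  # unbalanced '(': stop rather than guess
            break
        prefix = closer
    return included, excluded


s = 'apple banana lemmon (hahaha) dog cat whale (hehehe) red blue black'
included, excluded = split_parens(s)
print(included, excluded, sep='\n')
```

On the sample string this produces the same two lists as the generator versions above.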