Get all text in nested parentheses (Python)

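I'm trying to extract all the strings inside nested parentheses (including the parentheses themselves) from my .txt file. Please see the sample .txt file I'm using for this example.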

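I have written three different pieces of code, but none of them extracts all of the nested parentheses; each one only extracts part of them. Any advice on what I'm doing wrong would be very helpful.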
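One common way to get each nested group back whole is to count parenthesis depth rather than match with a single regular expression, because Python's built-in re module cannot match balanced parentheses of arbitrary depth. A minimal sketch, assuming only the top-level groups are wanted (nested groups stay inside their parent) and that parentheses inside quoted strings can be treated like any others; extract_groups is a hypothetical helper name, not something from the question:

```python
def extract_groups(text):
    """Return every top-level parenthesised group in text, parens included.

    Nested parentheses stay inside their enclosing group. An unmatched ')'
    is ignored, and a '(' that is never closed is dropped.
    """
    groups = []
    depth = 0
    start = None  # index of the '(' that opened the current top-level group
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups


sample = 'proc sql; ("TRUuuuth") where ((a(b) gff c) and (d)) x'
print(extract_groups(sample))  # ['("TRUuuuth")', '((a(b) gff c) and (d))']
```

Because the depth counter only records a group when it returns to zero, the inner (b) and (d) never appear as separate results; they are kept inside the outer group.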

Here are the three attempts I have written so far:

  • First attempt:
Output:

['"xE\'", PUT(xx.xxxx.),"\'"', '"TRUuuuth"', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.", '"xE\'", PUT(xx.xxxx.),"\'"', '"CUuuiiiiuth"', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[['"xE\'"', ',', 'PUT', ['xx.xxxx.'], ',', '"\'"']]
[['"TRUuuuth"']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'gff', '&jfjfsj_jfjfj.']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'lec', '&jgjsd_vnv.']]
[['"xE\'"', ',', 'PUT', ['xx.xxxx.'], ',', '"\'"']]
[['"CUuuiiiiuth"']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'gff', '&jfjfsj_jfjfj.']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'lec', '&jgjsd_vnv.']]
['kkkkk;\n']
['\n']
['  select xx', ' jdfjhf:jhfjj from xxxx_x_xx_L ;\n', '"xE\'", PUT(xx.xxxx.),"\'"']
['quit; \n']
['\n']
['/* 1.xxxxx FROM xxxx_x_Ex_x */ \n']
['proc sql; ', ';\n', '"TRUuuuth"']
['hhhjhfjs as fdsjfsj:\n']
['select * from djfkjd to jfkjs\n']
['(\n']
['SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj\n']
['\tFROM &xxx..xxx_xxx_xxE\n']
["where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and \n"]
['      ', ')\n', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[' );\n']
['\n']
['\n']
['jjjjjj;\n']
['\n']
['  select xx', ' jdfjhf:jhfjj from xxxx_x_xx_L ;\n', '"xE\'", PUT(xx.xxxx.),"\'"']
['quit; \n']
['\n']
['/* 1.xxxxx FROM xxxx_x_Ex_x */ \n']
['proc sql; ', ';\n', '"CUuuiiiiuth"']
['hhhjhfjs as fdsjfsj:\n']
['select * from djfkjd to jfkjs\n']
['(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj\n']
['\tFROM &xxx..xxx_xxx_xxE\n']
["where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and \n"]
['      ', ')\n', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[' );']
  • Second attempt:
Output:

['"xE\'", PUT(xx.xxxx.),"\'"', '"TRUuuuth"', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.", '"xE\'", PUT(xx.xxxx.),"\'"', '"CUuuiiiiuth"', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[['"xE\'"', ',', 'PUT', ['xx.xxxx.'], ',', '"\'"']]
[['"TRUuuuth"']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'gff', '&jfjfsj_jfjfj.']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'lec', '&jgjsd_vnv.']]
[['"xE\'"', ',', 'PUT', ['xx.xxxx.'], ',', '"\'"']]
[['"CUuuiiiiuth"']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'gff', '&jfjfsj_jfjfj.']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'lec', '&jgjsd_vnv.']]
['kkkkk;\n']
['\n']
['  select xx', ' jdfjhf:jhfjj from xxxx_x_xx_L ;\n', '"xE\'", PUT(xx.xxxx.),"\'"']
['quit; \n']
['\n']
['/* 1.xxxxx FROM xxxx_x_Ex_x */ \n']
['proc sql; ', ';\n', '"TRUuuuth"']
['hhhjhfjs as fdsjfsj:\n']
['select * from djfkjd to jfkjs\n']
['(\n']
['SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj\n']
['\tFROM &xxx..xxx_xxx_xxE\n']
["where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and \n"]
['      ', ')\n', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[' );\n']
['\n']
['\n']
['jjjjjj;\n']
['\n']
['  select xx', ' jdfjhf:jhfjj from xxxx_x_xx_L ;\n', '"xE\'", PUT(xx.xxxx.),"\'"']
['quit; \n']
['\n']
['/* 1.xxxxx FROM xxxx_x_Ex_x */ \n']
['proc sql; ', ';\n', '"CUuuiiiiuth"']
['hhhjhfjs as fdsjfsj:\n']
['select * from djfkjd to jfkjs\n']
['(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj\n']
['\tFROM &xxx..xxx_xxx_xxE\n']
["where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and \n"]
['      ', ')\n', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[' );']
  • Third attempt:
Output:

['"xE\'", PUT(xx.xxxx.),"\'"', '"TRUuuuth"', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.", '"xE\'", PUT(xx.xxxx.),"\'"', '"CUuuiiiiuth"', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[['"xE\'"', ',', 'PUT', ['xx.xxxx.'], ',', '"\'"']]
[['"TRUuuuth"']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'gff', '&jfjfsj_jfjfj.']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'lec', '&jgjsd_vnv.']]
[['"xE\'"', ',', 'PUT', ['xx.xxxx.'], ',', '"\'"']]
[['"CUuuiiiiuth"']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'gff', '&jfjfsj_jfjfj.']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'lec', '&jgjsd_vnv.']]
['kkkkk;\n']
['\n']
['  select xx', ' jdfjhf:jhfjj from xxxx_x_xx_L ;\n', '"xE\'", PUT(xx.xxxx.),"\'"']
['quit; \n']
['\n']
['/* 1.xxxxx FROM xxxx_x_Ex_x */ \n']
['proc sql; ', ';\n', '"TRUuuuth"']
['hhhjhfjs as fdsjfsj:\n']
['select * from djfkjd to jfkjs\n']
['(\n']
['SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj\n']
['\tFROM &xxx..xxx_xxx_xxE\n']
["where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and \n"]
['      ', ')\n', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[' );\n']
['\n']
['\n']
['jjjjjj;\n']
['\n']
['  select xx', ' jdfjhf:jhfjj from xxxx_x_xx_L ;\n', '"xE\'", PUT(xx.xxxx.),"\'"']
['quit; \n']
['\n']
['/* 1.xxxxx FROM xxxx_x_Ex_x */ \n']
['proc sql; ', ';\n', '"CUuuiiiiuth"']
['hhhjhfjs as fdsjfsj:\n']
['select * from djfkjd to jfkjs\n']
['(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj\n']
['\tFROM &xxx..xxx_xxx_xxE\n']
["where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and \n"]
['      ', ')\n', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[' );']
The expected output, which I can't get, should look like this:

['"xE\'", PUT(xx.xxxx.),"\'"', '"TRUuuuth"', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.", '"xE\'", PUT(xx.xxxx.),"\'"', '"CUuuiiiiuth"', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[['"xE\'"', ',', 'PUT', ['xx.xxxx.'], ',', '"\'"']]
[['"TRUuuuth"']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'gff', '&jfjfsj_jfjfj.']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'lec', '&jgjsd_vnv.']]
[['"xE\'"', ',', 'PUT', ['xx.xxxx.'], ',', '"\'"']]
[['"CUuuiiiiuth"']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'gff', '&jfjfsj_jfjfj.']]
[['xxx', ['xx_ix', 'as', 'format', "'xxxx-xx'"], 'lec', '&jgjsd_vnv.']]
['kkkkk;\n']
['\n']
['  select xx', ' jdfjhf:jhfjj from xxxx_x_xx_L ;\n', '"xE\'", PUT(xx.xxxx.),"\'"']
['quit; \n']
['\n']
['/* 1.xxxxx FROM xxxx_x_Ex_x */ \n']
['proc sql; ', ';\n', '"TRUuuuth"']
['hhhjhfjs as fdsjfsj:\n']
['select * from djfkjd to jfkjs\n']
['(\n']
['SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj\n']
['\tFROM &xxx..xxx_xxx_xxE\n']
["where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and \n"]
['      ', ')\n', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[' );\n']
['\n']
['\n']
['jjjjjj;\n']
['\n']
['  select xx', ' jdfjhf:jhfjj from xxxx_x_xx_L ;\n', '"xE\'", PUT(xx.xxxx.),"\'"']
['quit; \n']
['\n']
['/* 1.xxxxx FROM xxxx_x_Ex_x */ \n']
['proc sql; ', ';\n', '"CUuuiiiiuth"']
['hhhjhfjs as fdsjfsj:\n']
['select * from djfkjd to jfkjs\n']
['(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj\n']
['\tFROM &xxx..xxx_xxx_xxE\n']
["where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and \n"]
['      ', ')\n', "xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv."]
[' );']
Implementing DarrylG's code:

def parse(text):
    result = []
    parens_open = 0

    for char in text:
        if char == '(':
            parens_open += 1
            result.append(char)
        elif char == ')' and parens_open:
            parens_open -= 1
            result.append(char)
        elif char == '\n' and result and result[-1] != '\n':
            result.append(char)
        elif parens_open:
            result.append(char)

    return ''.join(result)


# Keywords to look for, in every capitalisation used in the file
checkhere = {"Select", "From", "select", "from", "SELECT", "FROM"}


with open('lan sample text file.txt', 'r') as fd:
    txt = fd.read()
    result = parse(txt)
    # Note: parse() returns one string, so this loop visits it one character
    # at a time; no single character contains "SELECT" or "FROM", which is
    # why nothing gets printed.
    for chunk in parse(result):
        for x in checkhere:
            if x in chunk:
                print(chunk)
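In the attempt above, parse(result) returns a single string, so "for chunk in parse(result)" walks it one character at a time and the keyword test can never succeed. A minimal sketch of the intended filter that iterates over whole top-level groups instead, assuming whole-word, case-insensitive keyword matching is what's wanted; extract_groups and filter_groups are hypothetical helper names, not DarrylG's code:

```python
import re


def extract_groups(text):
    """Return each top-level parenthesised group in text, parens included."""
    groups, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i  # remember where this top-level group opens
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups


def filter_groups(text, keywords=('SELECT', 'FROM')):
    """Keep only the groups that contain at least one keyword as a whole word."""
    # One case-insensitive alternation replaces the set of spelled-out
    # variants ("Select", "select", "SELECT", ...).
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, keywords)) + r')\b',
                         re.IGNORECASE)
    return [g for g in extract_groups(text) if pattern.search(g)]


# Hypothetical usage with the file name from the question:
# with open('lan sample text file.txt') as fd:
#     for chunk in filter_groups(fd.read()):
#         print(chunk)
```

Matching on whole words also avoids false hits where a keyword is only part of a longer identifier.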

The following code produces the same output as your original expected output:

def parse(text):
  result = []
  parens_open = 0

  for char in text:
    if char == '(':
      parens_open += 1
      result.append(char)
    elif char == ')' and parens_open:
      if parens_open == 1 and result[-1] == '(':
        result.pop()  # Removes empty unnested parens i.e. '()'
      else:
        result.append(char)
      parens_open -= 1

    elif char == '\n' and result and result[-1] != '\n':
      # collapse consecutive newlines into a single '\n'
      result.append(char)
    elif parens_open:
      result.append(char)

  return ''.join(result)

with open('test.txt', 'r') as fd:
  txt = fd.read()
  result = parse(txt)
  print(result)
Output

("xE'", PUT(xx.xxxx.),"'")
("TRUuuuth")
(
SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
    FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))
 )
("xE'", PUT(xx.xxxx.),"'")
("CUuuiiiiuth")
(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
    FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))(( ))
 )
(SELECT((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))
(SELECT
((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))
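The two extra rules in parse can be checked on a small made-up input: an empty top-level "()" is dropped, and a run of newlines collapses into a single '\n'. That newline rule applies even outside parentheses, which is what puts each group on its own line. The function below has the same logic as the answer's parse, re-indented and commented so the snippet runs on its own; the input string is invented for illustration:

```python
def parse(text):
    """Same logic as the answer's parse(): keep only text inside parentheses."""
    result = []
    parens_open = 0
    for char in text:
        if char == '(':
            parens_open += 1
            result.append(char)
        elif char == ')' and parens_open:
            if parens_open == 1 and result[-1] == '(':
                result.pop()  # drop an empty top-level "()"
            else:
                result.append(char)
            parens_open -= 1
        elif char == '\n' and result and result[-1] != '\n':
            result.append(char)  # at most one newline in a row, inside or outside parens
        elif parens_open:
            result.append(char)  # ordinary character inside parentheses
    return ''.join(result)


# Made-up input: a nested group, a run of newlines, an empty "()", then "(g)".
print(repr(parse('a (b (c)) d\n\n\ne () f\n(g)')))  # '(b (c))\n(g)'
```

Newlines before the first group are dropped because result is still empty at that point.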
The code below addresses the updated question about retrieving a specific field:

import re

def find_field(field, text):
  # Raw string avoids the invalid-escape warning for "\(" in recent Pythons;
  # re.escape() keeps any regex metacharacters in field literal
  pattern = re.compile(r'\(\s*' + re.escape(field), flags=re.IGNORECASE)
  matches = pattern.finditer(text)

  result = []
  for m in matches:
    s, e = m.span()
    parens_open = 0
    if result:
      result.append('\n(' + field)
    else:
      result.append('(' + field)

    for char in text[e+1:]: # skip the field plus the one character after it
      if char == '(':
        parens_open += 1
        result.append(char)
      elif char == ')' and parens_open:
        if parens_open == 1 and result[-1] == '(':
          result.pop()  # Removes empty parens
        else:
          result.append(char)
        parens_open -= 1
        if parens_open == 0:
          break         # counter back to 0: the first nested group after the field has closed
      elif char == '\n' and result and result[-1] != '\n':
        result.append(char)
      elif parens_open:
        result.append(char)

  return ''.join(result)

# Test by retrieving select field
with open('test.txt', 'r') as fd:
  txt = fd.read()
  print(find_field("SELECT", txt))
Output

("xE'", PUT(xx.xxxx.),"'")
("TRUuuuth")
(
SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
    FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))
 )
("xE'", PUT(xx.xxxx.),"'")
("CUuuiiiiuth")
(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj
    FROM &xxx..xxx_xxx_xxE
where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))(( ))
 )
(SELECT((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))
(SELECT
((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and
      (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))
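In find_field above, the regex has already consumed the opening "(" of the field's group, yet parens_open starts at 0 and scanning begins at text[e+1:]. So nothing is recorded until the first nested "(" (the " abc AS abc1, ... where " text is skipped, as the "(SELECT((xxx(..." lines at the end of the output show), and the loop stops as soon as that first nested group closes. A sketch that starts the depth at 1 and returns each enclosing group verbatim, assuming the keyword is the first non-blank token after "(" (as in the original regex) and that adding a trailing \b, so that e.g. "SELECTED" does not count, is acceptable:

```python
import re


def find_field(field, text):
    """Return every '(' + field ... ')' group, nested parens included, one per line."""
    pattern = re.compile(r'\(\s*' + re.escape(field) + r'\b', flags=re.IGNORECASE)
    found = []
    for m in pattern.finditer(text):
        depth = 1  # the regex already consumed the enclosing '('
        for i in range(m.end(), len(text)):
            if text[i] == '(':
                depth += 1
            elif text[i] == ')':
                depth -= 1
                if depth == 0:
                    # slice from the enclosing '(' to its matching ')'
                    found.append(text[m.start():i + 1])
                    break
        # a group that is never closed is silently dropped
    return '\n'.join(found)
```

Unlike the original, this returns the groups exactly as they appear in the file (original whitespace and newlines kept) instead of rebuilding them from '(' + field.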

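Thank you, DarrylG! I tried your code and it gives the expected result! But suppose I want to find certain keywords (e.g. "SELECT" and "FROM") in the result, so that my final output is only
(SELECT abc AS abc1, abc_2_ AS efg, abc_fg, fkdkfj_vv, jjsflkl_ff, fjkdsf_jfkj FROM &xxx..xxx_xxx_xxE where ((xxx(xx_ix as format 'xxxx-xx') gff &jfjfsj_jfjfj.) and (xxx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.))(( ))
. How would I do that? I tried, but nothing was printed. Please scroll to the bottom of my edited post above to see the code I tried to implement.
@jackie -- I don't see any edit at the bottom of your post. In any case, do you want to filter out (tag ...), where tag can be a keyword such as "SELECT" or "FROM", the dots correspond to any characters, and ")" is the closing paren at the same level as tag?
Press Ctrl+F on this page, DarrylG! --> "Implementing DarrylG's code"
@jackie -- I see it now. I'll take a look.
Thank you so much, DarrylG!! I tried the code, and I also tried looking for "FROM" in the file, but it returned no output. So how does it manage it for "SELECT"? If I want to look for both "SELECT" and "FROM" in the file, since I'm looking for two fields, should I change the code to
pattern=re.compile('\(\s*'+field+field,flags=re.IGNORECASE)
, i.e. add another
+field
to that line of code?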
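Appending field twice builds the pattern \(\s*SELECTSELECT, which asks for the keyword written twice in a row, so it would find neither keyword. A regex alternation is the usual way to accept either word. Note also that, in the sample lines shown, FROM never comes directly after an opening parenthesis (it sits inside the "(SELECT ..." group), so a pattern anchored on \(\s* is expected to find nothing for "FROM". A minimal sketch; the sample string and variable names are made up for illustration:

```python
import re

fields = ['SELECT', 'FROM']

# Concatenating the field twice asks for the literal text "SELECTSELECT".
doubled = re.compile(r'\(\s*' + 'SELECT' + 'SELECT', re.IGNORECASE)

# An alternation matches an opening parenthesis followed by either keyword.
either = re.compile(r'\(\s*(?:' + '|'.join(map(re.escape, fields)) + r')\b',
                    re.IGNORECASE)

sample = "(\nSELECT a FROM b) x (from c)"
print(doubled.findall(sample))  # []
print(either.findall(sample))   # ['(\nSELECT', '(from']
```

The "FROM b" inside the first group is not matched because it is not preceded by "(", which mirrors why searching the file for "FROM" returned no output.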