Extracting delimited lists from bracketed strings with RegEx in Python

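I have seen quite a few similar regex questions, but none of them seems to handle my odd case correctly. I have a list of strings that looks like this: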

['[Business Layer~Project Owning Org~Proj Owning Dept ID]', '[Business Layer~Project Owning Org~Proj Owning Org Name]', '[Business Layer~Project~Proj No]', '[Business Layer~Project~Proj Name]', "([Business Layer~Project~Proj No]) || COALESCE((' - ' || ([Business Layer~Project~Proj Name])), ' - ')", '[Project Assignment Fact~Task~Task No]', '[Project Assignment Fact~Task~Task Name]', "([Project Assignment Fact~Task~Task No]) || COALESCE((' - ' || ([Project Assignment Fact~Task~Task Name])), ' - ')", "([Business Layer~Project~Proj No]) || COALESCE((' - ' || ([Project Assignment Fact~Task~Task No])), ' - ') || COALESCE((' - ' || ([Project Assignment Fact~Task~Task Name])), ' - ')", '[Business Layer~Project Cost~Short Code Alias]', '[Business Layer~Expenditure Type~Expenditure Category Name]', '[Business Layer~Expenditure Type~Expenditure Type Parent Code]', '[Business Layer~Expenditure Type~Expend Type Desc]', '[Business Layer~Expenditure Owning Org~Exp Owning Org Name]', '[Business Layer~Transaction Source~Trans Source]', '[Business Layer~Employee~Employee Name]', '[Business Layer~Project Cost~Expend Comment]', '[Business Layer~Project Cost~PO No]', '[Business Layer~Project Cost~PV Invoice No]', '[Business Layer~Vendor~Vendor Name]', '[Business Layer~Scenario~Scenario Name]', '[Business Layer~ERS Employee~ERS Employee Name]', '[Business Layer~ERS Employee~ERS Employee Number]', '[Business Layer~Project Cost~Vehicle Tag No]', '[Business Layer~Project Cost~Vehicle Make]', '[Business Layer~Project Cost~Vehicle Model]', '[Business Layer~Project Cost~Vehicle Mileage]', '[Business Layer~Project Type~Proj Type Code]', '[Business Layer~GL Period~GL Period Start Date]', '[Business Layer~Project Cost~Burdened Cost Amt]']
As you can see, some of the strings are messy, e.g.:

([Business Layer~Project~Proj No]) || COALESCE((' - ' || ([Business Layer~Project~Proj Name])), ' - ')
I want to extract the contents of the square brackets as lists. For the messy one above, the ideal output would be a nested list like:

[['Business Layer', 'Project', 'Proj No'], ['Business Layer', 'Project', 'Proj Name']]
I have tried a few different regex solutions from various similar questions, without success. Some of the unsuccessful attempts:

# This one is close, but only accounts for 1 list
for i in test:
    result = re.findall("([^(~)]+)(?!.*\()+", i)
    print(result)


# Yields a blank list AND more importantly, some of these are longer than 3.
for i in test:
    result = re.findall("(\[.*?\]\~\[.*?\]\~\[.*?\])", i)
    print(result)


# This captures the beginning but not the end

^\[([^~]+)

# This essentially captures everything but what I want

[^~]+(?=\[.*?\]*$)


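Please let me know what you think. I'm stumped by regex.

I would try a different approach: search only for letters (a-z, A-Z) and spaces.

If string_list is the list from your question: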

import re

for s in string_list:
    print(re.findall(r"[A-Z][\sa-zA-Z]*", s))
Prints:

['Business Layer', 'Project Owning Org', 'Proj Owning Dept ID']
['Business Layer', 'Project Owning Org', 'Proj Owning Org Name']
['Business Layer', 'Project', 'Proj No']
['Business Layer', 'Project', 'Proj Name']
['Business Layer', 'Project', 'Proj No', 'COALESCE', 'Business Layer', 'Project', 'Proj Name']
['Project Assignment Fact', 'Task', 'Task No']
['Project Assignment Fact', 'Task', 'Task Name']
['Project Assignment Fact', 'Task', 'Task No', 'COALESCE', 'Project Assignment Fact', 'Task', 'Task Name']
['Business Layer', 'Project', 'Proj No', 'COALESCE', 'Project Assignment Fact', 'Task', 'Task No', 'COALESCE', 'Project Assignment Fact', 'Task', 'Task Name']
['Business Layer', 'Project Cost', 'Short Code Alias']
['Business Layer', 'Expenditure Type', 'Expenditure Category Name']
['Business Layer', 'Expenditure Type', 'Expenditure Type Parent Code']
['Business Layer', 'Expenditure Type', 'Expend Type Desc']
['Business Layer', 'Expenditure Owning Org', 'Exp Owning Org Name']
['Business Layer', 'Transaction Source', 'Trans Source']
['Business Layer', 'Employee', 'Employee Name']
['Business Layer', 'Project Cost', 'Expend Comment']
['Business Layer', 'Project Cost', 'PO No']
['Business Layer', 'Project Cost', 'PV Invoice No']
['Business Layer', 'Vendor', 'Vendor Name']
['Business Layer', 'Scenario', 'Scenario Name']
['Business Layer', 'ERS Employee', 'ERS Employee Name']
['Business Layer', 'ERS Employee', 'ERS Employee Number']
['Business Layer', 'Project Cost', 'Vehicle Tag No']
['Business Layer', 'Project Cost', 'Vehicle Make']
['Business Layer', 'Project Cost', 'Vehicle Model']
['Business Layer', 'Project Cost', 'Vehicle Mileage']
['Business Layer', 'Project Type', 'Proj Type Code']
['Business Layer', 'GL Period', 'GL Period Start Date']
['Business Layer', 'Project Cost', 'Burdened Cost Amt']
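
Note that the output above is flat and also picks up keywords such as COALESCE, because the pattern matches any capitalized run of letters anywhere in the string. A minimal sketch of one way to keep the same letter-run pattern but group the results per bracket, assuming the wanted fields always sit inside square brackets (the helper name fields_per_bracket is hypothetical, not part of the answer):

```python
import re

def fields_per_bracket(s):
    """Return one list of fields for each [...] group found in s."""
    # First isolate each [...] group, then run the letter-run pattern
    # inside that group only, so text outside brackets is ignored.
    return [re.findall(r"[A-Z][\sa-zA-Z]*", group)
            for group in re.findall(r"\[[^\]]*\]", s)]

messy = ("([Business Layer~Project~Proj No]) || "
         "COALESCE((' - ' || ([Business Layer~Project~Proj Name])), ' - ')")
print(fields_per_bracket(messy))
# [['Business Layer', 'Project', 'Proj No'], ['Business Layer', 'Project', 'Proj Name']]
```

This still relies on every field starting with a capital letter and containing only letters and spaces, which holds for the sample data but may not hold in general.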
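My 2 cents:

list(map(lambda y: [x.split('~') for x in re.findall(r'\[([^\].\[]*)\]', y)], all_strings))
where all_strings is the list of strings from the question plus
["IF ([Business Layer~Scenario~Scenario Name] = 'Budget 2013' AND [Business Layer~GL Period~GL Year Number] = 2013) THEN ([Business Layer~GL Balances~Period Net DR Amt] - [Business Layer~GL Balances~Period Net CR Amt]) ELSE (0)", "if"]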

Here is the result for each string in all_strings:

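[Business Layer~Project Owning Org~Proj Owning Dept ID] --> [['Business Layer', 'Project Owning Org', 'Proj Owning Dept ID']]
[Business Layer~Project Owning Org~Proj Owning Org Name] --> [['Business Layer', 'Project Owning Org', 'Proj Owning Org Name']]
[Business Layer~Project~Proj No] --> [['Business Layer', 'Project', 'Proj No']]
[Business Layer~Project~Proj Name] --> [['Business Layer', 'Project', 'Proj Name']]
([Business Layer~Project~Proj No]) || COALESCE((' - ' || ([Business Layer~Project~Proj Name])), ' - ') --> [['Business Layer', 'Project', 'Proj No'], ['Business Layer', 'Project', 'Proj Name']]
[Project Assignment Fact~Task~Task No] --> [['Project Assignment Fact', 'Task', 'Task No']]
[Project Assignment Fact~Task~Task Name] --> [['Project Assignment Fact', 'Task', 'Task Name']]
([Project Assignment Fact~Task~Task No]) || COALESCE((' - ' || ([Project Assignment Fact~Task~Task Name])), ' - ') --> [['Project Assignment Fact', 'Task', 'Task No'], ['Project Assignment Fact', 'Task', 'Task Name']]
([Business Layer~Project~Proj No]) || COALESCE((' - ' || ([Project Assignment Fact~Task~Task No])), ' - ') || COALESCE((' - ' || ([Project Assignment Fact~Task~Task Name])), ' - ') --> [['Business Layer', 'Project', 'Proj No'], ['Project Assignment Fact', 'Task', 'Task No'], ['Project Assignment Fact', 'Task', 'Task Name']]
[Business Layer~Project Cost~Short Code Alias] --> [['Business Layer', 'Project Cost', 'Short Code Alias']]
[Business Layer~Expenditure Type~Expenditure Category Name] --> [['Business Layer', 'Expenditure Type', 'Expenditure Category Name']]
[Business Layer~Expenditure Type~Expenditure Type Parent Code] --> [['Business Layer', 'Expenditure Type', 'Expenditure Type Parent Code']]
[Business Layer~Expenditure Type~Expend Type Desc] --> [['Business Layer', 'Expenditure Type', 'Expend Type Desc']]
[Business Layer~Expenditure Owning Org~Exp Owning Org Name] --> [['Business Layer', 'Expenditure Owning Org', 'Exp Owning Org Name']]
[Business Layer~Transaction Source~Trans Source] --> [['Business Layer', 'Transaction Source', 'Trans Source']]
[Business Layer~Employee~Employee Name] --> [['Business Layer', 'Employee', 'Employee Name']]
[Business Layer~Project Cost~Expend Comment] --> [['Business Layer', 'Project Cost', 'Expend Comment']]
[Business Layer~Project Cost~PO No] --> [['Business Layer', 'Project Cost', 'PO No']]
[Business Layer~Project Cost~PV Invoice No] --> [['Business Layer', 'Project Cost', 'PV Invoice No']]
[Business Layer~Vendor~Vendor Name] --> [['Business Layer', 'Vendor', 'Vendor Name']]
[Business Layer~Scenario~Scenario Name] --> [['Business Layer', 'Scenario', 'Scenario Name']]
[Business Layer~ERS Employee~ERS Employee Name] --> [['Business Layer', 'ERS Employee', 'ERS Employee Name']]
[Business Layer~ERS Employee~ERS Employee Number] --> [['Business Layer', 'ERS Employee', 'ERS Employee Number']]
[Business Layer~Project Cost~Vehicle Tag No] --> [['Business Layer', 'Project Cost', 'Vehicle Tag No']]
[Business Layer~Project Cost~Vehicle Make] --> [['Business Layer', 'Project Cost', 'Vehicle Make']]
[Business Layer~Project Cost~Vehicle Model] --> [['Business Layer', 'Project Cost', 'Vehicle Model']]
[Business Layer~Project Cost~Vehicle Mileage] --> [['Business Layer', 'Project Cost', 'Vehicle Mileage']]
[Business Layer~Project Type~Proj Type Code] --> [['Business Layer', 'Project Type', 'Proj Type Code']]
[Business Layer~GL Period~GL Period Start Date] --> [['Business Layer', 'GL Period', 'GL Period Start Date']]
[Business Layer~Project Cost~Burdened Cost Amt] --> [['Business Layer', 'Project Cost', 'Burdened Cost Amt']]
["IF ([Business Layer~Scenario~Scenario Name] = 'Budget 2013' AND [Business Layer~GL Period~GL Year Number] = 2013) THEN ([Business Layer~GL Balances~Period Net DR Amt] - [Business Layer~GL Balances~Period Net CR Amt]) ELSE (0)", "if"] --> [['Business Layer', 'Scenario', 'Scenario Name'], ['Business Layer', 'GL Period', 'GL Year Number'], ['Business Layer', 'GL Balances', 'Period Net DR Amt'], ['Business Layer', 'GL Balances', 'Period Net CR Amt']]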
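
For readability, the one-liner above can be unpacked into a small function. This is only a sketch of the same idea: it reuses the answer's pattern (text between [ and ] containing no brackets or dots) and splits each captured group on ~. The names BRACKET_GROUP and split_bracket_groups are made up for illustration.

```python
import re

# Text between '[' and ']' that contains no brackets or dots,
# the same pattern as in the one-liner above.
BRACKET_GROUP = re.compile(r"\[([^\].\[]*)\]")

def split_bracket_groups(strings):
    """For each input string, return a list of '~'-split bracket groups."""
    results = []
    for s in strings:
        groups = BRACKET_GROUP.findall(s)  # e.g. ['A~B~C', 'D~E~F']
        results.append([g.split("~") for g in groups])
    return results

sample = [
    "[Business Layer~Project~Proj No]",
    "([Business Layer~Project~Proj No]) || COALESCE((' - ' || ([Business Layer~Project~Proj Name])), ' - ')",
]
for s, fields in zip(sample, split_bracket_groups(sample)):
    print(s, "-->", fields)
```

Because only the bracketed text is captured, keywords outside the brackets (COALESCE, IF, THEN, ELSE) never reach the output. A string with no bracket groups simply yields an empty list.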

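This is good. Close, but no cigar. The problem is that some of these are basically long code snippets. For example:
["IF ([Business Layer~Scenario~Scenario Name] = 'Budget 2013' AND [Business Layer~GL Period~GL Year Number] = 2013) THEN ([Business Layer~GL Balances~Period Net DR Amt] - [Business Layer~GL Balances~Period Net CR Amt]) ELSE (0)", "if"]
This yields a lot of unwanted "if", "when", "budgets", etc.

@SamDean This looks like an X-Y problem to me. Are you trying to parse JavaScript or some other language with regex? I think so.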