Python: create a new list from another list based on a condition

python python-2.7 list filter


I'm trying to create a new list from another list, based on the following condition:

lst = [("Id01","Code1",1),("Id01","#instr1",1),("Id01","#instr2",1),("Id01","#instr4",1),
       ("Id01","Code2",1),("Id01","#instr3",1),("Id01","#instr2",1),("Id02","Code2",1),
       ("Id02","#instr2",1),("Id02","#instr5",1)]

table, instrlist = '', ''; code, instructions = [], []; qty = 0

for idx, l in enumerate(lst):
    table = l[0]
    if not l[1].startswith('#'):
        code = l[1]; qty = l[2]; instructions = []
    else:
        instructions.append(l[1])
    print idx, table, code, instructions, qty

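Every time a code appears in a tuple that follows the tuples containing "#", I need to pass the completed row to another part of the program and reset, so that processing of the next group can start. I set up a series of conditions and got this result: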

0 Id01 Code1 [] 1
1 Id01 Code1 ['#instr1'] 1
2 Id01 Code1 ['#instr1', '#instr2'] 1
3 Id01 Code1 ['#instr1', '#instr2', '#instr4'] 1
4 Id01 Code2 [] 1
5 Id01 Code2 ['#instr3'] 1
6 Id01 Code2 ['#instr3', '#instr2'] 1
7 Id02 Code2 [] 1
8 Id02 Code2 ['#instr2'] 1
9 Id02 Code2 ['#instr2', '#instr5'] 1

However, the result I actually need is:

3 Id01 Code1 ['#instr1', '#instr2', '#instr4'] 1
6 Id01 Code2 ['#instr3', '#instr2'] 1
9 Id02 Code2 ['#instr2', '#instr5'] 1

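What condition do I need in order to filter the rows once more?

I'm not skilled enough to use list comprehensions or the built-in filter, and I'd like to keep the code as readable as possible (for a newbie), at least until I learn more.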

Update:

The solution provided by jpp seems to be the most efficient and readable:

from collections import defaultdict
from itertools import count, chain

lst = [("Id01","Code1",1),("Id01","#instr1",1),("Id01","#instr2",1),("Id01","#instr4",1),
       ("Id01","Code2",1),("Id01","#instr3",1),("Id01","#instr2",1),("Id02","Code2",1),
       ("Id02","#instr2",1),("Id02","#instr5",1)]

d = defaultdict(list)
enums = []
c = count()

for ids, action, num in lst:
    if not action.startswith('#'):
        my_ids, my_action = ids, action
        enums.append(next(c))
    else:
        d[(my_ids, my_action)].append([action, num])
        next(c)
enums = enums[1:] + [len(lst)]

for idx, ((key1, key2), val) in enumerate(d.items()):
    print (enums[idx]-1, key1, key2, list(chain.from_iterable(val)), val[0][-1])

However, I'm running into some problems.

For some reason the order is wrong (the last row becomes the first). Result:

('Id01', 'Code1', 1, '#instr1', '#instr2', '#instr4')
('Id01', 'Code2', 1, '#instr3', '#instr2')
('Id02', 'Code2', 1, '#instr2', '#instr5')

(3, 'Id02', 'Code2', ['#instr2', 1, '#instr5', 1], 1)

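collections.defaultdict offers an intuitive solution. The idea is to build a dictionary whose keys are the first two components of a tuple whenever the second component does not start with "#". Then iterate over the dictionary and print in the desired format.

There is some messy work with itertools.count to get the indices you need. I'm sure you can improve this part.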
from collections import defaultdict
from itertools import count, chain

lst = [("Id01","Code1",1),("Id01","#instr1",1),("Id01","#instr2",1),("Id01","#instr4",1),
       ("Id01","Code2",1),("Id01","#instr3",1),("Id01","#instr2",1),("Id02","Code2",1),
       ("Id02","#instr2",1),("Id02","#instr5",1)]

d = defaultdict(list)
enums = []
c = count()

for ids, action, num in lst:
    if not action.startswith('#'):
        my_ids, my_action = ids, action
        enums.append(next(c))
    else:
        d[(my_ids, my_action)].append([action, num])
        next(c)

enums = enums[1:] + [len(lst)]

for idx, ((key1, key2), val) in enumerate(d.items()):
    print(enums[idx]-1, key1, key2, list(chain.from_iterable(val)), val[0][-1])

Result:

3 Id01 Code1 ['#instr1', 1, '#instr2', 1, '#instr4', 1] 1
6 Id01 Code2 ['#instr3', 1, '#instr2', 1] 1
9 Id02 Code2 ['#instr2', 1, '#instr5', 1] 1
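A note on the ordering problem reported in the question: before Python 3.7 a plain dict (and therefore a defaultdict) does not guarantee insertion order, which would explain why the last group can be printed first under Python 2.7. The following is only a sketch of one way around that, not a change to the answer above: it swaps in collections.OrderedDict, keeps the instructions as a flat list, and reads the quantity from the code row. The names `groups`, `current_key` and `rows` are made up for this example, and it assumes each (id, code) pair occurs only once in the input.

```python
from collections import OrderedDict

lst = [("Id01", "Code1", 1), ("Id01", "#instr1", 1), ("Id01", "#instr2", 1), ("Id01", "#instr4", 1),
       ("Id01", "Code2", 1), ("Id01", "#instr3", 1), ("Id01", "#instr2", 1), ("Id02", "Code2", 1),
       ("Id02", "#instr2", 1), ("Id02", "#instr5", 1)]

# OrderedDict remembers insertion order on every Python version.
# Assumption: each (id, code) pair appears once; a repeated pair would overwrite the earlier group.
groups = OrderedDict()
current_key = None  # key of the code row currently being filled

for idx, (table, action, num) in enumerate(lst):
    if not action.startswith('#'):
        # a code row opens a new group and carries the quantity
        current_key = (table, action)
        groups[current_key] = {'qty': num, 'instructions': [], 'last_idx': idx}
    elif current_key is not None:
        # an instruction row is attached to the most recent code row
        groups[current_key]['instructions'].append(action)
        groups[current_key]['last_idx'] = idx

rows = [(g['last_idx'], table, code, g['instructions'], g['qty'])
        for (table, code), g in groups.items()]

for row in rows:
    print(row)
```

Because enumerate supplies the position directly, the itertools.count bookkeeping is not needed in this variant.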

You can use itertools.groupby:
import itertools
lst = [("Id01","Code1",1),("Id01","#instr1",1),("Id01","#instr2",1),("Id01","#instr4",1),
   ("Id01","Code2",1),("Id01","#instr3",1),("Id01","#instr2",1),("Id02","Code2",1),
   ("Id02","#instr2",1),("Id02","#instr5",1)]
results = {a:list(b) for a, b in itertools.groupby(sorted(lst, key=lambda x:x[0]), key=lambda x:x[0])}
code_groupings = {a:[[c, list(d)] for c, d in itertools.groupby(b, key=lambda x:'Code' in x[1])] for a, b in results.items()}
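count = 0  # NOTE: incremented by a hard-coded 3 below, which only happens to match the sample data
last_code = None
for a, b in sorted(code_groupings.items(), key=lambda x:x[0]):
  for c, results in b:
    if c:
      count += 3
      last_code = results[0][1]
    else:
      # the trailing quantity is printed as a fixed 1, it is not read from the tuples
      print('{} {} {} {} 1'.format(count, a, last_code, str([i[1] for i in results])))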

Output:
3 Id01 Code1 ['#instr1', '#instr2', '#instr4'] 1
6 Id01 Code2 ['#instr3', '#instr2'] 1
9 Id02 Code2 ['#instr2', '#instr5'] 1
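Since the index and the quantity can vary in real data, here is a hedged variant of the groupby idea that takes both from the input instead of hard-coding them. It is a sketch only: it groups the enumerated list into alternating runs of code rows and "#" rows, and it assumes that a code row with no instructions after it can be skipped. The names `rows`, `current` and `run` are illustrative.

```python
from itertools import groupby

lst = [("Id01", "Code1", 1), ("Id01", "#instr1", 1), ("Id01", "#instr2", 1), ("Id01", "#instr4", 1),
       ("Id01", "Code2", 1), ("Id01", "#instr3", 1), ("Id01", "#instr2", 1), ("Id02", "Code2", 1),
       ("Id02", "#instr2", 1), ("Id02", "#instr5", 1)]

rows = []
current = None  # most recent (id, code, qty) row that is not an instruction

# groupby yields consecutive runs; the key is True for '#' rows and False for code rows.
for is_instr, run in groupby(enumerate(lst), key=lambda pair: pair[1][1].startswith('#')):
    run = list(run)
    if not is_instr:
        # the last code row of the run owns the instruction rows that follow it
        current = run[-1][1]
    elif current is not None:
        table, code, qty = current
        last_idx = run[-1][0]  # real position of the last instruction in lst
        rows.append((last_idx, table, code, [item[1] for _, item in run], qty))

for row in rows:
    print(row)
```

No sorting is needed here because the runs are consumed in the original list order.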

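Since I couldn't fix the problems I found in the solution provided by jpp (my own shortcoming, I need to spend some free time studying more), I elaborated on my own code. It's clearly not the "Pythonic way", but it works well: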
lst = [("Id01","Code1",1),("Id01","#instr1",1),("Id01","#instr2",1),("Id01","#instr4",1),
       ("Id01","Code2",1),("Id01","#instr3",1),("Id01","#instr2",1),("Id02","Code2",1),
       ("Id02","#instr2",1),("Id02","#instr5",1)]

instr, newline = [], []
for idx, codex, qtx in reversed(lst): #reversed list is more simple to read

    if codex.startswith('#'):
        instr.insert(0, codex) #here I'm creating the tuple in the right order
    else:
        newline += tuple([(idx, codex, qtx) + tuple(instr)])
        instr = []

newline = newline[::-1] #reversed the list to respect the order of the original list (lst) 

for n in newline:
    print n

Result:
('Id01', 'Code1', 1, '#instr1', '#instr2', '#instr4')
('Id01', 'Code2', 1, '#instr3', '#instr2')
('Id02', 'Code2', 1, '#instr2', '#instr5')

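The basic idea is to reverse the input list (lst), because the conditions in the for loop are simpler to write that way. Once the tuples are built, I need to reverse the output list (newline) to get the right order.
I took the liberty of adding some comments for newbies like me, to make it easier to read.
I know this is dirty coding and I'm pretty sure I could do better, but for now I have serious trouble combining the various list comprehension routines.
One day I'll improve my coding skills.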
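For comparison, the same result can probably be had without reversing the list twice. The sketch below walks the list forwards, stores the finished group whenever a new code row arrives, and stores the last group once the loop ends. The function name `group_rows` and its local names are invented for this example; it assumes, as in the code above, that instruction rows always follow the code row they belong to.

```python
lst = [("Id01", "Code1", 1), ("Id01", "#instr1", 1), ("Id01", "#instr2", 1), ("Id01", "#instr4", 1),
       ("Id01", "Code2", 1), ("Id01", "#instr3", 1), ("Id01", "#instr2", 1), ("Id02", "Code2", 1),
       ("Id02", "#instr2", 1), ("Id02", "#instr5", 1)]


def group_rows(rows):
    """Return one (id, code, qty, instr, ...) tuple per code row, in input order."""
    result = []
    current = None      # the code row being collected, None before the first one
    instructions = []   # '#' items seen since the current code row
    for table, code, qty in rows:
        if code.startswith('#'):
            instructions.append(code)
        else:
            # a new code row closes the previous group, if there is one
            if current is not None:
                result.append(current + tuple(instructions))
            current = (table, code, qty)
            instructions = []
    # the last group has no following code row, so close it after the loop
    if current is not None:
        result.append(current + tuple(instructions))
    return result


newline = group_rows(lst)
for n in newline:
    print(n)
```

The step after the loop is the part that is easy to forget: without it the final group would never be emitted.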
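This kind of coding reminds me how little I know about Python. Thanks, it's not very readable for me (but I'll analyse the script carefully), but it just... works.

As far as I can see there is only one small issue: the items starting with "#" are lists inside a list. Is it possible to have them as a single list (not nested)?

Please note that the idx I put in the list is only there to identify the position for debugging. It isn't needed, or at least it will be ignored. I'm sorry, that was missing information.

Great! I'm now reading about the collections and itertools modules. Thanks mate, I had completely missed these modules.

A little help: if you run your script again, you'll notice that idx 9 is the first one shown by your print statement (enums[idx]-1, key1, key2, list(chain.from_iterable(val)), val[0][-1])... it should be index 9, but it is index 3.

@FedericoLeoni, sorry, comments are really bad for code. I can't read it at all. If there is anything you need explained, let me know.

As I said, if you understand the code you should be able to improve some aspects. If there is a point you don't understand, let me know and I'm more than happy to explain :).

Yes, adding code in comments is almost impossible. I'm starting to understand the logic; my own thinking is very simple. But could you kindly take a look at the result of the script? The print statement at the end of the for loop doesn't match the order of the list shown in your post: when I run the script in PyCharm, the last tuple is on the first line.

The solution provided by @jpp looks cleaner (to me), but thank you! I'll study both and see which one fits better.

I see you're using a fixed "1" in the print statement, but that is actually a value that can change, and the same goes for the count...

@FedericoLeoni what are the rules for its value?

The last number in the tuple is the quantity of the code that needs to be taken from the database. The quantity of the #instr codes is tied to the quantity of the code and can be omitted.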