Merging n sorted lists of tuples in Python


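I have n lists (n

There are two main things going on here.

    [(i, k) for i, j in INPUT for k in j]

transforms the input into this structure:

    [('A', (0.12, 'how')),
     ('A', (0.26, 'are')),
     ('A', (0.7, 'you')),
     ('A', (0.9, 'mike')),
     ('A', (1.9, "I'm fine")),
     ('B', (1.23, 'fine')),
     ('B', (1.5, 'thanks')),
     ('B', (1.6, 'and you')),
     ('C', (2.12, 'good')),
     ('C', (2.24, 'morning')),
     ('C', (3.13, 'guys'))]

Then sort that list, L, by item[1] of each element. The key is really (0.12, 'how'), (0.26, 'are'), ... but Python compares tuples from left to right, so no extra work is needed to strip the word out of the tuple.

(OK, the sample data makes the problem description much clearer. Answer revised accordingly.)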
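The flatten-then-sort approach described above can be sketched as follows. This is only an illustration: the `INPUT` values are a shortened, hypothetical sample shaped like the question's data, and the names `list_id`, `items` and `item` are mine, not the original author's.

```python
from operator import itemgetter

# Hypothetical sample shaped like the question's data:
# (list_id, [(time, word), ...])
INPUT = [
    ('A', [(0.12, 'how'), (0.26, 'are'), (1.9, "I'm fine")]),
    ('B', [(1.23, 'fine'), (1.5, 'thanks')]),
]

# Step 1: flatten into (list_id, (time, word)) records.
L = [(list_id, item) for list_id, items in INPUT for item in items]

# Step 2: sort by the inner tuple. Tuples compare left to right,
# so the time decides the order and the word only breaks ties.
L.sort(key=itemgetter(1))

for record in L:
    print(record)
```

Sorting on `itemgetter(1)` rather than on the whole record is what lets the 'A' entry at 1.9 land after the 'B' entries.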

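Step 1: Clarify the problem description by reverse engineering your current solution

  • There are 4 distinct data sets, labeled A, B, C and D
  • The data sets arrive as a series of 2-tuples of the form (ListID, elements)
  • Each "elements" entry is itself a list of 2-tuples of the form (index, value)
  • An empty elements entry signals the end of a data set
  • The goal is to merge these data sets into a single sorted list of 2-tuples of the form (ListID, (index, value))

Step 2: Transform the input data to create individual records of the desired form

Generators are built for this kind of thing, so it makes sense to define one: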

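    def get_data(flow, num_data_sets=4):
        """Yield (list_id, element) records until every data set has ended."""
        finished = set()
        for list_id, elements in flow:
            if list_id in finished:
                continue  # ignore entries for a data set that already ended
            if not elements:
                # An empty entry marks the end of this data set.
                finished.add(list_id)
                if len(finished) == num_data_sets:
                    break  # all data sets have ended
                continue
            for element in elements:
                yield list_id, element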
    
Step 3: Use sorted() to produce the desired sorted list

    content = sorted(get_data(flow))

Example usage:

    # get_data defined via copy/paste of source code above
    # ref_data taken from the revised question
    >>> from operator import itemgetter
    >>> demo_data = [
    ...   ('A', [(1, 2), (3, 4)]),
    ...   ('B', [(7, 8), (9, 10)]),
    ...   ('A', [(0, 0)]),
    ...   ('C', []), # Finish early
    ...   ('C', [('ignored', 'entry')])
    ... ]
    >>> content = sorted(get_data(demo_data))
    >>> print('\n'.join(map(str, content)))
    ('A', (0, 0))
    ('A', (1, 2))
    ('A', (3, 4))
    ('B', (7, 8))
    ('B', (9, 10))
    >>> content = sorted(get_data(ref_data), key=itemgetter(1))
    >>> print('\n'.join(map(str, content)))
    ('A', (0.12, 'how'))
    ('A', (0.26, 'are'))
    ('A', (0.7, 'you'))
    ('A', (0.9, 'mike'))
    ('B', (1.23, 'fine'))
    ('B', (1.5, 'thanks'))
    ('B', (1.6, 'and you'))
    ('A', (1.9, "I'm fine too"))
    ('C', (2.12, 'good'))
    ('C', (2.24, 'morning'))
    ('C', (3.13, 'guys'))
    
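Your solution ended up messy and hard to read for two main reasons:

  • Not using a generator means you cannot take full advantage of the builtin sorted function
  • Using indexing rather than tuple unpacking makes it very hard to keep track of what is what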
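To make the second point concrete, here is a small sketch contrasting the two styles. The `records` list and the names `list_id`, `index` and `value` are hypothetical, chosen to mirror the (ListID, (index, value)) form used above.

```python
# Hypothetical records in the (list_id, (index, value)) form used above.
records = [('A', (0.12, 'how')), ('B', (1.23, 'fine'))]

# Index-based access: the reader has to remember what
# r[0], r[1][0] and r[1][1] stand for.
by_index = ["%s %s %s" % (r[0], r[1][0], r[1][1]) for r in records]

# Tuple unpacking: each piece gets a name at the point of use.
by_name = ["%s %s %s" % (list_id, index, value)
           for list_id, (index, value) in records]

# Both styles produce the same strings; only readability differs.
assert by_index == by_name
print(by_name)
```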
Your input:

    l = [('A',
          [(0.12, 'how'),
           (0.26, 'are'),
           (0.7, 'you'),
           (0.9, 'mike'),
           (1.9, "I'm fine too")]),
         ('B', [(1.23, 'fine'), (1.5, 'thanks'), (1.6, 'and you')]),
         ('C',
          [(2.12, 'good'),
           (2.24, 'morning'),
           (3.13, 'guys')])]
    
Convert it (and print), then check the result:

    [('A', (0.12, 'how')),
     ('A', (0.26, 'are')),
     ('A', (0.7, 'you')),
     ('A', (0.9, 'mike')),
     ('B', (1.23, 'fine')),
     ('B', (1.5, 'thanks')),
     ('B', (1.6, 'and you')),
     ('A', (1.9, "I'm fine too")),
     ('C', (2.12, 'good')),
     ('C', (2.24, 'morning')),
     ('C', (3.13, 'guys'))]
    

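Of course, you could do this inside a list comprehension, but you would still be using 2 for loops and 1 builtin sorted function. You might as well make it verbose and readable.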

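Timsort is really fast on partially sorted data.

You are doing far too much work. Don't use max as a variable name; you may want to be able to use the builtin function max() someday.

It sounds like what you want to do is simple, but all you have given us is some confusing output that is not even a Python structure, and no sample input. For example, in your output the list IDs are all grouped together. Please give some valid Python data structures showing the input and the desired output.

This does not give the output in the right order (A (1.9, "I'm fine too") should be further down).

@gnibbler, updated to include sorted. However, the sample solution provided suggests there is more to the problem specification than stated (i.e. an empty sublist "ends" that data set, preventing any later entries with that list id from being processed, and the loop terminates early once the specified number of distinct data sets have finished).

Works great, thanks. Is there a way to save it to a list?

@mat, sorted() returns a list.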
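As a side note to the Timsort comment above: because each input list is already sorted, a lazy k-way merge is another option. The sketch below swaps in `heapq.merge` from the standard library (the `key` argument needs Python 3.5 or later) in place of `sorted()`. The helper names `tagged` and `merge_sorted` are hypothetical, and the sketch assumes each list id appears exactly once with its elements already in order; it does not implement the empty-list termination rule discussed above.

```python
import heapq
from operator import itemgetter


def tagged(list_id, elements):
    """Yield (list_id, element) pairs, preserving the order of `elements`."""
    for element in elements:
        yield list_id, element


def merge_sorted(data):
    """Lazily merge already-sorted element lists into one sorted stream."""
    streams = [tagged(list_id, elements) for list_id, elements in data]
    # heapq.merge assumes every input stream is sorted by the same key.
    return heapq.merge(*streams, key=itemgetter(1))


# Shortened sample based on the question's data.
data = [
    ('A', [(0.12, 'how'), (0.9, 'mike'), (1.9, "I'm fine too")]),
    ('B', [(1.23, 'fine'), (1.5, 'thanks')]),
    ('C', [(2.12, 'good')]),
]

merged = list(merge_sorted(data))
for record in merged:
    print(record)
```

For small inputs, `sorted()` is simpler and usually fast enough. The merge mainly helps when the inputs are large or are themselves produced lazily.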
    
    # `data` is the input list of (list_id, elements) pairs from the question.
    # Put the element first so a plain sort orders by (index, value).
    records = [(x, list_id) for list_id, xs in data for x in xs]
    records.sort()
    for x, list_id in records:
        print(list_id, x)

Output:

    A (0.12, 'how')
    A (0.26, 'are')
    A (0.7, 'you')
    A (0.9, 'mike')
    B (1.23, 'fine')
    B (1.5, 'thanks')
    B (1.6, 'and you')
    A (1.9, "I'm fine too")
    C (2.12, 'good')
    C (2.24, 'morning')
    C (3.13, 'guys')
    
    
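    from operator import itemgetter

    # Flatten the input `l` shown earlier into (list_id, (index, value)) records.
    newlist = []
    for alpha, tuplelist in l:
        for tup in tuplelist:
            newlist.append((alpha, tup))

    # sorted() returns a new list, so keep the result before printing it.
    newlist = sorted(newlist, key=itemgetter(1))
    print(newlist)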
    