Python: split a list of lists into two lists of lists based on duplicates in one of the original lists


I have two lists:

queryBounds = [[2, 1924], [2, 1924], [2187, 2233], [2187, 2233]]
sequenceBounds = [[95516, 97442], [139777, 137851], [97433, 97479], [137860, 137814]]
I want to split queryBounds into two lists, queryBoundsA and queryBoundsB, so that neither list contains duplicates, i.e.

queryBoundsA = [[2, 1924], [2187, 2233]]
queryBoundsB = [[2, 1924], [2187, 2233]]
Then I want to split sequenceBounds into two lists, sequenceBoundsA and sequenceBoundsB, so that each item of sequenceBounds goes to the list (A or B) that the queryBounds item with the same index was moved to, and sits at the same position in that list:

sequenceBoundsA = [[95516, 97442], [97433, 97479]]
sequenceBoundsB = [[139777, 137851], [137860, 137814]]
I also need this to work when queryBounds contains only duplicates:

queryBounds = [[2, 2233], [2, 2233]]
sequenceBounds = [[111722, 113939], [166447, 164230]]

queryBoundsA = [[2, 2233]]
queryBoundsB = [[2, 2233]]

sequenceBoundsA = [[111722, 113939]]
sequenceBoundsB = [[166447, 164230]]

I can't figure out how to do this.

Let a dictionary do the work. This also handles more than two copies of an item:

queryBounds = [[2, 1924], [2, 1924], [2187, 2233], [2187, 2233]]
sequenceBounds = [[95516, 97442], [139777, 137851], [97433, 97479], [137860, 137814]]

# Create a dict:

sorter = {}
for q, s in zip(queryBounds,sequenceBounds):
    q = tuple(q)
    if q not in sorter:
        sorter[q] = [s]
    else:
        sorter[q].append( s )

# Build one result list per occurrence (first copies, second copies, ...), then print:
qb = [[]]
sb = [[]]
for k, v in sorter.items():
    for i,v1 in enumerate(v):
        if i >= len(qb):
            qb.append( [] )
            sb.append( [] )
        qb[i].append( list(k) )
        sb[i].append( v1 )

print( qb )
print( sb )
Output:

[[[2, 1924], [2187, 2233]], [[2, 1924], [2187, 2233]]]
[[[95516, 97442], [97433, 97479]], [[139777, 137851], [137860, 137814]]]

With the duplicates-only input from the question, the same code prints:

[[[2, 2233]], [[2, 2233]]]
[[[111722, 113939]], [[166447, 164230]]]
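
If it helps, the dictionary approach above can be wrapped in a function so it can be reused for both inputs. This is only a sketch: the name split_by_occurrence is made up for illustration, and it assumes Python 3.7+ so that the dict keeps keys in first-seen order.

```python
def split_by_occurrence(query, seq):
    """Return (query_parts, seq_parts); part n holds the (n+1)-th copy of each item.

    Hypothetical helper, not from the original answer. Relies on dicts
    preserving insertion order (Python 3.7+).
    """
    # Map each distinct query item (as a hashable tuple) to the
    # sequence items paired with it, in their original order.
    grouped = {}
    for q, s in zip(query, seq):
        grouped.setdefault(tuple(q), []).append(s)

    query_parts, seq_parts = [], []
    for key, values in grouped.items():
        for n, value in enumerate(values):
            # Open a new part the first time an n-th copy shows up.
            if n == len(query_parts):
                query_parts.append([])
                seq_parts.append([])
            query_parts[n].append(list(key))
            seq_parts[n].append(value)
    return query_parts, seq_parts


queryBounds = [[2, 2233], [2, 2233]]
sequenceBounds = [[111722, 113939], [166447, 164230]]
query_parts, seq_parts = split_by_occurrence(queryBounds, sequenceBounds)
# Unpacking into A/B only works when there are exactly two parts.
queryBoundsA, queryBoundsB = query_parts
sequenceBoundsA, sequenceBoundsB = seq_parts
```

If an item occurs three times, the function simply returns three parts, so the two-name unpacking at the end would need to be adjusted.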

You can use enumerate together with itertools.groupby:

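from itertools import groupby as gb

def split_bounds(query, seq):
    # Group (index, item) pairs by item; groupby only merges *adjacent* equal items.
    # zip(*groups) then takes the n-th pair of every group, stopping at the shortest group.
    r = list(zip(*[list(b) for _, b in gb(enumerate(query), key=lambda x: x[-1])]))
    query_r = [[b for _, b in i] for i in r]
    seq_r = [[seq[a] for a, _ in i] for i in r]
    return query_r, seq_r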

Output:

[[[2, 1924], [2187, 2233]], [[2, 1924], [2187, 2233]]]
[[[95516, 97442], [97433, 97479]], [[139777, 137851], [137860, 137814]]]
[[[2, 1924], [2187, 2233]], [[2, 1924], [2187, 2233]]]
[[[95516, 97442], [97433, 97479]], [[139777, 137851], [137860, 137814]]]
[[[2, 2233]], [[2, 2233]]]
[[[111722, 113939]], [[166447, 164230]]]
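
One caveat with the groupby version: groupby only groups equal items that sit next to each other, and zip stops at the shortest group, so duplicates that are not adjacent stay in the same list and extra copies are silently dropped. A possible variant, sketched below under the assumption that it is acceptable for the results to come out ordered by value rather than by first appearance, sorts first and pads with zip_longest. The name split_bounds_sorted is invented for this example.

```python
from itertools import groupby, zip_longest


def split_bounds_sorted(query, seq):
    """Hypothetical variant of split_bounds that tolerates non-adjacent
    duplicates and groups of unequal size."""
    # Sort (index, item) pairs by item so equal items become adjacent.
    # sorted() is stable, so equal items keep their original relative order.
    pairs = sorted(enumerate(query), key=lambda p: p[1])
    groups = [list(g) for _, g in groupby(pairs, key=lambda p: p[1])]

    query_r, seq_r = [], []
    # zip_longest pads short groups with None instead of truncating.
    for rnd in zip_longest(*groups):
        kept = [p for p in rnd if p is not None]
        query_r.append([item for _, item in kept])
        seq_r.append([seq[i] for i, _ in kept])
    return query_r, seq_r


# Non-adjacent duplicate: [1, 2] appears at index 0 and index 2.
q_parts, s_parts = split_bounds_sorted([[1, 2], [3, 4], [1, 2]], ["a", "b", "c"])
print(q_parts)  # [[[1, 2], [3, 4]], [[1, 2]]]
print(s_parts)  # [['a', 'b'], ['c']]
```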

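Question: what if an item occurs three times in the queryBounds list? Or is that unlikely to happen?

A larger set of test data would make it easier to understand on what basis you sort items into queryBoundsA and queryBoundsB.

Please review the site's posting guidelines. "Show me how to solve this coding problem" is not a Stack Overflow question. We expect you to make an honest attempt, and then ask a specific question about your algorithm or technique. Stack Overflow is not intended to replace existing documentation and tutorials.

Stack Overflow is not a coding service, and we generally do not handle code design. If you need help, please expand the example into a full specification and post the code you are having trouble with (see minimal, reproducible example (MRE)).

The best strategy may depend entirely on what you want to use these pairs for. I can imagine that splitting the lists is a waste of time when you could iterate over the zipped lists and check for duplicate lists as you go.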
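
For what the last comment describes, iterating over the zipped lists and checking for duplicates as you go, a rough sketch could look like the following. The generator name tag_occurrences is hypothetical, and whether this beats building the split lists depends on how the pairs are used afterwards.

```python
from collections import Counter


def tag_occurrences(query, seq):
    """Yield (occurrence, q, s) lazily; occurrence is 0 the first time q
    is seen, 1 the second time, and so on. Hypothetical helper."""
    seen = Counter()
    for q, s in zip(query, seq):
        key = tuple(q)  # lists are not hashable, tuples are
        yield seen[key], q, s
        seen[key] += 1


queryBounds = [[2, 1924], [2, 1924], [2187, 2233], [2187, 2233]]
sequenceBounds = [[95516, 97442], [139777, 137851], [97433, 97479], [137860, 137814]]

tagged = list(tag_occurrences(queryBounds, sequenceBounds))
for occurrence, q, s in tagged:
    print(occurrence, q, s)
# 0 [2, 1924] [95516, 97442]
# 1 [2, 1924] [139777, 137851]
# 0 [2187, 2233] [97433, 97479]
# 1 [2187, 2233] [137860, 137814]
```

Occurrence 0 corresponds to the "A" lists and occurrence 1 to the "B" lists in the question; a third copy of an item would get occurrence 2.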