Remove duplicate arrays from an array in Python
I want to drop the duplicates from an array that contains multiple arrays, at the lowest possible running cost. For example:
A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'],['7','8']]
The expected output should look like this:
Output = [['1','2'],['3','4'],['5','6'],['7','8']]
Is it possible to compare arrays within an array?
This is what I tried:
A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'],['7','8']]
output = set()
for x in A:
    output.add(x)
print(output)
But it raises:
TypeError: unhashable type: 'list'
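The error comes from the fact that set members must be hashable, and lists (being mutable) are not. A minimal sketch of the failure and of one way around it, assuming the inner lists only hold hashable items such as strings; the name `list_is_hashable` is introduced here purely for illustration:

```python
A = [['1', '2'], ['3', '4'], ['5', '6'], ['1', '2'], ['3', '4'], ['7', '8']]

# Putting a list into a set fails because lists are unhashable.
try:
    {A[0]}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# A tuple of hashable items is itself hashable, so it can be a set member.
output = set()
for x in A:
    output.add(tuple(x))

print(list_is_hashable)  # False
print(len(output))       # 4
```

Note that the set holds tuples, not lists, and carries no ordering guarantee.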
A simple approach is:
uniques = set()
output = []
for x in A:
    # Build a hashable string key from the items of each inner list.
    val = '-'.join([str(key) for key in x])
    if val not in uniques:
        output.append(x)
        uniques.add(val)
print(output)
Output:
[['1', '2'], ['3', '4'], ['5', '6'], ['7', '8']]
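One caveat with a joined-string key: two different inner lists can produce the same string if an item itself contains the separator. The sketch below contrasts the joined key with a `tuple(row)` key. The function names `dedupe_joined` and `dedupe_tuple` and the `tricky` input are made up for this illustration, and the tuple variant is an alternative, not the answer author's code.

```python
def dedupe_joined(rows):
    """Order-preserving dedupe keyed on '-'.join(...) of the items."""
    seen, out = set(), []
    for row in rows:
        key = '-'.join(str(item) for item in row)
        if key not in seen:
            out.append(row)
            seen.add(key)
    return out


def dedupe_tuple(rows):
    """Hypothetical variant: key on tuple(row), which cannot collide this way."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row)  # requires the items themselves to be hashable
        if key not in seen:
            out.append(row)
            seen.add(key)
    return out


# Both rows join to the string '1-2-3', so the joined key treats them as equal.
tricky = [['1-2', '3'], ['1', '2-3']]
print(dedupe_joined(tricky))  # [['1-2', '3']]
print(dedupe_tuple(tricky))   # [['1-2', '3'], ['1', '2-3']]
```

For the question's data (single digits as strings) both versions return the same result; the difference only matters when items can contain `-`.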
Keep it simple, for example:
B = list(map(list, set(map(tuple, A))))
Here is my "bakeoff"; let me know if I've misrepresented your solution:
import timeit
from random import choice

DIGITS = list("123456789")

# one million elements in list
A = [[choice(DIGITS), choice(DIGITS)] for _ in range(1000000)]

def elena(A):  # MrName's solution is identical
    B = []
    for i in A:
        if i not in B:
            B.append(i)
    return B

def cdlane(A):
    return list(map(list, set(map(tuple, A))))

def VikashSingh(A):
    uniques = set()
    B = []
    for x in A:
        val = '-'.join([str(key) for key in x])
        if val not in uniques:
            B.append(x)
            uniques.add(val)
    return B

def AbhilekhSingh(A):
    def unique_elements(l):
        last = object()
        for item in l:
            if item == last:
                continue
            yield item
            last = item
    return list(unique_elements(sorted(A)))

# sanity check to make sure everyone agrees on the answer
B = sorted(elena(A))
assert B == sorted(cdlane(A))
assert B == sorted(VikashSingh(A))
assert B == sorted(AbhilekhSingh(A))

print("elena:", format(timeit.timeit('B = elena(A)', number=10, globals=globals()), ".3"))
print("cdlane:", format(timeit.timeit('B = cdlane(A)', number=10, globals=globals()), ".3"))
print("VikashSingh:", format(timeit.timeit('B = VikashSingh(A)', number=10, globals=globals()), ".3"))
print("AbhilekhSingh:", format(timeit.timeit('B = AbhilekhSingh(A)', number=10, globals=globals()), ".3"))
Results (total seconds for 10 runs each):
elena: 17.5
cdlane: 2.04
VikashSingh: 10.0
AbhilekhSingh: 8.83
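The `set(map(tuple, A))` one-liner does not keep the inner lists in their original order. If order matters, one possible variant (an addition here, not part of the bakeoff) relies on `dict` keys preserving insertion order, which the language guarantees from Python 3.7 on. A minimal sketch:

```python
A = [['1', '2'], ['3', '4'], ['5', '6'], ['1', '2'], ['3', '4'], ['7', '8']]

# dict.fromkeys keeps only the first occurrence of each key, in insertion order.
B = [list(t) for t in dict.fromkeys(map(tuple, A))]

print(B)  # [['1', '2'], ['3', '4'], ['5', '6'], ['7', '8']]
```

Like the set version, this needs the items of each inner list to be hashable, and it was not timed in the bakeoff above.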
Here is a simple solution:
In [27]: A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'], ['7','8']]
In [28]: new_list = []
In [29]: for i in A:
    ...:     if i not in new_list:
    ...:         new_list.append(i)
    ...:
In [30]: new_list
Out[30]: [['1', '2'], ['3', '4'], ['5', '6'], ['7', '8']]
You can sort the list and compare each element with the one before it.
List length: n
Element length: m
Complexity: Sorting(n * log(n) * m) + Comparison(n * m) = Total(n * log(n) * m)
Try this:
def unique_elements(l):
    # Sentinel that compares unequal to every real item.
    last = object()
    for item in l:
        if item == last:
            continue
        yield item
        last = item

def remove_duplicates(l):
    return list(unique_elements(sorted(l)))
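The same sort-then-skip-equal-neighbours idea can also be written with `itertools.groupby` from the standard library, which groups consecutive equal items. This is a sketch of an equivalent formulation, not the answer author's code; `remove_duplicates_groupby` is a name chosen here for illustration.

```python
from itertools import groupby


def remove_duplicates_groupby(rows):
    """Sort, then keep one representative from each run of equal rows."""
    # With no key function, groupby uses each element itself as the group key.
    return [key for key, _group in groupby(sorted(rows))]


A = [['1', '2'], ['3', '4'], ['5', '6'], ['1', '2'], ['3', '4'], ['7', '8']]
print(remove_duplicates_groupby(A))
# [['1', '2'], ['3', '4'], ['5', '6'], ['7', '8']]
```

As with the generator version, the result comes back in sorted order rather than input order, and the inner lists must be mutually comparable for `sorted` to work.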
Another possibly simple solution, though I'm not sure how its "cost" compares with the other proposed solutions:
A = [['1','2'],['3','4'],['5','6'],['1','2'],['3','4'],['7','8']]
res = []
for entry in A:
    if entry not in res:
        res.append(entry)
You can't put lists into a set, because lists are unhashable. Even if that worked, you would lose the ordering of your items.
@MosesKoledoye Order doesn't matter, but cost does. Also, these are lists, not ...
Please also mention the running cost. I think it is O(n). Is it?
O(n) running time.
This will change the order of the list elements, which may or may not be a problem.
What is the running cost?
@sphericalcowboy, the OP has already addressed this in the comments on their post.
I think log(n) is not possible. If log(n) were possible, we could reverse-engineer it and use it to sort the list, and we know sorting can't be log(n).
`if i not in new_list:` Isn't this an expensive step that makes the process n^2?
Added the running cost; let me know if you have any questions about it.
Your solution on a list of 1 million gives CPU times: user 1.21 s, sys: 17.9 ms, total: 1.23 s
For the input data provided, this solution is perfectly adequate, preserves order, and is very simple. For a list of 10,000 lists of 1,000 numbers each, it would be expensive.