How to compare two sorted lists of numbers in Python where corresponding elements do not need to match exactly?

python numpy

Given 2 sorted lists of numbers, like so:

>>> list1 = list(map(int, '7 22 34 49 56 62 76 82 89 161 174'.split()))
>>> list2 = list(map(int, '7 14 49 57 66 76 135 142 161'.split()))

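in1d would yield True for 7, 49, 76 and 161.

But I am willing to accept a certain tolerance, of at most 3, and yield True for 7, 49, 57, 76 and 161 (because 56 in list1 and 57 in list2 differ by only 1).
I wrote the following code to produce the desired result, where FifoList() is an implementation of a FIFO queue: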

    class FifoList:
        def __init__(self):
            self.data = []

        def append(self, data):
            self.data.append(data)

        def pop(self):
            return self.data.pop(0)

    def match_approximate(a, b, approx=3):
        c = []
        bEnd = False
        bfifo = FifoList()
        for i in b:
            bfifo.append(i)

        y = 0
        for x in a:
            if bEnd:
                continue
            while True:
                if y == 0 or x - y > approx:
                    try:
                        y = bfifo.pop()
                    except KeyError:
                        bEnd = True
                        break
                if abs(x - y) <= approx:
                    c.append(y)
                    break
                if y > x:
                    break
        return c

Just wondering whether there is another, better way to achieve this?
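I'm not sure why you're using a queue here. It doesn't seem like the right data structure for this problem. Also, Python has a built-in queue data structure (collections.deque; just use popleft() instead of pop(0)).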
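As a minimal sketch of the deque remark above (the sample values are only illustrative, taken from the question's list2), popleft() removes from the front in constant time, whereas list.pop(0) has to shift every remaining element:

```python
from collections import deque

# Illustrative values only: the first few numbers of list2 from the question.
queue = deque([7, 14, 49])

first = queue.popleft()   # removes and returns the leftmost item in O(1)
queue.append(57)          # appending on the right is O(1) as well

print(first)        # 7
print(list(queue))  # [14, 49, 57]
```

Note that popleft() on an empty deque raises IndexError, just as pop(0) does on an empty list.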
A simpler approach (imo) is to just maintain a "pointer" (or index) for each array, starting at the beginning. If the elements are within approx of each other, add them and increment both pointers. If a's element is less than b's element, increment a's pointer. Otherwise, increment b's pointer. Continue until both pointers are exhausted (i.e. point to the end of the list). This runs in O(N) linear time. Here's an implementation of the algorithm described:

    def match_approximate(a, b, approx=3):
        a_ind, b_ind = 0, 0
        result = []
        while a_ind < len(a) and b_ind < len(b):
            # Record b's element whenever the current pair is close enough.
            if abs(a[a_ind] - b[b_ind]) <= approx:
                result.append(b[b_ind])
            # Advance both pointers on an exact match,
            # otherwise advance the pointer at the smaller element.
            if a[a_ind] == b[b_ind]:
                b_ind += 1
                a_ind += 1
            elif a[a_ind] < b[b_ind]:
                a_ind += 1
            else:
                b_ind += 1

        def match_last_element(a, a_ind, last_elt_of_b, result):
            # Compare the leftover tail of one list with the last element of the other.
            while a_ind != len(a):
                if abs(a[a_ind] - last_elt_of_b) <= approx:
                    result.append(a[a_ind])
                    a_ind += 1
                else:
                    break

        if a_ind != len(a):
            match_last_element(a, a_ind, b[-1], result)
        else:
            match_last_element(b, b_ind, a[-1], result)

        return result

Running it on the test case will output [5, 6, 7, 49, 57, 76, 161, 163] (which is what I'd expect, but see below about some unclear edge cases). If you run this case on your current impl, you'll get an IndexError.

There are some edge cases where it isn't totally clear what to do:

- When you have an approximate match, which element do you add to the result? This impl just pulls in b's element (as in the example).
- How should duplicates be handled? In this impl, if both lists contain the dup, it will be added twice. If only one list contains the dup, the element will only be added once. A more complex example of dups: if we have [4,5] and [4,5] as inputs, should our output be [4,5] or [4,4,5,5] (since 4 and 5 are both within approx of each other, 4 also matches 5)?
- Is the approx bound inclusive or exclusive (i.e. <= or <)?

Picking the approximate matches from X:

    import numpy as np

    X = np.array([7,22,34,49,56,62,76,82,89,161,174])  # len: 11
    Y = np.array([7,14,49,57,66,76,135,142,161])       # len: 9

    # dist[i, j] = |Y[i] - X[j]|, built by broadcasting
    dist = np.abs(Y[:, np.newaxis] - X)
    for row in dist:
        # enumerate gives the column index directly; list.index(j) would
        # return the first equal distance, which may be the wrong column
        for idx, d in enumerate(row):
            if d <= 3:  # approximation of 3 (distances are already non-negative)
                print(X[idx])

Picking the approximate matches from Y:

    import numpy as np

    X = np.array([7,22,34,49,56,62,76,82,89,161,174])  # len: 11
    Y = np.array([7,14,49,57,66,76,135,142,161])       # len: 9

    # dist[i, j] = |X[i] - Y[j]|; there is one row per element of X,
    # so iterate over all rows rather than range(len(Y)+1)
    dist = np.abs(X[:, np.newaxis] - Y)
    for row in dist:
        for idx, d in enumerate(row):
            if d <= 3:
                print(Y[idx])

Thanks to @MattMessersmith, I implemented my final solution, which meets 2 requirements:

- filter out the items in the two lists that are too close to each other
- match the items in the two lists that are close to each other

    list1 = [7, 22, 34, 49, 56, 62, 76, 82, 89, 149, 161, 182]
    list2 = [7, 14, 49, 57, 66, 76, 135, 142, 161]

    >>> result = match_approximate(list1, list2, 3)
    >>> print(result[0])
    [7, 49, 56, 76, 161]
    >>> print(result[1])
    [7, 49, 57, 76, 161]
    >>> result = match_approximate(list1, list2, 1, True)
    >>> print(result[0])
    [22, 34, 62, 82, 89, 149, 182]
    >>> print(result[1])
    [14, 66, 135, 142]

The code is as follows:

    def match_approximate(a, b, approx, invert=False):
        a_ind, b_ind = 0, 0
        resulta, resultb = [], []
        while a_ind < len(a) and b_ind < len(b):
            aItem, bItem = a[a_ind], b[b_ind]
            if abs(aItem - bItem) <= approx:
                # Close pair: keep it unless we are collecting the non-matches.
                if not invert:
                    resulta.append(aItem)
                    resultb.append(bItem)
                a_ind += 1
                b_ind += 1
                continue
            if aItem < bItem:
                if invert:
                    resulta.append(aItem)
                a_ind += 1
            else:
                if invert:
                    resultb.append(bItem)
                b_ind += 1

        if invert:
            # Whatever is left over in either list has no partner.
            while a_ind != len(a):
                resulta.append(a[a_ind])
                a_ind += 1
            while b_ind != len(b):
                resultb.append(b[b_ind])  # leftovers of b belong in resultb, not resulta
                b_ind += 1

        return [resulta, resultb]
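Since the question starts from in1d, a vectorized variant may also be of interest. The following is only a sketch of a different technique than the answers above: it assumes NumPy and a sorted reference array, and the helper name approx_isin is hypothetical (it is not a NumPy function). It uses np.searchsorted to find, for each value, the nearest neighbours in the sorted reference and then checks the tolerance, treating the bound as inclusive.

```python
import numpy as np

def approx_isin(values, sorted_ref, tol):
    """Hypothetical helper: True where a value lies within tol of any reference element.

    Assumes sorted_ref is sorted in ascending order.
    """
    values = np.asarray(values)
    sorted_ref = np.asarray(sorted_ref)
    if sorted_ref.size == 0:
        return np.zeros(values.shape, dtype=bool)

    # Insertion positions: sorted_ref[idx - 1] < value <= sorted_ref[idx]
    idx = np.searchsorted(sorted_ref, values)
    last = sorted_ref.size - 1
    left = np.clip(idx - 1, 0, last)   # nearest candidate below (clamped)
    right = np.clip(idx, 0, last)      # nearest candidate at or above (clamped)

    nearest = np.minimum(np.abs(values - sorted_ref[left]),
                         np.abs(values - sorted_ref[right]))
    return nearest <= tol              # inclusive tolerance

list1 = np.array([7, 22, 34, 49, 56, 62, 76, 82, 89, 161, 174])
list2 = np.array([7, 14, 49, 57, 66, 76, 135, 142, 161])

mask = approx_isin(list2, list1, 3)
matched = list2[mask].tolist()
print(matched)  # [7, 49, 57, 76, 161]
```

This runs in O(M log N) for M values and N reference elements. Unlike the two-pointer code, it does not pair elements one-to-one, so one reference element can account for several nearby values.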