Create a new dictionary from the keys of one of two dictionaries being compared (Python)


python dictionary


I have two dictionaries with coordinates:

vertex_coordinates = {0: [x0,y0,z0], 1: [x1,y1,z1], 2: [x2,y2,z2] ...}
element_coordinates = {0: [X0,Y0,Z0], 2: [X2,Y2,Z2], 7: [X3,Y3,Z3] ...}

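The keys of the first dictionary are simply 0:N, while the keys of the second dictionary are sorted but not necessarily consecutive. The second dictionary is actually much larger than the first, so one particular case is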

len(vertex_coordinates) = 729
len(element_coordinates) = 58752

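What I want is a dictionary whose keys are the keys of the first dictionary, and whose value for each key is the list of keys from the second dictionary whose coordinates are equal. For example, let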

vertex_coordinates = {0: [1.0,1.0,1.0], 1: [0.0,0.0,0.0], 2: [3.0,4.0,5.0], 3: [3.0, 6.0, 7.0]}
element_coordinates = {0: [0.0,0.0,0.0], 1: [3.0,4.0,5.0], 3: [3.0,6.0,7.0], \
   4: [1.0,1.0,1.0], 6: [0.0,0.0,0.0], 7: [3.0,4.0,5.0], 8:[1.0,1.0,1.0], \
   10: [3.0,6.0,7.0]}

Then the desired dictionary is

element_to_vertex = {0: [4,8], 1: [0,6], 2: [1,7], 3: [3,10]}

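This may or may not matter, but my data is structured so that at the end of this process no key of dictionary 2 is left over: they all end up in the resulting dictionary, i.e. the set of values of dict2 equals the set of values of dict1.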

The way I implemented it is:

for vertex in vertex_coordinates:
  temp = []
  for elem in element_coordinates:
    if(near(element_coordinates[elem][0], vertex_coordinates[vertex][0])):
      if(near(element_coordinates[elem][1], vertex_coordinates[vertex][1])):
        if(near(element_coordinates[elem][2], vertex_coordinates[vertex][2])):
          temp.append(elem)

  element_to_vertex[vertex] = temp

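This works fine, but it is very slow: in the example where the dictionaries have lengths 729 and 58752, it takes about 25 seconds to run, and these are not the largest sizes I am interested in. Can you tell me whether it is possible to speed this up, or should I think of another way to solve the problem?

Thanks.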

I don't have your data, so I can't test the performance myself, but how about one big evil list comprehension? Something like this:

element_to_vertex = {}
for vertex in vertex_coordinates:
    # keep every element whose x, y and z are all near this vertex
    element_to_vertex[vertex] = [elem for elem in element_coordinates
                                 if near(element_coordinates[elem][0], vertex_coordinates[vertex][0])
                                 and near(element_coordinates[elem][1], vertex_coordinates[vertex][1])
                                 and near(element_coordinates[elem][2], vertex_coordinates[vertex][2])]

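You may not notice a huge speedup, but there may be some, since it does not have to look up the append() method on every iteration. For better performance, consider C.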
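A self-contained version of the comprehension above might look like the sketch below. The question does not show near(), so the version here (an absolute tolerance via math.isclose, with a made-up default of 1e-9) and the helper name match_elements are assumptions for illustration only.

```python
import math

def near(a, b, tol=1e-9):
    # Hypothetical stand-in for the question's near(); the real tolerance is unknown.
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tol)

def match_elements(vertex_coordinates, element_coordinates):
    # For each vertex, collect the element keys whose x, y and z are all near it.
    element_to_vertex = {}
    for vertex, v_xyz in vertex_coordinates.items():
        element_to_vertex[vertex] = [
            elem
            for elem, e_xyz in element_coordinates.items()
            if all(near(e, v) for e, v in zip(e_xyz, v_xyz))
        ]
    return element_to_vertex

vertex_coordinates = {0: [1.0, 1.0, 1.0], 1: [0.0, 0.0, 0.0],
                      2: [3.0, 4.0, 5.0], 3: [3.0, 6.0, 7.0]}
element_coordinates = {0: [0.0, 0.0, 0.0], 1: [3.0, 4.0, 5.0], 3: [3.0, 6.0, 7.0],
                       4: [1.0, 1.0, 1.0], 6: [0.0, 0.0, 0.0], 7: [3.0, 4.0, 5.0],
                       8: [1.0, 1.0, 1.0], 10: [3.0, 6.0, 7.0]}

result = match_elements(vertex_coordinates, element_coordinates)
print(result)
# prints {0: [4, 8], 1: [0, 6], 2: [1, 7], 3: [3, 10]}
```

Iterating over .items() avoids the repeated dictionary indexing, but every vertex is still compared against every element, so the amount of work still grows with the product of the two sizes.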

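Right now you iterate over element_coordinates for every entry in vertex_coordinates. As you have seen, this is quite slow.

Why not build a new dictionary that is the inverse of element_coordinates?

{(1.0, 1.0, 1.0): [4, 8], ...}

That way you iterate over it only once and then do fast lookups.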

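There is a catch here (thanks @lukasgraf). Floats do not always compare equal when you expect them to, so this may not work. If the coordinates are computed, there may be rounding errors, and the lookup will not work as expected. That is why you use the near method in your question. You could look for a potential solution. If your data is relatively clean or was set directly, there should be no problem.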

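Doing it this way, you iterate over each dictionary only once. Instead of O(n^2) it is O(n). This approach uses more memory, but you have to pick one or the other.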

You can do it like this:

from collections import defaultdict
vertex_coordinates = {0: [1.0,1.0,1.0], 1: [0.0,0.0,0.0], 2: [3.0,4.0,5.0], 3: [3.0, 6.0, 7.0]}
element_coordinates = {0: [0.0,0.0,0.0], 1: [3.0,4.0,5.0], 3: [3.0,6.0,7.0], 4: [1.0,1.0,1.0], 6: [0.0,0.0,0.0], 7: [3.0,4.0,5.0], 8:[1.0,1.0,1.0], 10: [3.0,6.0,7.0]}

inv_el_coords = defaultdict(list)

for k, v in element_coordinates.items():
    inv_el_coords[tuple(v)].append(k)

element_to_vertex = {k:inv_el_coords[tuple(v)] for k,v in vertex_coordinates.items()}

print(element_to_vertex)
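If the coordinates are the result of arithmetic, one way to keep the lookup-table idea is to round each coordinate before using it as a key. The sketch below is an assumption-laden variant, not the approach shown above: the helper names quantize and build_element_to_vertex and the choice of 6 decimals are hypothetical, and two values that sit on opposite sides of a rounding boundary can still end up under different keys.

```python
from collections import defaultdict

def quantize(xyz, decimals=6):
    # Round each coordinate so nearly equal floats map to the same key.
    # decimals=6 is an assumed tolerance, not something taken from the question.
    return tuple(round(c, decimals) for c in xyz)

def build_element_to_vertex(vertex_coordinates, element_coordinates, decimals=6):
    # Invert element_coordinates once: rounded coordinate tuple -> element keys.
    inverse = defaultdict(list)
    for elem, xyz in element_coordinates.items():
        inverse[quantize(xyz, decimals)].append(elem)
    # One lookup per vertex; .get() avoids adding empty entries for misses.
    return {
        vertex: list(inverse.get(quantize(xyz, decimals), []))
        for vertex, xyz in vertex_coordinates.items()
    }

# 0.1 + 0.2 is not exactly 0.3, but both round to the same key.
vertex_coordinates = {0: [1.0, 1.0, 1.0], 1: [0.3, 0.0, 0.0]}
element_coordinates = {0: [0.1 + 0.2, 0.0, 0.0],
                       4: [1.0, 1.0, 1.0],
                       8: [1.0, 1.0, 1.0]}

result = build_element_to_vertex(vertex_coordinates, element_coordinates)
print(result)
# prints {0: [4, 8], 1: [0]}
```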

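On another note, if you can store the data in tuples in the first place, that will help with speed, since there is no need to convert them to tuples. From what I can see, this should not be a problem, since the value lists are always 3 items long. If you have to change one value in a tuple, just replace the whole tuple.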

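You may want to reconsider how you store your data. You could use a numpy array to store the vertex coordinates and a scipy sparse matrix to store the element coordinates. You stay space efficient, but also gain efficient ways to manipulate the data.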

from scipy.sparse import coo_matrix
from itertools import chain
import numpy as np

# input as specified
vertex_coordinates = {0: [1.0,1.0,1.0], 1: [0.0,0.0,0.0], 2: [3.0,4.0,5.0], 3: [3.0, 6.0, 7.0]}
element_coordinates = {0: [0.0,0.0,0.00000001], 1: [3.0,4.0,5.0], 3: [3.0,6.0,7.0], \
   4: [1.0,1.0,1.0], 6: [0.0,0.0,0.0], 7: [3.0,4.0,5.0], 8:[1.0,1.0,1.0], \
   10: [3.0,6.0,7.0]}

# conversion to numpy array and sparse array
vertex_coordinates = np.array(list(vertex_coordinates.values()), dtype=float)
rows = list(chain.from_iterable([i] * 3 for i in element_coordinates))
cols = list(range(3)) * len(element_coordinates)
data = list(chain.from_iterable(element_coordinates.values()))
element_coordinates = coo_matrix((data, (rows, cols)))
del rows, cols, data

# create output
num_cols = vertex_coordinates.shape[1] # 3
num_rows = len(element_coordinates.row) // num_cols # 8 in this case
shape = num_rows, num_cols

element_to_vertex = {}
# data and row are flat arrays, reshape array to have 3 columns
data_view = element_coordinates.data.reshape(shape)
row_indices = element_coordinates.row[::num_cols]
for i, row in enumerate(vertex_coordinates):
    # compare each row in element_coordinates to see if there is any match
    matches = np.isclose(row, data_view)
    # keep only the rows that completely matched
    row_matches = matches.all(axis=1)
    if row_matches.any():
        # if at least one row matched then get their indices 
        indices = row_indices[row_matches]
        element_to_vertex[i] = indices.tolist()

print(element_to_vertex)
# prints {0: [4, 8], 1: [0, 6], 2: [1, 7], 3: [3, 10]}

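This should speed up your program, but without knowing the full structure of your data I may have made assumptions that are not necessarily correct.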

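Where does dof come from?

My mistake, I edited it. It should be elem.

While a lookup table is usually the ideal approach for this kind of problem, there is an issue with this solution: the OP is dealing with floats (coordinates in 3D space). Unfortunately, because of representation issues and rounding errors, floats do not always compare equal even when they are equal for all intents and purposes. That is why they are usually compared within a tolerance, and I assume that is what the OP's near() function does. So unless there is some guarantee that the coordinates in both sets are not the result of some arithmetic operation, the inv_el_coords[tuple(v)] lookup may fail. For example, hash(0.2 + 0.1) != hash(0.3).

You are correct. I should not have assumed that all the numbers would look so nice and round.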
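The floating-point concern raised in the comments can be reproduced with a few lines. This is only an illustration: the lookup dictionary and its contents are made up, and math.isclose stands in for whatever tolerance check near() actually performs.

```python
import math

a = 0.1 + 0.2   # computed value, stored as 0.30000000000000004
b = 0.3         # literal value

exact_equal = (a == b)                            # exact comparison fails
close_enough = math.isclose(a, b, rel_tol=1e-9)   # tolerance comparison succeeds

# A dictionary keyed on the literal tuple is not found via the computed tuple,
# because the two tuples are not equal.
lookup = {(b, 0.0, 0.0): [4, 8]}
found = lookup.get((a, 0.0, 0.0))

print(exact_equal, close_enough, found)
# prints False True None
```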