Merging 2 different arrays using a list of dictionaries in Python
I am trying to merge 2 arrays, as shown below.
First:
[650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775]
[650001.96, 300443.4, 18.7, 0.65, 650002.571, 300443.182, 18.745]
[650002.95, 300442.54, 18.82, 0.473, 650003.056, 300442.085, 18.745]
[650005.28, 300444.76, 18.93, 0.463, 650005.368, 300444.395, 18.659]
[650006.17, 312903.26, 14.68, 0.442, 650006.146, 312902.819, 14.68]
[650006.18, 312902.89, 14.91, 0.243, 650006.146, 312902.819, 14.68]
[650006.17, 300445.16, 18.75, 0.402, 650006.286, 300444.792, 18.635]
[650006.8, 312904.65, 14.54, 0.479, 650006.904, 312905.096, 14.68]
[650006.78, 312905.06, 14.81, 0.184, 650006.904, 312905.096, 14.68]
[650011.84, 300447.74, 18.56, 0.546, 650011.836, 300447.197, 18.507]
[650012.96, 300446.92, 18.71, 0.553, 650013.238, 300446.497, 18.488]
[650014.07, 300447.51, 18.41, 0.614, 650014.2, 300446.914, 18.473]
[650001.18, 312862.23, 8.79, 40.338, 650014.526, 312899.965, 13.797]
[650001.19, 312861.88, 9.15, 40.619, 650014.526, 312899.965, 13.797]
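Sec:
The seventh column of the first array shares the same parameter as the first column of the second array. My first array contains nearly 50 million records and the second contains 50,000 records. I am trying to merge the two arrays based on the shared column.
My final array should look like this: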
715316 650001.88 300442.2 18.73 0.575 650002.094 300441.668 18.775 1
715317 650001.96 300443.4 18.7 0.65 650002.571 300443.182 18.745 2
715310 650002.95 300442.54 18.82 0.473 650003.056 300442.085 18.745 3
715304 650005.28 300444.76 18.93 0.463 650005.368 300444.395 18.659 4
129733 650006.17 312903.26 14.68 0.442 650006.146 312902.819 14.68 5
129739 650006.18 312902.89 14.91 0.243 650006.146 312902.819 14.68 5
715303 650006.17 300445.16 18.75 0.402 650006.286 300444.792 18.635 6
129851 650006.8 312904.65 14.54 0.479 650006.904 312905.096 14.68 7
129852 650006.78 312905.06 14.81 0.184 650006.904 312905.096 14.68 7
715302 650011.84 300447.74 18.56 0.546 650011.836 300447.197 18.507 8
715301 650012.96 300446.92 18.71 0.553 650013.238 300446.497 18.488 9
715250 650014.07 300447.51 18.41 0.614 650014.2 300446.914 18.473 10
129121 650001.18 312862.23 8.79 40.338 650014.526 312899.965 13.797 11
129127 650001.19 312861.88 9.15 40.619 650014.526 312899.965 13.797 11
129128 650001.19 312861.54 9.53 40.897 650014.526 312899.965 13.797 11
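I managed to do this, but the only remaining problem is that my d1 dictionary overwrites duplicate keys, so the resulting output is incorrect.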
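import numpy as np

def merge_arrays(first, sec):
    # Rows of `first` that share the same x[5] overwrite each other here,
    # which is the duplicate-key problem described above.
    d1 = dict((x[5], x[0:]) for x in first)
    d2 = dict((x[0], x[1:]) for x in sec)
    finaldict = {key: (d2[key], d1[key]) for key in d2}
    arr2 = []
    for x in finaldict.values():
        arr2.append(x)
        #print(x)
    arr = np.asarray(arr2)
    a = np.array(arr)
    output = np.array(list(map(np.concatenate, a)))
    return output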
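I think I need to use a list of dictionaries rather than a plain dictionary, but I don't know how to convert the array into a list of dictionaries with duplicate keys.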
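One way to keep rows that share a key, sketched here under the assumption that the rows are plain Python lists, is to map each key to a list of rows with collections.defaultdict(list) instead of mapping it to a single row. The helper name group_rows_by_key is hypothetical, and the three sample rows are taken from the data above.

```python
from collections import defaultdict

def group_rows_by_key(rows, key_index):
    """Map each key to the list of all rows that carry it (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        # Appending keeps every row, so duplicate keys no longer overwrite.
        groups[row[key_index]].append(row)
    return groups

first = [
    [650006.17, 312903.26, 14.68, 0.442, 650006.146, 312902.819, 14.68],
    [650006.18, 312902.89, 14.91, 0.243, 650006.146, 312902.819, 14.68],
    [650006.17, 300445.16, 18.75, 0.402, 650006.286, 300444.792, 18.635],
]
groups = group_rows_by_key(first, 5)
duplicate_count = len(groups[312902.819])  # both rows with this key are kept
```

With 50 million rows this grouping may not be needed at all: looking each row's key up in a dictionary built from the smaller array avoids keying the large array in the first place.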
Edit:
I tried @zipa's approach:
d2 = dict((x[0], x[1:]) for x in sec)
finaldict = [item + d2[item[5]] for item in first]
print(finaldict[0])
[650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775]
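I guess the value is not being added because of the way my dictionary is created. When I check d2[item[4]], it gives me [1] rather than just 1. I access item[4] because, in my data, it holds the same value as item[5] in the example.
When I access it, I get this
but it still does not add the value to my merged array.

A list comprehension can do it: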
first = [[650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775],
[650001.96, 300443.4, 18.7, 0.65, 650002.571, 300443.182, 18.745],
[650002.95, 300442.54, 18.82, 0.473, 650003.056, 300442.085, 18.745],
[650005.28, 300444.76, 18.93, 0.463, 650005.368, 300444.395, 18.659],
[650006.17, 312903.26, 14.68, 0.442, 650006.146, 312902.819, 14.68],
[650006.18, 312902.89, 14.91, 0.243, 650006.146, 312902.819, 14.68],
[650006.17, 300445.16, 18.75, 0.402, 650006.286, 300444.792, 18.635],
[650006.8, 312904.65, 14.54, 0.479, 650006.904, 312905.096, 14.68],
[650006.78, 312905.06, 14.81, 0.184, 650006.904, 312905.096, 14.68],
[650011.84, 300447.74, 18.56, 0.546, 650011.836, 300447.197, 18.507],
[650012.96, 300446.92, 18.71, 0.553, 650013.238, 300446.497, 18.488],
[650014.07, 300447.51, 18.41, 0.614, 650014.2, 300446.914, 18.473],
[650001.18, 312862.23, 8.79, 40.338, 650014.526, 312899.965, 13.797],
[650001.19, 312861.88, 9.15, 40.619, 650014.526, 312899.965, 13.797]]
second = [[300441.668, 1],
[300443.182, 2],
[300442.085, 3],
[300444.395, 4],
[312902.819, 5],
[300444.792, 6],
[312905.096, 7],
[300447.197, 8],
[300446.497, 9],
[300446.914, 10],
[312899.965, 11]]
second_dict = {i[0]: i[1] for i in second}
first_second = [item + [second_dict[item[5]]] for item in first]
print(first_second[0])
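The comprehension above relies on + concatenating Python lists. If the rows are NumPy arrays instead (an assumption about the asker's data, not something the post confirms), + performs element-wise addition and the row keeps its original length. A minimal sketch of the difference, using a one-entry lookup table for illustration:

```python
import numpy as np

row_list = [650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775]
row_array = np.array(row_list)
lookup = {300441.668: 1}  # illustrative one-entry lookup table

key = float(row_array[5])

# list + list concatenates: the result has 8 elements.
merged_list = row_list + [lookup[key]]

# ndarray + list broadcasts and adds 1 to every element: still 7 elements.
added_array = row_array + [lookup[key]]

# np.append returns a new array with the value appended: 8 elements.
merged_array = np.append(row_array, lookup[key])
```

If that is the cause, converting each row with list(row) before concatenating, or appending with np.append, would give the expected 8 values.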
You just need to convert the second array into a dict:
second_list = [[300441.668, 1],
[300443.182, 2],
[300442.085, 3],
[300444.395, 4],
[312902.819, 5],
[300444.792, 6],
[312905.096, 7],
[300447.197, 8],
[300446.497, 9],
[300446.914, 10],
[312899.965, 11]]
print(dict(second_list))
# {312899.965: 11, 300447.197: 8, 300443.182: 2, 300444.792: 6, 300441.668: 1, 300444.395: 4, 300446.497: 9, 312905.096: 7, 312902.819: 5, 300442.085: 3, 300446.914: 10}
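The dict gives you a fast lookup table for the first array. There is no need to convert the first array into anything else. If a key might not be found, you may want to use a default value.

Can you explain why you take item[5] from the second dict rather than just the value?

I am getting the value for the key that matches item[5].

For some reason, no value is added to my new list. When I print first_second[0], I get only 7 entries instead of 8, so I guess the new record is not being added to the final output. On my side, the key is looked up correctly in the dictionary, but the value is not added to the new list.

It should be 8; on my machine it is 8.

I have never compared a dictionary against an array before. Can you explain how to use it?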
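For keys that may be missing from the lookup table, one option is dict.get with a default, which returns the fallback instead of raising KeyError. The sketch below assumes list rows and uses None as an illustrative default. The second row's key is made up to show a miss.

```python
second_list = [[300441.668, 1], [300443.182, 2]]
second_dict = dict(second_list)

first = [
    [650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775],
    # Hypothetical row whose key (index 5) is absent from second_dict.
    [650001.96, 300443.4, 18.7, 0.65, 650002.571, 999999.999, 18.745],
]

# dict.get returns the default (None here) when the key is not found.
merged = [row + [second_dict.get(row[5], None)] for row in first]
```

Each merged row then has 8 entries, with None marking rows that had no match in the second array.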