Python: Generating a dictionary and a list from a NumPy ndarray

I want to create a dictionary from a 2D ndarray holding millions of rows.

I'm looking for a Pythonic and performant way to do this.

My data:

Format: [id, origin_lat, origin_lon, dest_lat, dest_lon, distance]

my_array = np.array([[245, 32.45,63.89,72.1,63.57,123.45],
[246, 61.73,42.71,75.54,-81.69,16.32]])
Expected output:

my_dict = {
        245: {
            'origin_lat_lon': {
                'lat': 32.45,
                'lon': 63.89
            },
            'dest_lat_lon': {
                'lat': 72.1,
                'lon': 63.57
            },
            'distance': 123.45
        },
        246: {
            'origin_lat_lon': {
                'lat': 61.73,
                'lon': 42.71
            },
            'dest_lat_lon': {
                'lat': 75.54,
                'lon': -81.69
            },
            'distance': 16.32
        }
    }

my_list = [{'lat': 32.45, 'lon': 63.89},
 {'lat': 72.1, 'lon': 63.57},
 {'lat': 61.73, 'lon': 42.71},
 {'lat': 75.54, 'lon': -81.69}]
My code:

my_dict = dict()
my_list = list()

for arr in my_array:
    origin_lat_lon = {'lat': arr[1],
                            'lon': arr[2]}
    dest_lat_lon  = {'lat': arr[3],
                  'lon': arr[4]}
    value = {'origin_lat_lon':origin_lat_lon,'dest_lat_lon':dest_lat_lon,'distance':arr[5]}
    my_dict[int(arr[0])]=value
    my_list.append(origin_lat_lon)
    my_list.append(dest_lat_lon)

Here is one approach that combines dict, zip, and slicing.

Ex:

import numpy as np

my_array = np.array([[245, 32.45,63.89,72.1,63.57,123.45],[246, 61.73,42.71,75.54,-81.69,16.32]])
keys = ['origin_lat', 'origin_lon', 'dest_lat','dest_lon', 'distance']
keys_2 = ['lat', 'lon']

my_dict = {}
my_list = []

for arr in my_array:
    key, vals = arr[0], arr[1:]
    my_dict[int(key)] = dict(zip(keys, vals))
    my_list.extend([[dict(zip(keys_2, vals[0:2]))],[dict(zip(keys_2, vals[2:4]))]])

print(my_dict)
print(my_list)
Output:

{245: {'dest_lat': 72.1,
       'dest_lon': 63.57,
       'distance': 123.45,
       'origin_lat': 32.45,
       'origin_lon': 63.89},
 246: {'dest_lat': 75.54,
       'dest_lon': -81.69,
       'distance': 16.32,
       'origin_lat': 61.73,
       'origin_lon': 42.71}}
[[{'lat': 32.45, 'lon': 63.89}],
 [{'lat': 72.1, 'lon': 63.57}],
 [{'lat': 61.73, 'lon': 42.71}],
 [{'lat': 75.54, 'lon': -81.69}]]
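
Note that this output does not quite match the expected output in the question: each per-id dict is flat, and every list entry is wrapped in an extra list. Below is a minimal sketch of the same dict/zip/slicing idea adjusted to the requested shape. It is an adaptation for illustration, not Rakesh's original code, and the names origin and dest are assumptions.

```python
import numpy as np

my_array = np.array([[245, 32.45, 63.89, 72.1, 63.57, 123.45],
                     [246, 61.73, 42.71, 75.54, -81.69, 16.32]])
keys_2 = ['lat', 'lon']

my_dict = {}
my_list = []

for arr in my_array:
    key, vals = arr[0], arr[1:]
    # Slice the coordinate pairs and pair them with the 'lat'/'lon' keys.
    origin = dict(zip(keys_2, vals[0:2]))
    dest = dict(zip(keys_2, vals[2:4]))
    # Nest the two coordinate dicts under the integer id, as the question asks.
    my_dict[int(key)] = {'origin_lat_lon': origin,
                         'dest_lat_lon': dest,
                         'distance': vals[4]}
    # extend() with a flat list keeps my_list one level deep.
    my_list.extend([origin, dest])

print(my_dict)
print(my_list)
```

The values stored here are NumPy scalars, since the loop runs over the array itself; they compare equal to plain Python floats.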

Your code, wrapped in a function, times at:

In [220]: timeit foo(my_array)                                                  
7.14 µs ± 17.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
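Converting the array to a list cuts the time by more than half. tolist() is a (relatively) fast way of converting an array to a nested list. Iterating over a list is faster than iterating over an array: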

In [221]: timeit foo(my_array.tolist())                                         
2.68 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
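
The function foo is not shown in this answer. The following is an assumed reconstruction, simply the question's loop wrapped in a function, to show what passing my_array.tolist() looks like. The parameter name rows is hypothetical.

```python
import numpy as np

def foo(rows):
    """Build the nested dict and the flat list of coordinates.

    rows may be a 2D array or a nested list; each row is
    [id, origin_lat, origin_lon, dest_lat, dest_lon, distance].
    """
    my_dict = {}
    my_list = []
    for row in rows:
        origin = {'lat': row[1], 'lon': row[2]}
        dest = {'lat': row[3], 'lon': row[4]}
        my_dict[int(row[0])] = {'origin_lat_lon': origin,
                                'dest_lat_lon': dest,
                                'distance': row[5]}
        my_list.append(origin)
        my_list.append(dest)
    return my_dict, my_list

my_array = np.array([[245, 32.45, 63.89, 72.1, 63.57, 123.45],
                     [246, 61.73, 42.71, 75.54, -81.69, 16.32]])

# tolist() converts the whole array to nested Python lists in one call,
# so the loop then handles plain Python floats instead of NumPy scalars.
result_dict, result_list = foo(my_array.tolist())
print(result_dict)
print(result_list)
```

A side effect of going through tolist() is that the stored values are plain Python floats, which is usually what you want if the result ends up serialized as json.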
Rakesh's version is a bit slower (I haven't worked out why).

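Chris's pandas version is much slower. pandas does have a nice dictionary interface, but evidently it isn't fast. It's probably pure Python, slowed down by being so general: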

In [224]: timeit foo_pd(my_array)                                               
3.35 ms ± 5.69 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
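Python dictionaries work efficiently, but they still have to be accessed key by key. numpy has no compiled code of its own for working with dictionaries.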

===

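Your array can be cast to a structured array. That way the columns are replaced by fields that are accessed by name. So it is more dictionary-like, though probably of no benefit for creating json output. (And it is not a speed tool.)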

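In [225]: dt = np.dtype([('id',int),('origin_lat',float),('origin_lon',float),
     ...:                ('dest_lat',float),('dest_lon',float),('distance',float)])
In [226]: import numpy.lib.recfunctions as rf
In [228]: sarr = rf.unstructured_to_structured(my_array, dt)
In [229]: sarr
Out[229]:
array([(245, 32.45, 63.89, 72.1 ,  63.57, 123.45),
       (246, 61.73, 42.71, 75.54, -81.69,  16.32)],
      dtype=[('id', ...

Comments:

@Chris You're right, I just updated the expected output.

For millions of rows a regular Python for loop is slow. I'm looking for something like numpy.vectorize, or anything where you don't loop over the array explicitly.

Vectorizing in numpy means using compiled numpy methods, and those are mostly computational. You are creating Python objects, lists and dictionaries, and eventually a json string. There is no compiled numpy code for that.

@min2bro Note that numpy.vectorize is just a Python for loop; it is provided for convenience, not performance. The pandas version uses .apply, which is basically just a regular old loop, on top of all the extra overhead of creating the DataFrame and so on.

One of the solutions does speed up the overall computation. In terms of timing, my solution is close to both of these solutions. My question is about optimizing the code so that it runs on a million data points.

My point is that since you are creating Python dictionaries, there is no way around it. I don't know whether one of the compiler tools, numba or cython, would help.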
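
To illustrate the comment about numpy.vectorize, here is a small sketch showing that the wrapped Python function is still called for each element. The helper double and the calls list are made up for this demonstration.

```python
import numpy as np

calls = []  # records every argument the Python function receives

def double(x):
    calls.append(x)
    return 2 * x

# otypes is given so that numpy does not need a trial call
# to work out the output type.
vdouble = np.vectorize(double, otypes=[float])
result = vdouble(np.array([1.0, 2.0, 3.0]))

print(result)
print(len(calls))  # the Python function ran for each element
```

So wrapping the dictionary-building code in numpy.vectorize would still run the Python-level work once per row.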