
Convert a squareform distance matrix to long format in Python


The code:

import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

ids = ['1', '2', '3']
points=[(0,0), (1,1), (3,3)]
distances = pdist(np.array(points), metric='euclidean')
print(distances)
distance_matrix = squareform(distances)
print(distance_matrix)
prints:

[1.41421356 4.24264069 2.82842712]
[[0.         1.41421356 4.24264069]
 [1.41421356 0.         2.82842712]
 [4.24264069 2.82842712 0.        ]]
as expected.

I would like to convert this to a long format, to be written out as CSV, like

id1,id2,distance
1,1,0
1,2,1.41421356
1,3,4.24264069
2,1,1.41421356
2,2,0
2,3,2.82842712

etc. How should I do this for maximum efficiency? Using pandas is an option.

Use the DataFrame constructor.
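A minimal sketch of the DataFrame-constructor approach, using the question's data. The choice of `stack()` followed by `reset_index()` is an assumption about how to reshape the frame, not something stated above, and the in-memory buffer is only there so the example does not touch the disk.

```python
import io

import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

ids = ['1', '2', '3']
points = [(0, 0), (1, 1), (3, 3)]
distance_matrix = squareform(pdist(np.array(points), metric='euclidean'))

# Label rows and columns with the ids, then stack the columns into the index
# so that each (row id, column id) pair becomes one row of the result.
df = pd.DataFrame(distance_matrix, index=ids, columns=ids)
long_df = df.stack().reset_index()
long_df.columns = ['id1', 'id2', 'distance']

# Write the CSV to an in-memory buffer; pass a file path instead to write to disk.
buffer = io.StringIO()
long_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

This keeps the original ids as labels and yields all n*n pairs, including the zero diagonal, as in the desired output.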

I would suggest using the following helper function:

import numpy as np
import functools

# https://stackoverflow.com/a/46135435/ by @unutbu
def indices_merged_arr_generic_using_cp(arr):
    """
    Based on cartesian_product
    http://stackoverflow.com/a/11146645/190597 (senderle)
    """
    shape = arr.shape
    arrays = [np.arange(s, dtype='int') for s in shape]
    broadcastable = np.ix_(*arrays)
    broadcasted = np.broadcast_arrays(*broadcastable)
    rows, cols = functools.reduce(np.multiply, broadcasted[0].shape), len(broadcasted)+1
    out = np.empty(rows * cols, dtype=arr.dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    out[start:] = arr.flatten()
    return out.reshape(cols, rows).T
Usage:

In [169]: out = indices_merged_arr_generic_using_cp(distance_matrix)

In [170]: np.savetxt('out.txt', out, fmt="%i,%i,%f")

In [171]: !cat out.txt
0,0,0.000000
0,1,1.414214
0,2,4.242641
1,0,1.414214
1,1,0.000000
1,2,2.828427
2,0,4.242641
2,1,2.828427
2,2,0.000000
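To get the distance matrix, we can also use SciPy's cdist: cdist(points, points). There is also a package (disclaimer: I am its author) that contains various methods for computing Euclidean distances which are much more efficient than SciPy's cdist, especially for large arrays.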
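As a small check of the cdist remark, the sketch below compares `cdist(points, points)` with the `squareform(pdist(points))` route from the question. It only shows that the two produce the same full matrix on the question's sample data; it makes no claim about relative speed.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

points = np.array([(0, 0), (1, 1), (3, 3)], dtype=float)

# Full n x n matrix computed directly from the two point sets.
full_from_cdist = cdist(points, points, metric='euclidean')

# Same matrix built by expanding the condensed pdist vector.
full_from_pdist = squareform(pdist(points, metric='euclidean'))

same = np.allclose(full_from_cdist, full_from_pdist)
print(same)
```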


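Thanks. I still can't get my head around those index operations. Do you think I can somehow avoid squareform (which I assume is slow)? The matrix has a lot of elements.

@Mr_and_Mrs_D - what do you think of the solution?

Will have a look. There is also another one, but your solution will do for now. I need time to see whether optimization is needed :)

Thanks for the suggestion. Do you think it would be faster?

@Mr_and_Mrs_D Think you need to test that out. Also, consider the alternative listed in the edit at the end of the post.

Any feedback?

Not yet, it might take a while, I'm in the middle of a project - will eventually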
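On the question of avoiding squareform: one possible sketch, not taken from any of the answers, pairs the condensed `pdist` output directly with `np.triu_indices(n, k=1)`, which enumerates the pairs (i, j) with i < j in the same order `pdist` uses. Note the assumption that unique pairs are enough: the mirrored pairs and the zero diagonal from the desired output are left out here.

```python
import numpy as np
from scipy.spatial.distance import pdist

ids = np.array(['1', '2', '3'])
points = np.array([(0, 0), (1, 1), (3, 3)], dtype=float)

# Condensed distances: one entry per unordered pair, no n x n matrix is built.
condensed = pdist(points, metric='euclidean')

# Row/column indices of the strict upper triangle, in pdist's ordering.
i, j = np.triu_indices(len(points), k=1)

# Build CSV lines; only pairs with i < j appear.
lines = ['id1,id2,distance']
for a, b, d in zip(ids[i].tolist(), ids[j].tolist(), condensed.tolist()):
    lines.append(f'{a},{b},{d:.8f}')

print('\n'.join(lines))
```

Whether this is actually faster than the other approaches for a large matrix would need to be measured.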