Python: measuring the distance between points and between groups

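I'm trying to measure distances between points in a DataFrame. First, I want to measure the distances between the points within a subregion and take the average distance for that group of points. Then I want to measure the distances between subregions (the distance between the two sets of vectors). I understand how to do the measuring part (the former with scipy.spatial.distance.euclidean, the latter with scipy.spatial.distance.cdist). What I'm stuck on is how to apply these functions to the dataset. I think I should use groupby.apply() and feed in the function, but I'm having trouble conceptualizing it. The DataFrame looks like this: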

id, latitude, longitude, subregion, region
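For the first step (the average distance between the points of each subregion), one way to feed a function to groupby.apply() is to have it receive each group's coordinate columns as a small DataFrame and return a single number. The sketch below swaps scipy.spatial.distance.pdist in for an explicit loop over euclidean. It assumes the columns are literally named latitude and longitude as listed above, groups by region and subregion in case subregion names repeat across regions, and uses made-up toy data. Like the question, it treats latitude/longitude as plain Euclidean coordinates rather than computing ground distance.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist

def mean_pairwise_distance(group):
    # group is the sub-DataFrame for one (region, subregion);
    # pdist returns the Euclidean distance for every pair of its rows
    coords = group[['latitude', 'longitude']].to_numpy()
    if len(coords) < 2:
        return np.nan  # a single point has no pairs to average
    return pdist(coords, metric='euclidean').mean()

# Hypothetical toy data with the column layout described above
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'latitude': [0.0, 3.0, 0.0, 10.0, 10.0],
    'longitude': [0.0, 4.0, 0.0, 0.0, 1.0],
    'subregion': ['a', 'a', 'b', 'b', 'c'],
    'region': ['r1', 'r1', 'r1', 'r1', 'r2'],
})

# Selecting the coordinate columns after groupby passes only those columns
# to the function; the result is a Series indexed by (region, subregion)
per_subregion = (
    df.groupby(['region', 'subregion'])[['latitude', 'longitude']]
      .apply(mean_pairwise_distance)
)
print(per_subregion)
```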

Currently I have:

import pandas as pd
import numpy as np
from scipy.spatial.distance import euclidean

df = pd.read_csv('targets.csv')
...
def calculate_distance(x,y):
    return x._get_numeric_data().apply(axis=0, func=euclidean[x,y]).mean()

df.groupby('subregion').apply(calculate_distance)

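I know this is wrong, because I want to apply the function across multiple columns for all rows. My other thought is that I'm using the wrong data structure.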
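For the second step, cdist already works on all rows and both coordinate columns at once, so one option is to collect each subregion's coordinates into an (n, 2) array with groupby and then call cdist on each pair of subregions. What counts as "the distance between two subregions" is a choice; this sketch assumes the mean of all point-to-point distances, and uses hypothetical toy data with columns named latitude and longitude.

```python
import itertools

import pandas as pd
from scipy.spatial.distance import cdist

# Hypothetical toy data with the column layout described in the question
df = pd.DataFrame({
    'latitude': [0.0, 6.0, 3.0, 0.0],
    'longitude': [0.0, 8.0, 4.0, 8.0],
    'subregion': ['a', 'a', 'b', 'c'],
})

# One (n_points, 2) array per subregion: all rows, both coordinate columns
coords = {name: g[['latitude', 'longitude']].to_numpy()
          for name, g in df.groupby('subregion')}

# cdist returns the full matrix of distances between the points of two
# subregions; its mean is used here as the subregion-to-subregion distance
between = {
    (s1, s2): cdist(coords[s1], coords[s2], metric='euclidean').mean()
    for s1, s2 in itertools.combinations(sorted(coords), 2)
}
print(between)
```

Swapping .mean() for .min() would give the distance between the closest pair of points instead; comparing subregion centroids is another common choice.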

I ended up using a different data structure; the result is below:

import itertools
from scipy.spatial.distance import euclidean

# contacts[region][subregion] = {'coords': [...], 'distances': [...]}
# sc_walkbook is the author's DataFrame; T_Latitude/T_Longitude hold the coordinates
contacts = {}

for _, row in sc_walkbook.iterrows():
    # create the nested entries the first time a region/subregion is seen
    subregion = contacts.setdefault(row['region'], {}).setdefault(
        row['subregion'], {'coords': [], 'distances': []})
    subregion['coords'].append([row['T_Latitude'], row['T_Longitude']])

# Euclidean distance between every unordered pair of points in each subregion
# (.values() replaces the Python 2-only .itervalues())
for region in contacts.values():
    for subregion in region.values():
        for a, b in itertools.combinations(subregion['coords'], 2):
            subregion['distances'].append(euclidean(a, b))
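With the nested contacts dictionary, the per-subregion average that the question asks for is then the mean of each distances list. A minimal sketch, using a small made-up contacts dictionary in the same shape (the region and subregion names are hypothetical):

```python
import numpy as np

# Hypothetical contacts dict in the shape built above:
# region -> subregion -> {'coords': [...], 'distances': [...]}
contacts = {
    'r1': {
        'a': {'coords': [[0, 0], [3, 4]], 'distances': [5.0]},
        'b': {'coords': [[1, 1]], 'distances': []},
    },
}

# Average pairwise distance per (region, subregion); a subregion with a
# single point has no pairs, so it is reported as NaN
mean_distance = {
    (region, name): np.mean(sub['distances']) if sub['distances'] else np.nan
    for region, subregions in contacts.items()
    for name, sub in subregions.items()
}
print(mean_distance)
```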