Python: Efficient way to calculate geographic density in Pandas?

python performance loops pandas geolocation


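I have a large list of longitude and latitude data corresponding to fast food restaurants in the US. For each fast food restaurant, I want to know how many other fast food restaurants are within 5 miles. I can compute this in Pandas with Geopy (each row in the dataframe is a different fast food restaurant):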

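import pandas as pd
import geopy.distance

df = pd.DataFrame({'fastfoodplace': [1, 2, 3], 'Lat': [33, 34, 35], 'Lon': [42, 43, 44]})
for index1, row1 in df.iterrows():
    num_fastfood = 0
    for index2, row2 in df.iterrows():
        # distance in miles between the two (lat, lon) pairs
        dist = geopy.distance.distance((row1['Lat'], row1['Lon']),
                                       (row2['Lat'], row2['Lon'])).miles
        # if the other restaurant is within 5 miles, increment num_fastfood
        if dist < 5:
            num_fastfood = num_fastfood + 1
    # subtract 1 to exclude the restaurant itself
    df.loc[index1, 'num_fastfood_5miles'] = num_fastfood - 1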

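But this is very slow on very large datasets (i.e. 50,000 rows). I've considered using a KDTree for the search, but I'm curious whether anyone else has a faster approach?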

An implementation using scipy.spatial.cKDTree:
from scipy.spatial import cKDTree

def find_neighbours_within_radius(xy, radius):
    tree = cKDTree(xy)
    within_radius = tree.query_ball_tree(tree, r=radius)
    return within_radius

def flatten_nested_list(nested_list):
    return [item for sublist in nested_list for item in sublist]

def total_neighbours_within_radius(xy, radius):
    neighbours = find_neighbours_within_radius(xy, radius)
    return len(flatten_nested_list(neighbours))
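
Note that total_neighbours_within_radius returns a single total that also counts each point as its own neighbour, whereas the question asks for a count per restaurant. A minimal sketch of a per-point variant follows; the function name count_neighbours_per_point and the sample coordinates are hypothetical, and the coordinates are assumed to be planar and in the same units as the radius.

```python
import numpy as np
from scipy.spatial import cKDTree

def count_neighbours_per_point(xy, radius):
    """For each row of xy, count how many OTHER rows lie within radius."""
    xy = np.asarray(xy, dtype=float)
    tree = cKDTree(xy)
    # query_ball_tree returns one list of neighbour indices per point.
    # Each list contains the point itself, hence the "- 1".
    neighbours = tree.query_ball_tree(tree, r=radius)
    return np.array([len(idx) - 1 for idx in neighbours])

# Hypothetical planar coordinates, in the same units as radius.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
counts = count_neighbours_per_point(points, radius=2.5)
```

The result has one entry per input row, so it can be assigned directly to a dataframe column such as df['num_fastfood_5miles'].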

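Comments:

It's hard to beat KDTrees at this task. Is there any particular reason not to use one?

@Paul No particular reason, more curiosity. I need a moment to remember how to set up sklearn's KDTree. Something like tree = KDTree(my_lat_long)  # query all  nnDist, nnIdx = tree.query(my_lat_long) and then loop over nnDist?

No, use query_ball_tree to get all points within a radius: tree = KDTree(my_lat_long); within_radius = tree.query_ball_tree(tree, radius=5). Then flatten the nested list and count.

@Paul Didn't realize that existed, thanks. If I do it that way it only looks at lat/long, but I need to combine it with geopy to get the "true" distance in miles.

Just precompute how many degrees 5 miles corresponds to, then use lat/long.

I get the error TypeError: query_ball_tree() takes at least 2 positional arguments (1 given). Update: never mind, I figured out that I have to supply the lat/long distances separately.

Sorry, the keyword is r, not radius, for cKDTree.query_ball_tree. Fixed the code.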
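
The "miles to degrees" conversion mentioned in the comments is only approximate, because a degree of longitude shrinks with latitude. One alternative, sketched here under stated assumptions, is to place each point on a unit sphere and query the tree with the chord length that corresponds to 5 miles. The function name count_within_miles, the sample coordinates, and the mean Earth radius of 3958.8 miles are assumptions of this sketch; it uses a spherical Earth, so results can differ slightly from geopy's ellipsoidal distances near the 5-mile boundary.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius (spherical model)

def count_within_miles(lat_deg, lon_deg, miles):
    """For each point, count the other points within `miles` (great-circle)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Convert latitude/longitude to 3D unit vectors on the sphere.
    xyz = np.column_stack((np.cos(lat) * np.cos(lon),
                           np.cos(lat) * np.sin(lon),
                           np.sin(lat)))
    # A great-circle distance d on a sphere of radius R corresponds to a
    # straight-line chord of length 2 * sin(d / (2R)) on the unit sphere.
    chord = 2.0 * np.sin(miles / (2.0 * EARTH_RADIUS_MILES))
    tree = cKDTree(xyz)
    neighbours = tree.query_ball_tree(tree, r=chord)
    # Each neighbour list includes the point itself, hence the "- 1".
    return np.array([len(idx) - 1 for idx in neighbours])

# Hypothetical sample: three points a few miles apart and one far away.
lats = [33.0, 33.01, 33.0, 35.0]
lons = [42.0, 42.0, 42.05, 44.0]
nearby = count_within_miles(lats, lons, miles=5)
```

With the question's dataframe, the same call would look like df['num_fastfood_5miles'] = count_within_miles(df['Lat'], df['Lon'], 5).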