Python pandas: creating a column using itertuples

python list loops pandas

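I have a pandas.DataFrame with AcctId, Latitude, and Longitude columns. I also have a list of coordinates. I am trying to calculate the distance between each row's latitude and longitude and every coordinate pair in the list (using the haversine formula). I then want to take the minimum distance and create a new column in the DataFrame with that value.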

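However, every row of my output table contains the distance value from the last row of the loop. I have tried itertuples, iterrows, and a plain loop, but none of them worked for me.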

df
AcctId   Latitude   Longitude
123      40.50      -90.13
123      40.53      -90.21
123      40.56      -90.45
123      40.63      -91.34

coords = [41.45,-95.13,39.53,-100.42,45.53,-95.32]

for row in df.itertuples():
    Latitude = row[1]
    Longitude = row[2]
    distances = []
    lat = []
    lng = []
    for i in xrange(0, len(coords), 2):
        distances.append(haversine_formula(Latitude, coords[i], Longitude, coords[i+1]))
        lat.append(coords[i])
        lng.append(coords[i+1])
        min_distance = min(distances)
    df['Output'] = min_distance
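
The loop above assigns a single scalar to the whole Output column on every iteration, so only the last row's minimum survives. A minimal sketch of one possible fix follows: collect one minimum per row in a list and assign the column once, after the loop. It assumes a hypothetical helper haversine_km (the asker's haversine_formula is not shown, so this is a stand-in) and a mean Earth radius of 6371 km. Note also that with itertuples() position 0 is the index by default, so named attributes are less error-prone than row[1] and row[2].

```python
import math

import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Hypothetical helper: great-circle distance in km (radius assumed 6371 km)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


df = pd.DataFrame({
    "AcctId": [123, 123, 123, 123],
    "Latitude": [40.50, 40.53, 40.56, 40.63],
    "Longitude": [-90.13, -90.21, -90.45, -91.34],
})
coords = [41.45, -95.13, 39.53, -100.42, 45.53, -95.32]

# Regroup the flat list into (lat, lon) pairs.
pairs = list(zip(coords[0::2], coords[1::2]))

min_distances = []
for row in df.itertuples(index=False):
    # One minimum per row, appended in row order.
    min_distances.append(
        min(haversine_km(row.Latitude, row.Longitude, lat, lon)
            for lat, lon in pairs)
    )

# Assign the whole column once, after the loop has finished.
df["Output"] = min_distances
```

Because the list is built in row order, each row receives its own minimum instead of the last one computed.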

Desired output:

df
AcctId   Latitude    Longitude    Output
123      40.50      -90.13         23.21
123      40.53      -90.21         38.42
123      40.56      -90.45         41.49
123      40.63      -91.34         42.45

Actual output:

df
AcctId   Latitude    Longitude    Output
123      40.50      -90.13         42.45
123      40.53      -90.21         42.45
123      40.56      -90.45         42.45
123      40.63      -91.34         42.45

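Final code

# No outer loop is needed: df.apply already visits every row.
# Assumes haversine(a, b) takes two (lat, lon) tuples and that
# coords holds (lat, lon) pairs rather than a flat list.
def min_distance(row):
    here = (row.Latitude, row.Longitude)
    return min(haversine(here, coord) for coord in coords)

df['Nearest_Distance'] = df.apply(min_distance, axis=1)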
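
For reference, a self-contained sketch of the apply-based approach. The haversine function here is a hypothetical stand-in that takes two (lat, lon) tuples and returns kilometres (radius assumed 6371 km); the page does not show which implementation was actually used. The flat coords list from the question is regrouped into pairs first, since iterating over the flat list would yield single floats rather than points.

```python
from math import asin, cos, radians, sin, sqrt

import pandas as pd


def haversine(point_a, point_b, radius_km=6371.0):
    """Hypothetical stand-in: distance in km between two (lat, lon) tuples."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * asin(sqrt(a))


flat = [41.45, -95.13, 39.53, -100.42, 45.53, -95.32]
coords = list(zip(flat[0::2], flat[1::2]))  # [(41.45, -95.13), ...]

df = pd.DataFrame({
    "AcctId": [123, 123, 123, 123],
    "Latitude": [40.50, 40.53, 40.56, 40.63],
    "Longitude": [-90.13, -90.21, -90.45, -91.34],
})


def min_distance(row):
    # row is a Series when axis=1, so columns are reachable as attributes.
    here = (row.Latitude, row.Longitude)
    return min(haversine(here, coord) for coord in coords)


# apply calls min_distance once per row and builds the column in one step.
df["Nearest_Distance"] = df.apply(min_distance, axis=1)
```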

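You are looking for DataFrame.apply: compute the minimum distance for each row in a function and apply it along axis=1, as in the final code above. The results are shown in the table at the end.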

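That's great. I had been working on this for far too long. Thanks for your help!

This solution works, but df.apply runs very slowly when applied to a large DataFrame (>2MM rows). Any suggestions for an alternative to df.apply?

I would bet that the main speed bottleneck is the haversine calculation. You should profile the code. If I am right, you could consider estimating the distance first and running haversine only on a subset.

What is a good way to estimate the distance? Forgive my ignorance, I am fairly new to Python.

If you are below a certain latitude, you can just use sqrt(d_lat**2 + d_long**2) as a first approximation to find things that are nearby. An even simpler approach is min(d_lat, d_long) to find nearby candidates. It really depends on the nature of the data.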
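
One possible alternative to df.apply for large frames is to compute all distances at once with NumPy broadcasting. The sketch below is not from the original answer: the function names, the 6371 km radius, and the cos(latitude) scaling in the estimate are all assumptions made for illustration. min_haversine_km is the exact vectorized calculation; approx_min_km is a cheap flat-plane estimate in the spirit of sqrt(d_lat**2 + d_long**2).

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def min_haversine_km(lat, lon, ref_lat, ref_lon):
    """Minimum haversine distance from each (lat, lon) to any reference point."""
    lat = np.radians(np.asarray(lat, dtype=float))[:, None]          # (n, 1)
    lon = np.radians(np.asarray(lon, dtype=float))[:, None]
    ref_lat = np.radians(np.asarray(ref_lat, dtype=float))[None, :]  # (1, m)
    ref_lon = np.radians(np.asarray(ref_lon, dtype=float))[None, :]
    a = (np.sin((ref_lat - lat) / 2) ** 2
         + np.cos(lat) * np.cos(ref_lat) * np.sin((ref_lon - lon) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))               # (n, m)
    return dist.min(axis=1)


def approx_min_km(lat, lon, ref_lat, ref_lon):
    """Flat-plane estimate; the cos(latitude) factor is an added assumption."""
    lat = np.asarray(lat, dtype=float)[:, None]
    lon = np.asarray(lon, dtype=float)[:, None]
    ref_lat = np.asarray(ref_lat, dtype=float)[None, :]
    ref_lon = np.asarray(ref_lon, dtype=float)[None, :]
    d_lat = ref_lat - lat
    # Shrink longitude differences by cos(mean latitude) before combining.
    d_long = (ref_lon - lon) * np.cos(np.radians((lat + ref_lat) / 2))
    km_per_degree = np.pi * EARTH_RADIUS_KM / 180
    return (km_per_degree * np.sqrt(d_lat ** 2 + d_long ** 2)).min(axis=1)


df = pd.DataFrame({
    "AcctId": [123, 123, 123, 123],
    "Latitude": [40.50, 40.53, 40.56, 40.63],
    "Longitude": [-90.13, -90.21, -90.45, -91.34],
})
coords = [41.45, -95.13, 39.53, -100.42, 45.53, -95.32]
ref_lat, ref_lon = coords[0::2], coords[1::2]

df["output"] = min_haversine_km(df["Latitude"], df["Longitude"], ref_lat, ref_lon)
df["estimate"] = approx_min_km(df["Latitude"], df["Longitude"], ref_lat, ref_lon)
```

The intermediate array has one row per DataFrame row and one column per reference point, so memory grows with both; for a very long coordinate list, processing the frame in chunks would be one way to bound it. Whether this is actually faster on a given dataset is best confirmed by profiling.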

   AcctId  Latitude  Longitude      output
0     123     40.50     -90.13  432.775598
1     123     40.53     -90.21  425.363959
2     123     40.56     -90.45  404.934516
3     123     40.63     -91.34  330.649766