Is there a faster way than for loops and if statements to find the nearest point to another point in Python?

Tags: python, performance, geopy

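Is there a faster way (in Python, on a CPU) to do the same thing as the function below? I have used for loops and if statements and would like to know whether there is a faster approach. At the moment the function takes about 1 minute per 100 postcodes, and I have roughly 70,000 postcodes to process.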

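The two data frames used are:

postcode_df, which contains 71,092 rows and the columns:

  • Postcode, e.g. "BL4 7PD"
  • Latitude, e.g. 53.577653
  • Longitude, e.g. -2.434136

air, which contains 421 rows and the columns:

  • TubeRef, e.g. "ABC01"
  • Latitude, e.g. 53.55108
  • Longitude, e.g. -2.396236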

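The function loops over every postcode in postcode_df and, for each postcode, loops over every TubeRef, calculates the distance between the two (using geopy), and saves the TubeRef with the shortest distance to that postcode.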

The output df, postcode_nearest_tube_refs, contains the nearest tube for each postcode and has the following columns:

  • Postcode, e.g. "BL4 7PD"
  • Nearest Air Tube, e.g. "ABC01"
  • Distance to Air Tube KM, e.g. 1.035848

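You can use numpy to compute the matrix of distances from every point in set A to every point in set B, and then, for each point in set B, take the point in set A with the smallest distance.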

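import numpy as np
import pandas as pd

# Two small frames of random points; dfA and dfB may have different lengths
dfA = pd.DataFrame({'lat':np.random.uniform(0, 30, 3), 'lon':np.random.uniform(0, 30, 3), 'id':[1,2,3]})
dfB = pd.DataFrame({'lat':np.random.uniform(0, 30, 3), 'lon':np.random.uniform(0, 30, 3), 'id':['a', 'b', 'c']})
# Column vector for A and row vector for B, so broadcasting gives a (len(dfA), len(dfB)) matrix
lat1 = dfA.lat.values.reshape(-1, 1)
lat2 = dfB.lat.values.reshape(1, -1)
lon1 = dfA.lon.values.reshape(-1, 1)
lon2 = dfB.lon.values.reshape(1, -1)
# Euclidean distance in degrees: a rough approximation, not a distance in km
dists = np.sqrt((lat1 - lat2)**2 + (lon1-lon2)**2)
# axis=0 gives, for each point in dfB (each column), the row of the closest point in dfA
for id1, id2 in zip(dfB.id, dfA.id.iloc[np.argmin(dists, axis=0)]):
    print(f'the closest point in dfA to {id1} is {id2}')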
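The distance-matrix idea above measures distance in degrees, which is not the same as kilometres on the ground. As a hedged sketch, the same broadcasting trick can be combined with the haversine formula to get great-circle distances. The function name `nearest_by_haversine`, the column names `lat`, `lon` and `id`, and the second point in `dfA` are assumptions made for this illustration; the Earth is treated as a sphere of radius 6371 km, so results will differ slightly from geopy's default geodesic distance.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with the mean radius


def nearest_by_haversine(points, candidates):
    """For each row of `points`, find the nearest row of `candidates`.

    Both frames are assumed to have 'lat' and 'lon' columns in degrees
    and an 'id' column (hypothetical names for this sketch).
    """
    # Column vectors for `points`, row vectors for `candidates`,
    # so every expression below broadcasts to (len(points), len(candidates)).
    lat1 = np.radians(points["lat"].to_numpy()).reshape(-1, 1)
    lon1 = np.radians(points["lon"].to_numpy()).reshape(-1, 1)
    lat2 = np.radians(candidates["lat"].to_numpy()).reshape(1, -1)
    lon2 = np.radians(candidates["lon"].to_numpy()).reshape(1, -1)

    # Haversine formula for the great-circle distance
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

    # Index of the closest candidate for every point (one per row)
    nearest = dist_km.argmin(axis=1)
    return pd.DataFrame({
        "id": points["id"].to_numpy(),
        "nearest_id": candidates["id"].to_numpy()[nearest],
        "distance_km": dist_km[np.arange(len(points)), nearest],
    })


# The first point is the example postcode coordinate from the question;
# the second point is made up for illustration.
dfA = pd.DataFrame({"id": [1, 2],
                    "lat": [53.577653, 53.40],
                    "lon": [-2.434136, -2.15]})
# Tube coordinates taken from the question's example `air` frame
dfB = pd.DataFrame({"id": ["a", "b", "c"],
                    "lat": [53.365085, 53.379502, 53.407510],
                    "lon": [-2.0763, -2.120777, -2.145632]})
result = nearest_by_haversine(dfA, dfB)
print(result)
```

With 71,092 postcodes and 421 tubes the full matrix has about 30 million entries, which fits in memory on most machines but creates several temporary arrays of that size; processing the postcodes in chunks would be one way to reduce that.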

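Here is a working example that takes a few seconds.

Comments:

Update your post with example input and expected output.

The geopandas package provides spatial indexing, including geocoding and the use of R-trees.

Do not compute the full distance matrix; use the BallTree algorithm. It supports the haversine distance and scales better than the full distance matrix. I would guess it takes seconds or minutes. Let me know if you need a complete working example. Please provide a few rows of data in a friendly format.

@user3184950 Thanks. I have updated the question with Pandas code that creates the input data frames with some example rows. Does this give you what you need? It would be great to see a complete working example.

Yes, this helps. I posted it, and it turns out to take less than 10 seconds.

Does this only work when dfA and dfB have the same length? I get the error "IndexError: positional indexers are out-of-bounds" when trying this with a dfA that contains more rows than dfB.

It does not depend on the lengths of A and B; you can try my solution with data frames of different lengths.

Thanks. I will try this and mark it as the accepted answer.

This works very well and is almost instant. Very fast! Thank you very much!

In this case the machine is very fast! Glad it worked for you, Mark!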
# Example input for the `air` data frame (from the question)
air = pd.DataFrame({"TubeRef": ["Stkprt35", "Stkprt07", "Stkprt33"],
                    "Latitude": [53.365085, 53.379502, 53.407510],
                    "Longitude": [-2.0763, -2.120777, -2.145632]})
# define function to get nearest air quality monitoring tube per postcode
def get_nearest_tubes(constituency_list):
    
    postcodes = []
    nearest_tubes = []
    distances_to_tubes = []
    
    for postcode in postcode_df["Postcode"]:
        closest_tube = ""
        shortest_dist = 500

        postcode_lat = postcode_df.loc[postcode_df["Postcode"]==postcode, "Latitude"]
        postcode_long = postcode_df.loc[postcode_df["Postcode"]==postcode, "Longitude"]
        postcode_coord = (float(postcode_lat), float(postcode_long))

        for tuberef in air["TubeRef"]:
            tube_lat = air.loc[air["TubeRef"]==tuberef, "Latitude"]
            tube_long = air.loc[air["TubeRef"]==tuberef, "Longitude"]
            tube_coord = (float(tube_lat), float(tube_long))

            # calculate distance between postcode and tube
            dist_to_tube = geopy.distance.distance(postcode_coord, tube_coord).km
            if dist_to_tube < shortest_dist:
                shortest_dist = dist_to_tube
                closest_tube = str(tuberef)

        # save postcode's tuberef with shortest distance
        postcodes.append(str(postcode))
        nearest_tubes.append(str(closest_tube))
        distances_to_tubes.append(shortest_dist)
            
    # create dataframe of the postcodes, nearest tuberefs and distance
    postcode_nearest_tube_refs = pd.DataFrame({"Postcode":postcodes, 
                                          "Nearest Air Tube":nearest_tubes, 
                                          "Distance to Air Tube KM": distances_to_tubes})

    return postcode_nearest_tube_refs
# Imports needed by get_nearest_tubes (run these first)
import numpy as np
import pandas as pd
# !pip install geopy
import geopy.distance
# Second answer: nearest-neighbour search with a BallTree and the haversine metric
import pandas as pd
import numpy as np
from sklearn.neighbors import BallTree
import uuid

# Synthetic postcodes: 72,000 random points offset from a reference coordinate
np_rand_post = 5 * np.random.random((72000,2))
np_rand_post = np_rand_post + np.array((53.577653, -2.434136))
postcode_df = pd.DataFrame( np_rand_post , columns=['lat', 'long'])
postcode_df['postcode'] = [uuid.uuid4().hex[:6] for _ in range(72000)]
postcode_df.head()

# Synthetic air tubes: 500 random points
np_rand = 5 * np.random.random((500,2))
np_rand = np_rand + np.array((53.55108, -2.396236))
tube_df = pd.DataFrame( np_rand , columns=['lat', 'long'])
tube_df['ref'] = [uuid.uuid4().hex[:5] for _ in range(500)]
tube_df.head()

# The haversine metric expects [lat, long] pairs in radians
postcode_gps = postcode_df[["lat", "long"]].values
air_gps = tube_df[["lat", "long"]].values
postal_radians =  np.radians(postcode_gps)
air_radians = np.radians(air_gps)

# Build the tree on the tubes, then query the single nearest tube (k=1) for every postcode
tree = BallTree(air_radians, leaf_size=15, metric='haversine')
distance, index = tree.query(postal_radians, k=1)

# Distances are returned in radians; multiply by the Earth's radius in metres to convert
earth_radius = 6371000
distance_in_meters = distance * earth_radius
distance_in_meters
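
The BallTree code above ends with the array of distances. As a hedged sketch of the remaining step, the `index` array returned by `tree.query` can be used to look up the matching tube reference and assemble the output table described in the question. The function name `nearest_tubes` and the second postcode row are assumptions made for illustration; the column names follow the question. Distances use a spherical Earth of radius 6371 km, so they can differ slightly (typically by well under one percent) from geopy's default geodesic distance.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with the mean radius


def nearest_tubes(postcode_df, air):
    """Return one row per postcode with its nearest tube and the distance in km."""
    # The haversine metric expects [latitude, longitude] in radians
    air_rad = np.radians(air[["Latitude", "Longitude"]].to_numpy())
    post_rad = np.radians(postcode_df[["Latitude", "Longitude"]].to_numpy())

    tree = BallTree(air_rad, metric="haversine")
    # Both results have shape (n_postcodes, 1) because k=1
    dist_rad, idx = tree.query(post_rad, k=1)

    return pd.DataFrame({
        "Postcode": postcode_df["Postcode"].to_numpy(),
        "Nearest Air Tube": air["TubeRef"].to_numpy()[idx[:, 0]],
        "Distance to Air Tube KM": dist_rad[:, 0] * EARTH_RADIUS_KM,
    })


# First row: the example postcode from the question.
# Second row: a made-up placeholder next to the first tube.
postcode_df = pd.DataFrame({"Postcode": ["BL4 7PD", "EXAMPLE1"],
                            "Latitude": [53.577653, 53.365],
                            "Longitude": [-2.434136, -2.076]})
# Example tubes from the question
air = pd.DataFrame({"TubeRef": ["Stkprt35", "Stkprt07", "Stkprt33"],
                    "Latitude": [53.365085, 53.379502, 53.407510],
                    "Longitude": [-2.0763, -2.120777, -2.145632]})

out = nearest_tubes(postcode_df, air)
print(out)
```

Because the tree is built once on the 421 tubes and queried for all postcodes in a single call, no Python-level loop over postcodes is needed.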