Python nearest-neighbor join with a distance condition
Tags: python, pandas, scikit-learn, geopandas
In this question I am referring to this project: We have two GeoDataFrames: Buildings:
name geometry
0 None POINT (24.85584 60.20727)
1 Uimastadion POINT (24.93045 60.18882)
2 None POINT (24.95113 60.16994)
3 Hartwall Arena POINT (24.92918 60.20570)
and bus stops:
stop_name stop_lat stop_lon stop_id geometry
0 Ritarihuone 60.169460 24.956670 1010102 POINT (24.95667 60.16946)
1 Kirkkokatu 60.171270 24.956570 1010103 POINT (24.95657 60.17127)
2 Kirkkokatu 60.170293 24.956721 1010104 POINT (24.95672 60.17029)
3 Vironkatu 60.172580 24.956554 1010105 POINT (24.95655 60.17258)
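After applying the approach from that project, which uses
from sklearn.neighbors import BallTree
we get, for each building index, the nearest bus stop and the distance to it: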
stop_name stop_lat stop_lon stop_id geometry distance
0 Muusantori 60.207490 24.857450 1304138 POINT (24.85745 60.20749) 180.521584
1 Eläintarha 60.192490 24.930840 1171120 POINT (24.93084 60.19249) 372.665221
2 Senaatintori 60.169010 24.950460 1020450 POINT (24.95046 60.16901) 119.425777
3 Veturitie 60.206610 24.929680 1174112 POINT (24.92968 60.20661) 106.762619
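I am looking for a solution that gives, for each building, every bus stop (possibly more than one) at a distance below 250 meters.
Thank you for your help.
To get the nearest bus stop only when it is within 250 meters: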
filtered_by_distance = closest_stops[closest_stops.distance < 250]
result = buildings.join(filtered_by_distance)
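As a minimal sketch of this filter-then-join step, here is the same idea on small stand-in frames. The stop names and distances are copied from the table in the question; the frame names `buildings` and `closest_stops` are assumptions about what the real variables are called.

```python
import pandas as pd

# Stand-in frames; in the real workflow these come from the nearest-neighbor step
buildings = pd.DataFrame({"name": [None, "Uimastadion", None, "Hartwall Arena"]})
closest_stops = pd.DataFrame({
    "stop_name": ["Muusantori", "Eläintarha", "Senaatintori", "Veturitie"],
    "distance": [180.521584, 372.665221, 119.425777, 106.762619],
})

# Keep only the nearest stops that are closer than 250 m
filtered_by_distance = closest_stops[closest_stops["distance"] < 250]

# Index-aligned left join: buildings whose nearest stop is too far get NaN
result = buildings.join(filtered_by_distance)
print(result)
```

This only ever yields at most one stop per building, so it does not cover the "possibly more than one" part of the question.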
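To get all the stops within a radius you need to use query_radius, and you need to convert the radius from meters to radians. Here is one way that reuses what BallTree does, as in the link mentioned above, but with query_radius. It is not written as a function, but you can easily change that.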
from sklearn.neighbors import BallTree
import numpy as np
import pandas as pd
## here I start with buildings and stops as loaded in the link provided
# variable in meter you can change
radius_max = 250 # meters
# another parameter, in case you want to do with Mars radius ^^
earth_radius = 6371000 # meters
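# similar to the method with apply in the tutorial
# to create left_radians and right_radians, but faster
# note: scikit-learn documents the haversine metric as taking [lat, lon];
# the arrays below are [lon, lat] (x, y), which is the order that produced
# the outputs printed further down
candidates = np.vstack([stops['geometry'].x.to_numpy(),
                        stops['geometry'].y.to_numpy()]).T*np.pi/180
src_points = np.vstack([buildings['geometry'].x.to_numpy(),
                        buildings['geometry'].y.to_numpy()]).T*np.pi/180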
# Create tree from the candidate points
tree = BallTree(candidates, leaf_size=15, metric='haversine')
# use query_radius instead
ind_radius, dist_radius = tree.query_radius(src_points,
                                            r=radius_max/earth_radius,
                                            return_distance=True)
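For readers who want to try the radius query without loading the full data, here is a small self-contained sketch. The coordinates are copied from the tables in this thread. Two things are assumptions on my part: the points are passed as [lat, lon], which is the order scikit-learn documents for the haversine metric, and the mean Earth radius of 6371000 m is reused from the code above. Because of the coordinate order, the distances are not expected to match the ones printed elsewhere in the thread.

```python
import numpy as np
from sklearn.neighbors import BallTree

earth_radius = 6371000  # meters
radius_max = 250        # meters

# [lat, lon] in degrees
stops_deg = np.array([
    [60.16901, 24.95046],  # Senaatintori (stop_id 1020450)
    [60.16896, 24.94983],  # Senaatintori (stop_id 1020455)
    [60.16946, 24.95667],  # Ritarihuone
])
buildings_deg = np.array([
    [60.16994, 24.95113],  # building at row 2
    [60.18882, 24.93045],  # Uimastadion
])

tree = BallTree(np.radians(stops_deg), metric="haversine")

# The haversine metric works on the unit sphere, so the radius is an angle:
# meters divided by the sphere radius gives radians
ind, dist = tree.query_radius(np.radians(buildings_deg),
                              r=radius_max / earth_radius,
                              return_distance=True)

# One array per building; lengths differ, and an empty array means no stop in range
for row, (stop_rows, d) in enumerate(zip(ind, dist)):
    print(row, stop_rows, (d * earth_radius).round(1))
```

Note that query_radius returns (indices, distances) in that order, which is the reverse of what query returns.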
Now you can manipulate the result to get what you want
# create a dataframe build with
# index based on row position of the building in buildings
# column row_stop is the row position of the stop
# dist is the distance
closest_dist = pd.concat([pd.Series(ind_radius).explode().rename('row_stop'),
                          pd.Series(dist_radius).explode().rename('dist')*earth_radius],
                         axis=1)
print(closest_dist.head())
# row_stop dist
#0 1131 180.522
#1 NaN NaN
#2 64 174.744
#2 61 119.426
#3 532 106.763
# merge the dataframe created above with the original data stops
# to get names, id, ... note: the index must be reset as in closest_dist
# it is position based
closest_stop = closest_dist.merge(stops.reset_index(drop=True),
                                  left_on='row_stop', right_index=True, how='left')
print(closest_stop.head())
# row_stop dist stop_name stop_lat stop_lon stop_id \
#0 1131 180.522 Muusantori 60.20749 24.85745 1304138.0
#1 NaN NaN NaN NaN NaN NaN
#2 64 174.744 Senaatintori 60.16896 24.94983 1020455.0
#2 61 119.426 Senaatintori 60.16901 24.95046 1020450.0
#3 532 106.763 Veturitie 60.20661 24.92968 1174112.0
#
# geometry
#0 POINT (24.85745 60.20749)
#1 None
#2 POINT (24.94983 60.16896)
#2 POINT (24.95046 60.16901)
#3 POINT (24.92968 60.20661)
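The reshaping step may be easier to follow in isolation. The sketch below mimics the ragged output of query_radius with plain lists (an assumption made for brevity; the real output is an object array of NumPy arrays, which explode handles the same way) and uses the row positions and distances printed above, already converted to meters.

```python
import pandas as pd

# One entry per building: positions of the stops in range and their distances (m)
ind = pd.Series([[1131], [], [64, 61]], name="row_stop")
dist = pd.Series([[180.522], [], [174.744, 119.426]], name="dist")

# explode() turns each list element into its own row and repeats the index;
# an empty list becomes a single NaN row, so buildings without stops are kept
closest_dist = pd.concat([ind.explode(), dist.explode()], axis=1)
print(closest_dist)

# Number of stops within range per building (NaN rows are not counted)
stops_per_building = closest_dist.groupby(level=0)["row_stop"].count()
print(stops_per_building)
```

Keeping the repeated building index is what makes the later join back to the buildings frame work without any extra key column.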
Finally, join back to buildings
# join buildings with reset_index with
# closest_stop as index in closest_stop are position based
final_df = buildings.reset_index(drop=True).join(closest_stop, rsuffix='_stop')
print (final_df.head(10))
# name geometry row_stop dist stop_name \
# 0 None POINT (24.85584 60.20727) 1131 180.522 Muusantori
# 1 Uimastadion POINT (24.93045 60.18882) NaN NaN NaN
# 2 None POINT (24.95113 60.16994) 64 174.744 Senaatintori
# 2 None POINT (24.95113 60.16994) 61 119.426 Senaatintori
# 3 Hartwall Arena POINT (24.92918 60.20570) 532 106.763 Veturitie
# stop_lat stop_lon stop_id geometry_stop
# 0 60.20749 24.85745 1304138.0 POINT (24.85745 60.20749)
# 1 NaN NaN NaN None
# 2 60.16896 24.94983 1020455.0 POINT (24.94983 60.16896)
# 2 60.16901 24.95046 1020450.0 POINT (24.95046 60.16901)
# 3 60.20661 24.92968 1174112.0 POINT (24.92968 60.20661)
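So, I have a strange problem: the calculated distances are wrong. I swapped the latitude and longitude fields in the geopandas dataframe and the distances are better, but they are still off by half a meter to a meter (which is still incorrect). Has anyone run into this with BallTree before?
Thanks, I will read up on BallTree.query_radius. The first solution does not do what I wanted, which is every bus stop (possibly more than one).
This kind of question would be a very good fit for …
@s.k Yes and no, because GIS tools won't work on huge datasets, which is why I am looking for a strictly Python-ish solution.
GIS tools were among the first to embrace big data, so I'm not sure what your comment means.
I'm voting to close this question because it belongs on GIS.se
Hi @PaulH, I am referring to Python tools such as scikit-learn and numpy, which can help solve this problem (and other similar ones, not only spatial). If nobody helps, I can move it to gis.se. Closing hard questions is not the answer to everything…
Good stuff, give me a day to test it and then I'll mark it as correct if it works! :)
In the final DF, why are the buildings with index 0 and 2x 2 "None"? And the building with index #1 has no bus stop within the 250 m radius, right?
@cincin21 For the None: when I opened the buildings dataframe I saw rows without a name (None), which is why it is also None at the end. For row 1, yes: all NaN means there is no stop within the limit (250 m in this case).
@cincin21 Actually, if you look at the closest_stops you printed at the end of your question, the distance for row 1 is 372, so it confirms the result found here.
Just wanted to make sure I understand everything! :) Thank you, great job!
This does not answer the question.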
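Regarding the comment about wrong distances: one likely cause is coordinate order. scikit-learn documents the haversine metric as expecting [lat, lon] in radians, while shapely's x, y is lon, lat. The sketch below is my own check, not something from the thread's authors: it compares BallTree against a hand-written haversine formula (the helper names `haversine_m` and `balltree_m` are hypothetical) for building 2 and the Senaatintori stop, using the rounded coordinates shown above.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS = 6371000  # meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; inputs in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS * np.arcsin(np.sqrt(a))

def balltree_m(p, q):
    """Distance in meters from p to q via BallTree, in the order given."""
    tree = BallTree(np.radians([q]), metric="haversine")
    d, _ = tree.query(np.radians([p]), k=1)
    return d[0, 0] * EARTH_RADIUS

building = (60.16994, 24.95113)  # (lat, lon)
stop = (60.16901, 24.95046)      # (lat, lon)

reference = haversine_m(*building, *stop)
lat_lon = balltree_m(building, stop)              # documented order
lon_lat = balltree_m(building[::-1], stop[::-1])  # x, y order

print(reference, lat_lon, lon_lat)
```

If this reasoning holds, the [lat, lon] result agrees with the reference formula, while the reversed order lands several meters away and close to the 119.4 m shown in the thread. Any remaining sub-meter to meter-level difference against other tools could come from the spherical Earth approximation, though that is a guess.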