Python: nearest-neighbor join with a distance condition

In this question I am referring to this project:

We have two GeoDataFrames:

Buildings:

             name                   geometry
0            None  POINT (24.85584 60.20727)
1     Uimastadion  POINT (24.93045 60.18882)
2            None  POINT (24.95113 60.16994)
3  Hartwall Arena  POINT (24.92918 60.20570)
and bus stops:

     stop_name   stop_lat   stop_lon  stop_id                   geometry
0  Ritarihuone  60.169460  24.956670  1010102  POINT (24.95667 60.16946)
1   Kirkkokatu  60.171270  24.956570  1010103  POINT (24.95657 60.17127)
2   Kirkkokatu  60.170293  24.956721  1010104  POINT (24.95672 60.17029)
3    Vironkatu  60.172580  24.956554  1010105  POINT (24.95655 60.17258)
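After applying the BallTree approach

from sklearn.neighbors import BallTree

we get, for each building index, the closest bus stop and the distance to it: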

      stop_name   stop_lat   stop_lon  stop_id                   geometry    distance
0    Muusantori  60.207490  24.857450  1304138  POINT (24.85745 60.20749)  180.521584
1    Eläintarha  60.192490  24.930840  1171120  POINT (24.93084 60.19249)  372.665221
2  Senaatintori  60.169010  24.950460  1020450  POINT (24.95046 60.16901)  119.425777
3     Veturitie  60.206610  24.929680  1174112  POINT (24.92968 60.20661)  106.762619
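I am looking for a solution that returns, for each building, every bus stop (possibly more than one) that lies closer than 250 meters.

Thank you for your help.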

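To keep the nearest bus stop only when it is within 250 meters:

filtered_by_distance = closest_stops[closest_stops.distance < 250]
# both frames carry a geometry column, so the join needs a suffix
result = buildings.join(filtered_by_distance, rsuffix='_stop')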
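
To see what the filter-then-join step produces, here is a minimal sketch on toy data. The stop names and distances are copied from the question; the plain DataFrames without geometry columns are a simplification assumed here, so no suffix is needed.

```python
import pandas as pd

# Toy stand-ins for the question's frames (no geometry column, by assumption).
buildings = pd.DataFrame({"name": [None, "Uimastadion", None, "Hartwall Arena"]})
closest_stops = pd.DataFrame({
    "stop_name": ["Muusantori", "Eläintarha", "Senaatintori", "Veturitie"],
    "distance": [180.521584, 372.665221, 119.425777, 106.762619],
})

# Keep only nearest stops closer than 250 m, then join on the shared row index.
filtered_by_distance = closest_stops[closest_stops["distance"] < 250]
result = buildings.join(filtered_by_distance)
print(result)
```

Because `join` is a left join on the index, a building whose nearest stop is farther than 250 m stays in the result with NaN in the stop columns. This still yields at most one stop per building.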

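To get all stops within the radius you need to use BallTree.query_radius. With the haversine metric, however, you first have to convert the radius from meters to radians.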
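
The conversion is just a division by the sphere's radius. A minimal sketch, assuming a spherical Earth with a mean radius of 6,371,000 m; the helper names are made up for illustration:

```python
EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters

def meters_to_radians(meters, radius=EARTH_RADIUS_M):
    """Angle (in radians) subtended by a surface distance on a sphere."""
    return meters / radius

def radians_to_meters(angle, radius=EARTH_RADIUS_M):
    """Surface distance on a sphere for a given central angle in radians."""
    return angle * radius

r = meters_to_radians(250)
print(r)                     # a very small angle, about 4e-5 rad
print(radians_to_meters(r))  # back to 250 m, up to float rounding
```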

Here is a way that reuses what BallTree already does, as mentioned above, but with query_radius. It is not wrapped in a function, but you can easily change that.

from sklearn.neighbors import BallTree
import numpy as np
import pandas as pd
## here I start with buildings and stops as loaded in the link provided

# variable in meter you can change
radius_max = 250 # meters
# another parameter, in case you want to do with Mars radius ^^
earth_radius = 6371000  # meters

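# similar to the method with apply in the tutorial 
# to create left_radians and right_radians, but faster.
# Caution: sklearn's haversine metric expects [lat, lon] in radians.
# The columns here are stacked as [x, y] = [lon, lat], which is the order
# that produced the output printed below; stack y before x to get
# true great-circle distances.
candidates = np.vstack([stops['geometry'].x.to_numpy(), 
                        stops['geometry'].y.to_numpy()]).T*np.pi/180
src_points = np.vstack([buildings['geometry'].x.to_numpy(), 
                        buildings['geometry'].y.to_numpy()]).T*np.pi/180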

# Create tree from the candidate points
tree = BallTree(candidates, leaf_size=15, metric='haversine')
# use query_radius instead
ind_radius, dist_radius = tree.query_radius(src_points, 
                                            r=radius_max/earth_radius, 
                                            return_distance=True)
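
For a quick check of how query_radius behaves, here is a small self-contained sketch on made-up points (not the Helsinki data). It assumes the [lat, lon] order that scikit-learn documents for the haversine metric and a spherical Earth radius of 6,371,000 m.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters

# Made-up points as [lat, lon] in degrees.
stops_deg = np.array([[60.001, 25.000],    # about 111 m north of building 0
                      [60.000, 25.003],    # about 167 m east of building 0
                      [60.010, 25.000]])   # more than 1 km away
buildings_deg = np.array([[60.000, 25.000],
                          [61.000, 25.000]])  # far from every stop

tree = BallTree(np.radians(stops_deg), metric="haversine")
ind, dist = tree.query_radius(np.radians(buildings_deg),
                              r=250 / EARTH_RADIUS_M,
                              return_distance=True,
                              sort_results=True)

# One array of stop positions and one array of distances per building.
dist_m = [d * EARTH_RADIUS_M for d in dist]
for i, (stops_i, d_i) in enumerate(zip(ind, dist_m)):
    print(i, stops_i.tolist(), np.round(d_i, 1).tolist())
```

Each building gets its own variable-length array, and a building with no stop in range gets an empty one, which is why the results need to be flattened before joining.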
Now you can manipulate the result to get what you need.

# build a dataframe whose
# index is the row position of the building in buildings,
# column row_stop is the row position of the stop,
# and dist is the distance in meters
closest_dist = pd.concat([pd.Series(ind_radius).explode().rename('row_stop'), 
                          pd.Series(dist_radius).explode().rename('dist')*earth_radius], 
                         axis=1)
print (closest_dist.head())
#  row_stop     dist
#0     1131  180.522
#1      NaN      NaN
#2       64  174.744
#2       61  119.426
#3      532  106.763

# merge the dataframe created above with the original data stops
# to get names, id, ... note: the index must be reset as in closest_dist
# it is position based
closest_stop = closest_dist.merge(stops.reset_index(drop=True), 
                                  left_on='row_stop', right_index=True, how='left')
print (closest_stop.head())
#  row_stop     dist     stop_name  stop_lat  stop_lon    stop_id  \
#0     1131  180.522    Muusantori  60.20749  24.85745  1304138.0   
#1      NaN      NaN           NaN       NaN       NaN        NaN   
#2       64  174.744  Senaatintori  60.16896  24.94983  1020455.0   
#2       61  119.426  Senaatintori  60.16901  24.95046  1020450.0   
#3      532  106.763     Veturitie  60.20661  24.92968  1174112.0   
#
#                    geometry  
#0  POINT (24.85745 60.20749)  
#1                       None  
#2  POINT (24.94983 60.16896)  
#2  POINT (24.95046 60.16901)  
#3  POINT (24.92968 60.20661) 
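
The flattening step can be tried in isolation. The sketch below uses toy lists shaped like the per-building output of query_radius and, as an alternative to exploding two Series and concatenating them, relies on multi-column DataFrame.explode, which assumes pandas 1.3 or later.

```python
import pandas as pd

# Toy per-building results: stop positions and distances in meters.
ind_radius = [[1131], [], [64, 61]]
dist_m = [[180.5], [], [174.7, 119.4]]

pairs = pd.DataFrame({"row_stop": ind_radius, "dist": dist_m})
# One row per (building, stop) pair; an empty list becomes a single NaN row.
pairs = pairs.explode(["row_stop", "dist"])
print(pairs)
```

The index repeats for buildings with several stops in range, so it can still be used to join back to the buildings by row position.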
Finally, join back to buildings.

# join buildings (with a reset index) to closest_stop,
# because the index of closest_stop is position based
final_df = buildings.reset_index(drop=True).join(closest_stop, rsuffix='_stop')
print (final_df.head(10))
#              name                   geometry row_stop     dist     stop_name  \
# 0            None  POINT (24.85584 60.20727)     1131  180.522    Muusantori   
# 1     Uimastadion  POINT (24.93045 60.18882)      NaN      NaN           NaN   
# 2            None  POINT (24.95113 60.16994)       64  174.744  Senaatintori   
# 2            None  POINT (24.95113 60.16994)       61  119.426  Senaatintori   
# 3  Hartwall Arena  POINT (24.92918 60.20570)      532  106.763     Veturitie   

#    stop_lat  stop_lon    stop_id              geometry_stop  
# 0  60.20749  24.85745  1304138.0  POINT (24.85745 60.20749)  
# 1       NaN       NaN        NaN                       None  
# 2  60.16896  24.94983  1020455.0  POINT (24.94983 60.16896)  
# 2  60.16901  24.95046  1020450.0  POINT (24.95046 60.16901)  
# 3  60.20661  24.92968  1174112.0  POINT (24.92968 60.20661)  

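So, I have a strange problem: the computed distances are wrong. I swapped the latitude and longitude fields in the geopandas dataframe and the distances got better, but they are still off by half a meter to a meter (which is still not correct). Has anyone run into this with BallTree before?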
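
The coordinate order matters because the haversine formula treats the first coordinate as latitude, and scikit-learn documents the same [lat, lon] convention for its haversine metric. A minimal pure-Python sketch, using the first building and the Muusantori stop from the question; `haversine_m` is a hypothetical helper written for this illustration, and the Earth radius is an assumed spherical mean:

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

# Shapely points are (x, y) = (lon, lat); values as printed in the question.
building = (24.85584, 60.20727)
stop = (24.85745, 60.20749)

correct = haversine_m(building[1], building[0], stop[1], stop[0])  # lat first
swapped = haversine_m(building[0], building[1], stop[0], stop[1])  # lon first
print(round(correct, 1), round(swapped, 1))
```

With these rounded coordinates the two orders disagree by almost a factor of two (roughly 92 m versus roughly 180 m), and the swapped value is close to the 180.5 m listed for this pair. Smaller leftover differences of around a meter may come from the spherical Earth approximation and the chosen radius, although that is a guess rather than something verified here.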

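Thanks, I will read up on BallTree.query_radius. The first solution does not do what I want, which is every bus stop (possibly more than one).

This kind of question would be a very good fit for gis.se.

@s.k Yes and no: GIS tools will not work on huge datasets, which is why I am looking for a strictly Python-ish solution.

GIS tools were among the first to embrace big data, so I am not sure what your comment means.

I am voting to close this question because it belongs on GIS.se.

Hi @PaulH, I referred to Python tools such as scikit-learn and numpy that can help solve this problem (and other similar ones, not only spatial). If nobody helps, I can move it to gis.se. Closing hard questions is not the answer to everything...

Good stuff, give me a day to test it, and then I will tick it as correct once it works! :)

In the final DF, why do the buildings with index 0 and 2 (twice) have "None"? And the building with index #1 has no bus stop within the 250 m radius, right?

@cincin21 For the None: when I opened the buildings dataframe I saw that some rows have no name (None), which is why they also show None at the end. For row 1, yes, all NaN means there is no stop within the limit (250 m in this case).

@cincin21 Actually, if you look at the closest_stops you printed at the end of your question, the distance for that row 1 is 372, so it confirms the result found here.

Just wanted to make sure I understand everything! :) Thank you, nice job!

This does not answer the question. Once you have sufficient reputation you will be able to; instead -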