How to improve loop performance on geospatial data in Python, pandas, and GeoPandas


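I have a DataFrame (df) and a GeoJSON file (gdf). The DataFrame has three columns: region, region_check, and geometry. df['geometry'] holds point coordinates such as POINT (37.98730 11.09990). The DataFrame has 40,000 rows.

I want to iterate over the DataFrame and check whether each coordinate is assigned to the correct region. The check should compare the coordinates against the GeoJSON file and write the correct region into the new, empty column region_check.

I have a loop, but it is too slow. I hope someone can suggest how to speed it up.

Many thanks.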

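import pandas as pd
import numpy as np
import geopandas as gpd

df = pd.read_csv('gis_data_2020_check.csv')
gdf = gpd.read_file('eth_admin1.json')
df['region_check'] = ''

# For each point, test the regions one by one until one contains it.
i = 0
count = 0
while i < len(df):
    if count < len(gdf):
        test = df['geometry'].iloc[i].within(gdf['geometry'].iloc[count])
        if test:
            # Positional assignment avoids chained indexing on a copy.
            df.iloc[i, df.columns.get_loc('region_check')] = gdf['ADM1_EN'].iloc[count]
            i += 1
            count = 0
        else:
            count += 1
    else:
        # No region contains this point; move on instead of looping forever.
        i += 1
        count = 0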


With GeoPandas you can use an R-tree spatial index. You need to have libspatialindex installed for it to work:

conda install libspatialindex

Now you can query the index for geometries whose bounding boxes intersect:

spatial_index = gdf.sindex
possible_matches_index = list(spatial_index.intersection(polygon.bounds))
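Applied to the question, the index query serves as a cheap pre-filter before the exact within test. The following is only a minimal sketch under assumptions: the two square regions stand in for eth_admin1.json, and find_region is a hypothetical helper name, not part of GeoPandas.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-in for eth_admin1.json: two square "regions".
gdf = gpd.GeoDataFrame(
    {"ADM1_EN": ["West", "East"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:4326",
)


def find_region(point, polygons, name_column="ADM1_EN"):
    """Return the name of the first polygon containing point, or None."""
    # Cheap step: positions of polygons whose bounding box meets the point.
    candidates = list(polygons.sindex.intersection(point.bounds))
    # Exact step: run the precise test only on those few candidates.
    for pos in candidates:
        if point.within(polygons.geometry.iloc[pos]):
            return polygons[name_column].iloc[pos]
    return None


print(find_region(Point(15, 5), gdf))   # East
print(find_region(Point(30, 30), gdf))  # None
```

This still loops over the points in Python, but each point is tested against a handful of candidate polygons instead of all of them.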


Performing a "spatial join" on two GeoDataFrames will speed this up considerably, because it eliminates the loop.

Step 1: Create a GeoDataFrame for your points, like this:

points = gpd.GeoDataFrame(df, geometry='geometry', crs=gdf.crs)

Step 2: Use "sjoin" to join the two GeoDataFrames, i.e. something like:

# gdf is the polygon GeoDataFrame from the question.
# GeoPandas releases before 0.10 spell the predicate argument op="within".
pointInPolys = gpd.sjoin(points, gdf, predicate="within", how='left')
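Put together, the two steps might look like the sketch below. It rests on assumptions: the synthetic squares and the three sample rows stand in for the real files, the CSV is assumed to store geometry as WKT text (so it is parsed first), region_ok is a hypothetical extra column, and the predicate keyword requires GeoPandas 0.10 or later.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for eth_admin1.json: two square "regions".
gdf = gpd.GeoDataFrame(
    {"ADM1_EN": ["West", "East"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:4326",
)

# Hypothetical stand-in for the CSV: geometry stored as WKT text.
df = pd.DataFrame(
    {
        "region": ["West", "West", "East"],
        "geometry": ["POINT (2 3)", "POINT (15 5)", "POINT (30 30)"],
    }
)

# Step 1: parse the WKT strings into points that share the polygons' CRS.
points = gpd.GeoDataFrame(
    df.drop(columns="geometry"),
    geometry=gpd.GeoSeries.from_wkt(df["geometry"]),
    crs=gdf.crs,
)

# Step 2: one join replaces the nested loop; how="left" keeps unmatched points.
joined = gpd.sjoin(points, gdf, how="left", predicate="within")

# If regions overlap, a point can match twice; keep its first match only.
joined = joined[~joined.index.duplicated(keep="first")]

# Region found by the join (NaN when no polygon contains the point).
df["region_check"] = joined["ADM1_EN"]
df["region_ok"] = df["region"] == df["region_check"]
print(df[["region", "region_check", "region_ok"]])
```

Rows where region_ok is False are the ones whose recorded region disagrees with the polygon that actually contains the point, or that fall outside every polygon.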

By the way, I wrote an article about this kind of point-in-polygon test, with more example code and performance considerations. I hope it is useful.

For better performance, make sure GeoPandas is installed with the optional dependencies rtree and PyGEOS.