Python: Don't understand, IndexError: too many indices for array

python filter distance

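My task is to remove longitude/latitude coordinates when the distance between points falls within a given threshold (5 km, 10 km or 30 km). This is for modelling, to avoid clustering of points. I use the haversine formula to measure the distance.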
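For reference, the haversine distance used throughout the question can be isolated into a small helper. This is a minimal sketch written for Python 3; the name `haversine_km` and the constant `EARTH_RADIUS_KM` are illustrative and do not appear in the original code, although the radius value 6371 does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, the same value the question's code uses


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    # Convert degrees to radians before applying the formula.
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

One degree of longitude along the equator comes out at roughly 111 km with this radius, which is a quick sanity check when choosing a 5, 10 or 30 km threshold.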

Here is my initial approach:

load the geometry records from the points shapefile,
then convert them to an array,
compare each pair of coordinates and measure the distance.
After that, remove the longitude and latitude pairs that are
close to each other,

but I got stuck at this step.

I plan to update the list of coordinate pairs and then iterate again over the new set of coordinate pairs.

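Running the script below produces this error:

IndexError: too many indices for array

It seems the index in the iteration is not updated. It still picks up the index from the first pass.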

import math, easygui, shapefile, itertools, os
import pandas as pd
import numpy as np

filepath = easygui.fileopenbox()

input_dist = int(raw_input("Distance Filter Value?: "))
input_crop = raw_input("what crop?: ")

directory = os.path.split(filepath)[0]

def dist_haversine(shp,input_dist,input_crop):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """

    r = shapefile.Reader(shp)
    idx = np.arange(len(r.records()))
    coordinates = []
    for i in idx:
        geom = r.shape(i)
        coordinates.append(geom.points[0])    

    acoords = np.array(coordinates)

    for r,n in itertools.izip(acoords[:,0],acoords[:,1]):

        coordinates_ = []

        for i,j in itertools.izip(acoords[:,0],acoords[:,1]):

            lon1=r
            lat1=n
            lon2=i
            lat2=j

            lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])

            # haversine formula
            dlon = lon2 - lon1 
            dlat = lat2 - lat1 
            a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
            c = 2 * math.asin(math.sqrt(a)) 
            km = c*6371 #/1000.0

            if km > input_dist:
                coordinates_.append([i,j])

        coordinates[:] = coordinates_
        acoords = np.array(coordinates)

    df_coords_ = pd.DataFrame(coordinates).drop_duplicates().values
    df_coords = pd.DataFrame(df_coords_, columns=['Lon','Lat'])

    df_coords.insert(0, 'Crop', input_crop)  

    return df_coords.to_csv(os.path.split(directory)[0] + "\\" + "%s_distFilter_%skm.csv" % (input_crop, input_dist), sep=",", index=None)

Traceback

  File "", line 1, in
    dist_haversine(filepath, input_dist, input_crop)
  File "", line 20, in dist_haversine
    for i,j in itertools.izip(acoords[:,0],acoords[:,1]):
IndexError: too many indices for array
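One plausible cause, offered as an assumption rather than a confirmed diagnosis: every point is 0 km from itself, so each pass drops at least the reference point, and with clustered data the filtered list can end up empty. `np.array([])` is one-dimensional, so `acoords[:,0]` on the next pass raises the error. The sketch below (Python 3, with the hypothetical helper names `first_column` and `as_points`) reproduces that behaviour and shows one way to keep the array two-dimensional.

```python
import numpy as np


def first_column(points):
    """Return the first column of points, or the error text if slicing fails."""
    arr = np.array(points)
    try:
        return arr[:, 0]
    except IndexError as exc:
        # An empty list becomes a 1-D array of shape (0,), so [:, 0] is invalid.
        return "IndexError: %s" % exc


def as_points(points):
    """Force an (n, 2) shape so column slicing also works when n == 0."""
    return np.asarray(points, dtype=float).reshape(-1, 2)


print(first_column([[120.9, 14.6], [121.0, 14.7]]))  # slicing a 2-D array works
print(first_column([]))                               # slicing the empty 1-D array fails
print(as_points([]).shape)                            # the reshaped empty array stays 2-D
```

Checking for an empty result before the next pass, or reshaping as in `as_points`, would avoid the exception, although it does not change what the filter itself removes.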

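This is the initial solution I used to filter the points. It works, and it is reasonably fast for datasets of 1,000-3,000 points. However, filtering 50,000 points takes 2.5-3 hours to finish.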

def dist_haversine(filepath,input_dist,input_crop):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """

    r = shapefile.Reader(filepath)
    idx = np.arange(len(r.records()))
    coordinates = []
    for i in idx:
        geom = r.shape(i)
        coordinates.append(geom.points[0])       

    acoords = np.array(coordinates)

    index = []        
    for r,n,l in itertools.izip(acoords[:,0],acoords[:,1],idx):
        if l in index:
            continue
        else:
            for i,j,k in itertools.izip(acoords[:,0],acoords[:,1], idx):
                if k in index:
                    continue

                else:

                    lon1=r
                    lat1=n
                    lon2=i
                    lat2=j

                    coord_check = ((lon1 == lon2) & (lat1 == lat2))*1

                    if coord_check == 1:
                        continue

                    else:
                        lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])

                        # haversine formula
                        dlon = lon2 - lon1 
                        dlat = lat2 - lat1 
                        a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
                        c = 2 * math.asin(math.sqrt(a)) 
                        km = c*6371 #/1000.0

                    if km < input_dist:
                        if k in index:
                            continue
                        else:
                            index.append(k)

    filterList = [i for j, i in enumerate(coordinates) if j not in index]

    df_coords = pd.DataFrame(filterList, columns=['Lon','Lat'])

    df_coords.insert(0, 'Crop', input_crop)  

    return df_coords.to_csv(directory + "\\" + "%s_distFilter_%skm.csv" % (input_crop, input_dist), sep=",", index=None)
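Much of the running time in the function above likely comes from the pure-Python inner loop and from the `k in index` test on a list, which is itself a linear scan. The sketch below is one possible speed-up, not the author's method: it keeps the same greedy rule (keep a point, drop other points closer than the threshold, leave exact duplicates alone) but computes the distances from one point to all others in a single NumPy call and tracks removals in a boolean mask. The names `haversine_to_all` and `thin_points` are hypothetical, and the code targets Python 3.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # same radius as in the original function


def haversine_to_all(lon, lat, lons, lats):
    """Distances in km from one point to arrays of points (decimal degrees)."""
    lon, lat = np.radians(lon), np.radians(lat)
    lons, lats = np.radians(lons), np.radians(lats)
    a = np.sin((lats - lat) / 2) ** 2 + np.cos(lat) * np.cos(lats) * np.sin((lons - lon) / 2) ** 2
    # Clip guards against rounding pushing a slightly outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def thin_points(coords, min_dist_km):
    """Greedy thinning: keep a point, drop other points closer than min_dist_km."""
    coords = np.asarray(coords, dtype=float).reshape(-1, 2)  # columns: lon, lat
    keep = np.ones(len(coords), dtype=bool)
    for i in range(len(coords)):
        if not keep[i]:
            continue  # already removed by an earlier kept point
        d = haversine_to_all(coords[i, 0], coords[i, 1], coords[:, 0], coords[:, 1])
        # Identical coordinates are skipped, as in the original coord_check.
        same = (coords[:, 0] == coords[i, 0]) & (coords[:, 1] == coords[i, 1])
        keep &= ~((d < min_dist_km) & ~same)
    return coords[keep]
```

A call such as `thin_points(acoords, input_dist)` would return the retained rows, which can then be passed to `pd.DataFrame(..., columns=['Lon','Lat'])` as before. The worst case is still quadratic in the number of points, so for very large inputs a spatial index may be worth considering.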


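Hi, please post the full exception. It might also help to "simplify" the problem and add some sample data. The problem seems to be about a NumPy array rather than about the coordinates.

Hi, here is the full error message: Traceback (most recent call last): File "", line 1, in dist_haversine(filepath,input_dist,input_crop) File "", line 20, in dist_haversine for i,j in itertools.izip(acoords[:,0],acoords[:,1]): IndexError: too many indices for array. I uploaded the file to Dropbox: Is there any workaround?

If you are willing to post a simplified example, perhaps someone will be willing to help you.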
