Python: Speeding up reverse geocoding


I am currently performing reverse geocoding as follows:

import json
from shapely.geometry import shape, Point
import time

with open('districts.json') as f: districts = json.load(f)
# file also kept at https://raw.githubusercontent.com/Thevesh/Display/master/districts.json

def reverse_geocode(lon,lat):
    point = Point(lon, lat) # lon/lat
    for feature in districts['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point): return [(feature['properties'])['ADM1_EN'], (feature['properties'])['ADM2_EN']]
    return ['','']

start_time = time.time()
for i in range(1000): test = reverse_geocode(103, 3)
print('----- Code ran in ' + "{:.3f}".format(time.time() - start_time) + ' seconds -----')

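This takes about 13 seconds to reverse geocode 1,000 points, which is fine.

However, I need to reverse geocode 10 million coordinate pairs for one task. Assuming linear complexity, that would take about 130,000 seconds (1.5 days), which is not acceptable.

The obvious inefficiency in this algorithm is that it iterates over the entire set of polygons every time it classifies a point, which is a huge waste of time.

How can I improve this code? To process 10 million pairs in a time that is acceptable for the task, I need 1,000 pairs to run in under 1 second.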
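Beyond the full scan, the code above also calls shape() on every feature for every point, so the same polygons are rebuilt on each call. A minimal sketch of one possible adjustment follows: build each polygon once, keep its bounding box, and run the exact test only when the point falls inside that box. The two square "districts", the names build_lookup and LOOKUP, and the labels are hypothetical stand-ins for districts.json; the use of shapely.prepared.prep is an assumption about what helps here, not something measured on the real data.

```python
from shapely.geometry import Point, shape
from shapely.prepared import prep

# Hypothetical stand-in for districts.json: two square "districts" in GeoJSON form.
districts = {
    "features": [
        {
            "properties": {"ADM1_EN": "State A", "ADM2_EN": "District 1"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
            },
        },
        {
            "properties": {"ADM1_EN": "State B", "ADM2_EN": "District 2"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[100, 0], [110, 0], [110, 10], [100, 10], [100, 0]]],
            },
        },
    ]
}


def build_lookup(features):
    """Convert every feature to a geometry once, instead of once per query."""
    lookup = []
    for feature in features:
        polygon = shape(feature["geometry"])
        props = feature["properties"]
        # Keep the bounding box, a prepared geometry, and the two labels.
        lookup.append((polygon.bounds, prep(polygon), [props["ADM1_EN"], props["ADM2_EN"]]))
    return lookup


LOOKUP = build_lookup(districts["features"])


def reverse_geocode(lon, lat):
    point = Point(lon, lat)  # lon/lat
    for (minx, miny, maxx, maxy), prepared, labels in LOOKUP:
        # Cheap rejection: skip polygons whose bounding box cannot contain the point.
        if not (minx <= lon <= maxx and miny <= lat <= maxy):
            continue
        if prepared.contains(point):
            return list(labels)
    return ["", ""]


print(reverse_geocode(103, 3))   # point inside the second square
print(reverse_geocode(50, 50))   # point outside both squares
```

This still loops over every feature, but each iteration is a handful of float comparisons rather than a polygon construction plus an exact containment test. How much it helps on the real file would need to be timed.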

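I implemented this algorithm using parallelism.

If possible, let me know whether it works for you. Keep in mind that this is an amateur implementation and needs tuning.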

import concurrent.futures
import json
import time

from shapely.geometry import shape, Point

with open('districts.json') as f: districts = json.load(f)

def reverse_geocode(lon: float, lat: float) -> list:
    point = Point(lon, lat) # lon/lat
    for feature in districts['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            return [(feature['properties'])['ADM1_EN'], (feature['properties'])['ADM2_EN']]
    return ['','']

if __name__ == '__main__':
    time_start = time.time()

    with concurrent.futures.ProcessPoolExecutor() as process:
        # submit one task per point and keep the futures so the results can be read back
        futures = [process.submit(reverse_geocode, 103, 3) for _ in range(1000)]
        results = [future.result() for future in futures]

    time_end = time.time()
    print(f'\nfinished in {round(time_end - time_start, 2)} seconds')

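The first obvious solution is to run it with multiprocessing. Have you tried that?

This cut the time in half, which is great, but I think the fundamental problem is still that we are iterating over the entire list. I will try an approach that sorts both the list being searched and the features being searched for.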
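One way to avoid walking the entire list, sketched here as an illustration and not as the approach the commenter had in mind, is to bucket the polygons' bounding boxes into a coarse grid so that each point only looks at the polygons registered in its own cell. The cell size CELL, the helper names cell_of, build_grid and candidates, and the sample boxes are all hypothetical; in real use the boxes would come from each polygon's bounds attribute.

```python
import math
from collections import defaultdict

CELL = 1.0  # grid cell size in degrees (an assumed value; tune to the data)


def cell_of(x, y):
    """Return the integer grid cell that contains the coordinate."""
    return (math.floor(x / CELL), math.floor(y / CELL))


def build_grid(boxes):
    """Map each grid cell to the indices of the boxes that overlap it.

    `boxes` is a list of (minx, miny, maxx, maxy) tuples, e.g. polygon.bounds.
    """
    grid = defaultdict(list)
    for index, (minx, miny, maxx, maxy) in enumerate(boxes):
        cx0, cy0 = cell_of(minx, miny)
        cx1, cy1 = cell_of(maxx, maxy)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                grid[(cx, cy)].append(index)
    return grid


def candidates(grid, lon, lat):
    """Return indices of the boxes that might contain the point."""
    return grid.get(cell_of(lon, lat), [])


# Hypothetical bounding boxes standing in for the districts' polygon.bounds.
boxes = [(100.0, 0.0, 102.5, 2.5), (102.5, 2.0, 105.0, 4.5), (110.0, 5.0, 111.0, 6.0)]
grid = build_grid(boxes)

print(candidates(grid, 103, 3))   # only the second box overlaps this cell
print(candidates(grid, 50, 50))   # no box overlaps this cell
```

The exact polygon.contains(point) test would then run only on the returned candidates, so the cost per point depends on how many polygons share a cell, not on the total number of districts. Shapely also ships an STR-tree index (shapely.strtree.STRtree) built for this purpose; what its query method returns differs between major versions, so check the documentation for the installed release before relying on it.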