How to optimize Shapely and Sklearn code in Python?
I'm working with a dataset of 4.2 million points. My code already takes a while to run, but the code below takes hours. It was provided in another public question: for each point it finds the closest linestring, finds the nearest point on that linestring, and computes the distance. The code does the job well, but it takes far too long for its purpose. How can I optimize it, or do the same thing in less time?
import geopandas as gpd
import numpy as np
from shapely.geometry import Point, LineString
from shapely.ops import nearest_points
from sklearn.metrics import DistanceMetric  # scikit-learn < 1.0: from sklearn.neighbors
EARTH_RADIUS_IN_MILES = 3440.1  # Earth's radius in NAUTICAL miles
panama = gpd.read_file("/Users/Danilo/Documents/Python/panama_coastline/panama_coastline.shp")

def closest_line(point, linestrings):
    # Index of the linestring closest to `point` (planar distance, in degrees)
    return np.argmin([point.distance(linestring) for linestring in linestrings])

# Defined once, outside the loop: the metric is the same for every point
dist = DistanceMetric.get_metric('haversine')

# `data` (with 'longitude'/'latitude' columns) and `b` (its row count) come from earlier code
dtc = []
for c in range(b):
    #p = Point(-77.65325423107359,9.222038196656131)
    p = Point(data['longitude'][c], data['latitude'][c])
    closest_linestring = panama.geometry.iloc[closest_line(p, panama.geometry)]
    closest_point = nearest_points(p, closest_linestring)  # (p, nearest point on the line)
    # sklearn's haversine metric expects [lat, long] in radians, so use (y, x)
    points_as_floats = [np.array([pt.y, pt.x]) for pt in closest_point]
    haversine_distances = dist.pairwise(np.radians(points_as_floats), np.radians(points_as_floats))
    haversine_distances *= EARTH_RADIUS_IN_MILES
    dtc1 = haversine_distances[0][1]
    dtc.append(dtc1)
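The per-point loop above can be vectorized. A minimal sketch, assuming Shapely >= 2.0 (which provides `STRtree.query_nearest`, `shapely.shortest_line` and `shapely.get_coordinates`); the helper names `distances_to_coast_nm` and `haversine_nm` are hypothetical. It keeps the question's logic (closest linestring by planar distance in degrees, nearest point on it, haversine distance) but replaces the Python loop with one R-tree query and array operations:

```python
import numpy as np
import shapely
from shapely import LineString, STRtree

EARTH_RADIUS_NM = 3440.1  # Earth's radius in nautical miles, as in the question


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles; inputs in degrees (scalars or arrays)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * np.arcsin(np.sqrt(a))


def distances_to_coast_nm(lon, lat, coast_lines):
    """Distance (nautical miles) from each (lon, lat) point to the nearest coastline.

    Hypothetical helper; requires Shapely >= 2.0.
    """
    lines = np.asarray(coast_lines, dtype=object)
    points = shapely.points(lon, lat)  # vectorized Point creation
    tree = STRtree(lines)              # R-tree over the coastline linestrings
    # Closest linestring per point (planar distance in degrees, like p.distance()).
    # Row 0: indices into `points`, row 1: indices into `lines`.
    pt_idx, line_idx = tree.query_nearest(points, all_matches=False)
    # Vectorized nearest_points(): a 2-vertex segment from the point to the line
    segments = shapely.shortest_line(points[pt_idx], lines[line_idx])
    coords = shapely.get_coordinates(segments).reshape(-1, 2, 2)
    start, end = coords[:, 0], coords[:, 1]  # (lon, lat) on the point / on the line
    out = np.full(len(points), np.nan)       # NaN for any point without a match
    out[pt_idx] = haversine_nm(start[:, 1], start[:, 0], end[:, 1], end[:, 0])
    return out
```

With the question's data this would be called as `distances_to_coast_nm(data['longitude'].values, data['latitude'].values, panama.geometry)`. The spatial index means each point is compared against only a few candidate linestrings instead of all of them.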
Edit: simplified to a single computation using a BallTree.

Imports:
import pandas as pd
import geopandas as gpd
import numpy as np
from shapely.geometry import Point, LineString
from shapely.ops import nearest_points
Read the Panama coastline:
panama = gpd.read_file("panama_coastline/panama_coastline.shp")
Get all the coastline points, in (long, lat) format:
def get_points_as_numpy(geom):
    # Stack the vertices of every linestring into one (n, 2) array of (long, lat)
    work_list = []
    for g in geom:
        work_list.append(np.array(g.coords))
    return np.concatenate(work_list)
all_coastline_points = get_points_as_numpy(panama.geometry)
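The loop in `get_points_as_numpy` fails if the shapefile contains any `MultiLineString` features, because multi-part geometries have no `.coords`. A sketch of an alternative, assuming Shapely >= 2.0; `get_points_as_numpy_v2` is a hypothetical name:

```python
import numpy as np
import shapely
from shapely import LineString, MultiLineString


def get_points_as_numpy_v2(geoms):
    """Hypothetical drop-in for get_points_as_numpy (Shapely >= 2.0).

    Returns every vertex of every geometry as one (n, 2) array of (long, lat).
    Unlike g.coords, this also works for MultiLineString features.
    """
    return shapely.get_coordinates(np.asarray(geoms, dtype=object))
```

Usage: `all_coastline_points = get_points_as_numpy_v2(panama.geometry)`. The flattening happens in compiled code, so there is no Python loop over features.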
Create the BallTree:
from sklearn.neighbors import BallTree
# sklearn's haversine metric expects [lat, long] in radians, hence the flip
panama_radians = np.radians(np.flip(all_coastline_points, axis=1))
tree = BallTree(panama_radians, leaf_size=12, metric='haversine')
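For reference, the distance that metric='haversine' computes (as given in the scikit-learn documentation) is the central angle on a unit sphere, with each point x = (x1, x2) given as (lat, long) in radians. This is why the coordinates are flipped above, and why the tree's distances have to be multiplied by the Earth's radius R (3440.1 in the question's nautical miles):

```latex
D(x, y) = 2\arcsin\left(\sqrt{\sin^2\!\left(\frac{x_1 - y_1}{2}\right)
          + \cos x_1 \cos y_1 \,\sin^2\!\left(\frac{x_2 - y_2}{2}\right)}\right),
\qquad d_{\mathrm{nm}} = R\,D(x, y), \quad R \approx 3440.1
```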
Create 1M random points:
mean = [8.5,-80]
cov = [[1,0],[0,5]]  # diagonal covariance: lat and long are sampled independently
random_gps = np.random.multivariate_normal(mean,cov,(10**6))
random_points = pd.DataFrame( {'lat' : random_gps[:,0], 'long' : random_gps[:,1]})
random_points.head()
Compute the nearest coastline point:
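The query code for this step is cut off in the page. A minimal sketch of how the tree built above can be queried for all points at once; `nearest_coast_vertex` and `EARTH_RADIUS_NM` are hypothetical names, and this is not necessarily the answer's original code:

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_NM = 3440.1  # nautical miles, as in the question


def nearest_coast_vertex(tree, coast_lonlat, lat, lon):
    """Hypothetical helper: nearest coastline *vertex* for each query point.

    `tree` is a BallTree(metric='haversine') built on [lat, long] radians and
    `coast_lonlat` is the (n, 2) array of (long, lat) it was built from.
    Returns distances in nautical miles and the matching (long, lat) vertices.
    """
    query = np.radians(np.column_stack([lat, lon]))  # haversine wants [lat, long]
    dist_rad, idx = tree.query(query, k=1)           # one neighbour, all points at once
    return dist_rad[:, 0] * EARTH_RADIUS_NM, coast_lonlat[idx[:, 0]]
```

With the arrays above: `dist_nm, nearest = nearest_coast_vertex(tree, all_coastline_points, random_points['lat'].values, random_points['long'].values)`. This measures the distance to the nearest coastline vertex, not to the nearest point on a segment, so it is only as accurate as the coastline's vertex spacing.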
Hello and welcome :) Have you tried optimizing it yourself and run into problems doing so? Have you measured the performance of this approach? Do you have specific time/performance requirements for it? Where did you get this code? Don't put def nearest_line() inside the loop. dist=DistanceMetric.get_metric('haversine') doesn't change, so move it outside the loop. Could you post a sample of the data? When working with a large number of points, do the computation with a tree.