Artificial intelligence: K-Means, how to determine which latitude/longitude has the most locations nearby

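I know the center latitude and longitude of every neighborhood in a city, and I have a data set of restaurants with their latitudes and longitudes. I need to use something like K-Means to determine which neighborhood is the densest. Say the first set has about ten latitude/longitude pairs and the second has about 200: how do I determine which of the ten is the densest, that is, has the most restaurants nearby?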

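If you know the boundary of each neighborhood (or its radius) from some map data for the city, you only need to check which neighborhood each restaurant falls in.

Otherwise, you can compute the distance between each restaurant and the neighborhood center points, and assign each of the 200 restaurants to the nearest neighborhood.

You can then approximate each neighborhood's density as the number of restaurants assigned to it divided by the total number of restaurants.

I don't think you need any machine learning algorithm for this.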
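
The nearest-center assignment described above can be sketched in plain Python. This is only an illustration under stated assumptions: the coordinates are made up, the helper names `haversine_km` and `count_by_nearest_center` are hypothetical, and great-circle (haversine) distance is used as one reasonable choice of distance.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in kilometers


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_by_nearest_center(centers, restaurants):
    """Assign each restaurant to its nearest neighborhood center and count per neighborhood."""
    counts = Counter({name: 0 for name in centers})
    for lat, lon in restaurants:
        nearest = min(centers, key=lambda name: haversine_km(lat, lon, *centers[name]))
        counts[nearest] += 1
    return counts


# Made-up example data: two neighborhood centers and three restaurants
centers = {"A": (40.00, -75.00), "B": (40.10, -75.00)}
restaurants = [(40.01, -75.00), (40.02, -75.01), (40.09, -75.00)]

counts = count_by_nearest_center(centers, restaurants)
densest, n = counts.most_common(1)[0]   # neighborhood with the most restaurants
share = n / len(restaurants)            # its fraction of all restaurants
```

With real data, `centers` would hold the ten neighborhood centers and `restaurants` the roughly 200 restaurant coordinates. If the neighborhoods differ a lot in area, dividing each count by the neighborhood's area would give a truer density than the share of the total.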

Of course, you can choose an approach based on your problem. How about this:

# import necessary modules
import pandas as pd, numpy as np, matplotlib.pyplot as plt, time
from sklearn.cluster import DBSCAN
from sklearn import metrics
from geopy.distance import great_circle
from shapely.geometry import MultiPoint


# define the number of kilometers in one radian
kms_per_radian = 6371.0088


# load the data set
df = pd.read_csv('C:\\your_path_here\\summer-travel-gps-full.csv', encoding = "ISO-8859-1")
df.head()


# how many rows are in this data set?
len(df)


# scatterplot it to get a sense of what it looks like
df = df.sort_values(by=['lat', 'lon'])
ax = df.plot(kind='scatter', x='lon', y='lat', alpha=0.5, linewidth=0)

 

# represent points consistently as (lat, lon)
df_coords = df[['lat', 'lon']]

# define epsilon as 10 kilometers, converted to radians for use by haversine
epsilon = 10 / kms_per_radian


start_time = time.time()
db = DBSCAN(eps=epsilon, min_samples=10, algorithm='ball_tree', metric='haversine').fit(np.radians(df_coords))
cluster_labels = db.labels_
unique_labels = set(cluster_labels)

# get the number of clusters, not counting the noise label (-1)
num_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)


# get colors and plot all the points, color-coded by cluster (or gray if not in any cluster, aka noise)
fig, ax = plt.subplots()
colors = plt.cm.rainbow(np.linspace(0, 1, len(unique_labels)))

# for each cluster label and color, plot the cluster's points
for cluster_label, color in zip(unique_labels, colors):

    size = 150
    if cluster_label == -1: # make the noise (which is labeled -1) appear as smaller gray points
        color = 'gray'
        size = 30

    # select only the points that match the current cluster label
    mask = cluster_labels == cluster_label
    # longitude goes on the x-axis and latitude on the y-axis
    x_coords = df_coords.loc[mask, 'lon']
    y_coords = df_coords.loc[mask, 'lat']
    ax.scatter(x=x_coords, y=y_coords, color=color, edgecolor='k', s=size, alpha=0.5)

ax.set_title('Number of clusters: {}'.format(num_clusters))
plt.show()

Final result:

0                  lat        lon
1587  37.921659  22...
1                  lat        lon
1658  37.933609  23...
2                  lat        lon
1607  37.966766  23...
3                  lat        lon
1586  38.149019  22...
4                  lat        lon
1584  38.374766  21...
                       
133              lat        lon
662  50.37369  18.889205
134               lat        lon
561  50.448704  19.0...
135               lat        lon
661  50.462271  19.0...
136               lat        lon
559  50.489304  19.0...
137             lat       lon
1  51.474005 -0.450999
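
To tie the clustering back to the original question (which area is densest), one option is to count how many points received each label in `cluster_labels` and rank the clusters by size, skipping the noise label -1. The following is a minimal sketch; `rank_clusters` is a hypothetical helper and the label list is made up to stand in for real `db.labels_` output.

```python
from collections import Counter


def rank_clusters(labels):
    """Return (label, size) pairs sorted by size, largest first, ignoring noise (-1)."""
    counts = Counter(int(label) for label in labels if label != -1)
    return counts.most_common()


# Made-up labels standing in for db.labels_ from the DBSCAN fit
example_labels = [0, 0, 1, -1, 0, 2, 1, -1]

ranking = rank_clusters(example_labels)
largest_label, largest_size = ranking[0]
```

Note that cluster size is only a proxy for density: a large cluster spread over a wide area can be less dense than a small, tight one.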
