Python: an algorithm to group points on a map into clusters that must all have the same sum of a third feature


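I'm looking for an algorithm that groups points on a map (latitude/longitude) into clusters; there are 43,429 points.

However, all clusters must have the same sum of a third feature (expenses).

Algorithms like k-means do not produce clusters with equal "weight".

Any idea how to do this? I use Python or R.


Thanks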

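This is not a perfect solution, just brute force. What you are asking is not a simple computational problem (read up on the multiway number partitioning problem).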
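Multiway number partitioning asks you to split a set of numbers into k subsets whose sums are as equal as possible. A well-known greedy heuristic for it sorts the numbers from largest to smallest and always gives the next one to the subset with the smallest current sum (often called LPT, "longest processing time first"). Below is a minimal Python sketch of that heuristic. The name lpt_partition is hypothetical, and, like the answer's R code, it balances expenses only and ignores the coordinates:

```python
import heapq
from typing import List, Sequence, Tuple


def lpt_partition(weights: Sequence[float], k: int) -> Tuple[List[List[int]], List[float]]:
    """Greedy largest-first heuristic for multiway number partitioning.

    Returns (groups, sums): groups[c] lists the indices assigned to cluster c
    and sums[c] is that cluster's total weight.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Min-heap of (current_sum, cluster_id): the lightest cluster is always on top.
    heap = [(0.0, c) for c in range(k)]
    groups: List[List[int]] = [[] for _ in range(k)]
    sums = [0.0] * k
    # Visit items from heaviest to lightest.
    for idx in sorted(range(len(weights)), key=lambda i: weights[i], reverse=True):
        total, c = heapq.heappop(heap)  # lightest cluster so far
        groups[c].append(idx)
        total += weights[idx]
        sums[c] = total
        heapq.heappush(heap, (total, c))
    return groups, sums
```

It is only a heuristic: lpt_partition([8, 7, 6, 5, 4], 2) returns sums of 17 and 13, although 8 + 7 = 15 and 6 + 5 + 4 = 15 is a perfect split.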

Here is my brute-force R solution, assuming your data is a data frame df with an expenses column:

k <- 3 # define how many clusters you want


# A really simple greedy clustering algorithm: seed each of the k clusters with one of
# the first k rows, then add every remaining row to the cluster whose running expense
# total is currently the lowest. Only the expenses column is used; lat/lon are ignored.
clustering <- function(df, k) {
  n <- nrow(df)
  assignment <- integer(n)  # cluster number of each row
  totals <- numeric(k)      # running expense total of each cluster
  for (r in 1:k) {
    assignment[r] <- r
    totals[r] <- df$expenses[r]
  }
  for (i in seq_len(n)[-seq_len(k)]) {  # rows k+1 .. n (empty if n == k)
    minimo <- which.min(totals)
    assignment[i] <- minimo
    totals[minimo] <- totals[minimo] + df$expenses[i]
  }
  return(split(df, assignment))  # list of k data frames, one per cluster
}

# Calculate the difference between the highest and lowest cluster totals
distance <- function(df, k) {
  totals <- sapply(clustering(df, k), function(cl) sum(cl$expenses))
  return(max(totals) - min(totals))
}


# Repeat the process with different (shuffled) row orders and keep the clustering
# with the smallest spread between cluster totals
best.distance <- distance(df, k)
Clusters <- clustering(df, k)
for (i in 2:50) {
  df <- df[sample(nrow(df)), ]  # random row order, i.e. a different starting point
  g <- distance(df, k)
  if (g < best.distance) {
    best.distance <- g
    Clusters <- clustering(df, k)
  }
}
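Since the asker also works in Python, here is a minimal Python sketch of the same idea. It seeds with the first k points, greedily adds the rest to the lightest cluster, and takes the best of several shuffled restarts by max-minus-min spread. The function names and the seed parameter are hypothetical. Like the R version it only looks at expenses, so the resulting clusters are not geographically compact:

```python
import random
from typing import List, Optional, Sequence, Tuple


def greedy_in_order(expenses: Sequence[float], order: Sequence[int], k: int) -> List[List[int]]:
    """Seed cluster c with point order[c], then give every later point to the lightest cluster."""
    clusters = [[order[c]] for c in range(k)]
    totals = [float(expenses[order[c]]) for c in range(k)]
    for idx in order[k:]:
        lightest = min(range(k), key=totals.__getitem__)  # first cluster with the lowest total
        clusters[lightest].append(idx)
        totals[lightest] += expenses[idx]
    return clusters


def spread(expenses: Sequence[float], clusters: List[List[int]]) -> float:
    """Highest cluster total minus lowest cluster total."""
    totals = [sum(expenses[i] for i in cl) for cl in clusters]
    return max(totals) - min(totals)


def best_of_restarts(
    expenses: Sequence[float], k: int, n_restarts: int = 50, seed: Optional[int] = None
) -> Tuple[List[List[int]], float]:
    """Run greedy_in_order on shuffled orders and keep the result with the smallest spread."""
    if not 1 <= k <= len(expenses):
        raise ValueError("need 1 <= k <= number of points")
    if n_restarts < 1:
        raise ValueError("n_restarts must be at least 1")
    rng = random.Random(seed)
    order = list(range(len(expenses)))  # the first run uses the original order
    best: List[List[int]] = []
    best_spread = float("inf")
    for run in range(n_restarts):
        if run > 0:
            rng.shuffle(order)  # a different starting point for this run
        clusters = greedy_in_order(expenses, order, k)
        s = spread(expenses, clusters)
        if s < best_spread:
            best, best_spread = clusters, s
    return best, best_spread
```

With n_restarts=1 this is a single pass over the original order: best_of_restarts([5, 3, 8, 2, 7, 1], 3, n_restarts=1) returns ([[0, 4], [1, 3, 5], [2]], 6).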

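Hi Nathan, is this an optimization problem? You want an algorithm that groups points by the sum of their expenses, but you have not stated a constraint or tolerance (how large a difference between the sums of the third variable you will allow). Depending on the data, there may be infinitely many solutions, or none at all. A reproducible example would help!

I just want k clusters that have roughly the same total expenses (the totals are counted in hundreds of millions; a difference of 1 billion is acceptable).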
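A stated tolerance turns "roughly equal" into a simple accept/reject check on any labelling. The sketch below adds such a check, within_tolerance (tol would be 1e9 for the figure quoted above). It pairs the check with a different technique from the one in the answer, swapped in here for illustration and not proposed in the thread: a sweep that orders the points by their angle around the centroid and cuts the sweep into k contiguous slices of roughly equal expense. Unlike the greedy approach, this keeps each cluster spatially contiguous. Both function names are hypothetical. The sketch also assumes non-negative expenses and treats latitude/longitude as flat x/y coordinates, which is only reasonable for a small region:

```python
import math
from typing import List, Sequence


def sector_sweep(lat: Sequence[float], lon: Sequence[float],
                 expenses: Sequence[float], k: int) -> List[int]:
    """Label points 0..k-1 by sweeping around the centroid and cutting at equal expense shares.

    Assumptions: lat/lon are treated as planar coordinates, expenses are non-negative.
    """
    n = len(expenses)
    if not (len(lat) == len(lon) == n and 1 <= k <= n):
        raise ValueError("lat, lon, expenses must have equal length and 1 <= k <= n")
    c_lat = sum(lat) / n
    c_lon = sum(lon) / n
    # Visit points in order of their angle around the centroid (a pie-slice sweep).
    order = sorted(range(n), key=lambda i: math.atan2(lat[i] - c_lat, lon[i] - c_lon))
    total = float(sum(expenses))
    labels = [0] * n
    running = 0.0
    for i in order:
        # A point goes to the slice containing the midpoint of its expense,
        # so each label covers one contiguous run of the sweep.
        mid = running + expenses[i] / 2
        labels[i] = min(int(mid * k / total), k - 1) if total > 0 else 0
        running += expenses[i]
    return labels


def within_tolerance(expenses: Sequence[float], labels: Sequence[int], k: int, tol: float) -> bool:
    """True if the highest and lowest cluster totals differ by at most tol."""
    totals = [0.0] * k
    for e, lab in zip(expenses, labels):
        totals[lab] += e
    return max(totals) - min(totals) <= tol
```

Each slice's total should miss the ideal share (total / k) by at most about the largest single expense. The trade-off is that slices can be long and thin, unlike the round clusters k-means tends to produce.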