R library implementing an R-tree


For example, I have the data frame

df <- data.frame(x = 1:1e3, y = rnorm(1e3))

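If all points have distinct x coordinates, as in your example, sort the points in increasing order of x. Note that in this case the problem of finding a rectangle cover of 2D points (with an equal number of points per rectangle) reduces to finding a segment cover of 1D points (that is, the heights of the rectangles can be ignored).

Here is how to find the points in each rectangle: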

num_rect <- 7 # In your example 6, 12 or 24
num_points <- 10 # In your example 1e3

# Already ordered according to x
df <- data.frame(x = 1:num_points, y = rnorm(num_points))

# Minimum number of points in the rectangles to cover all of them
points_in_rect <- ceiling(num_points/num_rect)

# Cover the first points using non-overlapping rectangles
breaks <- seq(0,num_points, by=points_in_rect)
cover <- split(seq(num_points), cut(seq(num_points), breaks))
names(cover) <- paste0("rect", seq(length(cover)))

# Cover the last points using overlapping rectangles
cur_num <- length(cover)
if (num_points < num_rect*points_in_rect ) {
  # To avoid duplicate rectangles
  last <- num_points
  if (num_points %% points_in_rect == 0)
    last <- last -1

  while (cur_num < num_rect) {
    cur_num <- cur_num + 1
    new_rect <- list(seq(last-points_in_rect+1, last))
    names(new_rect) <- paste0("rect", cur_num)
    cover <- c(cover,new_rect)
    last <- last - points_in_rect
  }
}
The minimum bounding rectangles (parallel to the axes) enclosing these point sets are the rectangles you are looking for.
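The bounding-rectangle step can be sketched as follows. This is only an illustration: `bounding_rect` is a hypothetical helper name, and the small `cover` list here stands in for the one built by the code above.

```r
# Hypothetical helper: axis-parallel bounding box of the rows of `df` listed in `idx`
bounding_rect <- function(idx, df) {
  data.frame(xmin = min(df$x[idx]), xmax = max(df$x[idx]),
             ymin = min(df$y[idx]), ymax = max(df$y[idx]),
             n = length(idx))
}

set.seed(1)
df <- data.frame(x = 1:10, y = rnorm(10))

# Small stand-in for the `cover` list (row indices per rectangle)
cover <- list(rect1 = 1:2, rect2 = 3:4, rect6 = 8:9)

# One row per rectangle: its corners and the number of points it holds
rects <- do.call(rbind, lapply(cover, bounding_rect, df = df))
print(rects)
```

Passing the full `cover` list instead of the stand-in should give one bounding rectangle per entry, including the overlapping ones.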

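Duplicate coordinate values on both axes
Rotate the points by a random angle (saving the rotation angle) and check whether the x (or y) coordinates are now all distinct. If they are, apply the strategy above to the rotated coordinates (remember to sort the rotated points by their new x coordinate first), and then rotate the resulting rectangles back. If duplicate coordinates remain on both axes, rotate the points again by a different (random) angle. Since the number of points is finite, a rotation angle that separates the x (or y) coordinates can always be found.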
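A minimal sketch of the rotation idea, assuming a plain 2D rotation about the origin. `rotate_points` and the toy data are made up for illustration and are not part of the original answer.

```r
# Rotate the columns x, y of `pts` counter-clockwise by `theta` radians
rotate_points <- function(pts, theta) {
  data.frame(x = pts$x * cos(theta) - pts$y * sin(theta),
             y = pts$x * sin(theta) + pts$y * cos(theta))
}

set.seed(1)
# Toy data with repeated values on both axes
pts <- data.frame(x = c(1, 1, 2, 2, 3), y = c(1, 2, 1, 2, 2))

# Try random angles until the rotated x coordinates are all distinct
repeat {
  theta <- runif(1, 0, pi)
  rot <- rotate_points(pts, theta)
  if (!anyDuplicated(rot$x)) break
}

# Row order by the new x coordinate; the 1D covering step would work on this order
ord <- order(rot$x)

# Rotating by -theta maps rotated coordinates (e.g. rectangle corners) back
back <- rotate_points(rot, -theta)
```

Note that a rectangle found in the rotated frame is generally no longer parallel to the original axes once it is rotated back.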

For data uniformly distributed along the x axis, kmeans clustering works well (not surprisingly); the code is the library(dplyr) listing further down.

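Do you confirm that overlaps are allowed? Otherwise 1e3 should be a multiple of the number of rectangles.

Overlaps are allowed.

An R-tree does not guarantee an equal number of points in each rectangle: which version of the algorithm are you considering? Are you looking for equal or almost equal numbers of points?

I am looking for almost equal numbers of points.

This is a starting point, but it needs further development to answer your question.

Output of the rectangle code above (the cover list):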
$rect1
[1] 1 2

$rect2
[1] 3 4

$rect3
[1] 5 6

$rect4
[1] 7 8

$rect5
[1]  9 10

$rect6
[1] 8 9

$rect7
[1] 6 7
library(dplyr)
library(ggplot2)

set.seed(1)
df <- data.frame(x = 1:1e3, y = rnorm(1e3))

N <- 10
df$cluster <- kmeans(df,N)$cluster

cluster_rectangles <- df %>% group_by(cluster) %>% 
       summarize(xmin = min(x),
                 xmax = max(x),
                 ymin = min(y),
                 ymax = max(y),
                 n = n())  

ggplot() + geom_rect(data = cluster_rectangles, mapping=aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax, fill=cluster)) +
           geom_point(data = df,mapping=aes(x,y),color='white')
df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
> cluster_rectangles %>% select(cluster,n)
# A tibble: 10 x 2
   cluster     n
     <int> <int>
 1       1   137
 2       2    58
 3       3   121
 4       4    61
 5       5    72
 6       6   184
 7       7    78
 8       8    70
 9       9   126
10      10    93
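Since the cluster sizes above range from 58 to 184, while the question asks for almost equal counts, here is a rough alternative sketch. It is not an R-tree and not kmeans: it simply ranks the points along x and cuts the ranks into `N` consecutive groups, so the group sizes differ by at most one. The column name `group` is an assumption for illustration.

```r
set.seed(1)
df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
N <- 10

# Rank along x (ties broken by row order), then map ranks 1..n onto groups 1..N
df$group <- ceiling(rank(df$x, ties.method = "first") * N / nrow(df))

# Bounding rectangle and point count per group
rects <- do.call(rbind, lapply(split(df, df$group), function(g) {
  data.frame(group = g$group[1],
             xmin = min(g$x), xmax = max(g$x),
             ymin = min(g$y), ymax = max(g$y),
             n = nrow(g))
}))
print(rects$n)
```

The resulting rectangles are vertical slabs that do not overlap along x. Whether that shape is acceptable depends on the use case, which the question leaves open.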