R library implementing an R-tree

For example, I have a data frame

df <- data.frame(x = 1:1e3, y = rnorm(1e3))

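If all points have distinct x coordinates, as in your example, sort the points in increasing order of x. In that case, the problem of finding a rectangle cover of the 2D points (with an equal number of points per rectangle) reduces to finding a segment cover of 1D points, that is, the heights of the rectangles can be ignored.
Here is how to find the points in each rectangle: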
num_rect <- 7 # In your example 6, 12 or 24
num_points <- 10 # In your example 1e3

# Already ordered according to x
df <- data.frame(x = 1:num_points, y = rnorm(num_points))

# Minimum number of points in the rectangles to cover all of them
points_in_rect <- ceiling(num_points/num_rect)

# Cover the first points using non-overlapping rectangles
breaks <- seq(0,num_points, by=points_in_rect)
cover <- split(seq(num_points), cut(seq(num_points), breaks))
names(cover) <- paste0("rect", seq(length(cover)))

# Cover the last points using overlapping rectangles
cur_num <- length(cover)
if (num_points < num_rect*points_in_rect ) {
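  # To avoid duplicate rectangles: if the non-overlapping rectangles already
  # end exactly at the last point, shift the overlapping ones back by one
  last <- num_points
  if (num_points %% points_in_rect == 0)
    last <- last - 1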

  while (cur_num < num_rect) {
    cur_num <- cur_num + 1
    new_rect <- list(seq(last-points_in_rect+1, last))
    names(new_rect) <- paste0("rect", cur_num)
    cover <- c(cover,new_rect)
    last <- last - points_in_rect
  }
}

The minimum axis-parallel bounding rectangles that enclose these point sets are the rectangles you are looking for.
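The bounding-rectangle step can be sketched as follows. This is a minimal illustration, not code from the original answer: the helper `bounding_rect`, the data frame `pts`, and the list `cover_demo` (a hand-written stand-in for the `cover` list with 7 rectangles over 10 points) are assumed names.

```r
# Hypothetical helper: axis-parallel bounding box of the points
# whose row indices are given in idx.
bounding_rect <- function(idx, pts) {
  data.frame(xmin = min(pts$x[idx]), xmax = max(pts$x[idx]),
             ymin = min(pts$y[idx]), ymax = max(pts$y[idx]),
             n = length(idx))
}

set.seed(1)
pts <- data.frame(x = 1:10, y = rnorm(10))
pts <- pts[order(pts$x), ]   # sort by x (a no-op for this data)

# Stand-in for the cover list: row indices of the points in each rectangle
cover_demo <- list(rect1 = 1:2, rect2 = 3:4, rect3 = 5:6, rect4 = 7:8,
                   rect5 = 9:10, rect6 = 8:9, rect7 = 6:7)

# One row per rectangle, row names taken from the list names
rects <- do.call(rbind, lapply(cover_demo, bounding_rect, pts = pts))
print(rects)
```

Each row of `rects` can be passed directly to a plotting function that expects xmin, xmax, ymin and ymax.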
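Duplicate coordinate values on both axes
Rotate the points by a random angle (saving the rotation angle) and check whether any duplicate x (or y) coordinates remain. If none remain, apply the strategy above to the rotated coordinates (remember to sort the rotated points by their new x coordinates first), then rotate the resulting rectangles back. If duplicate coordinates remain on both axes, rotate the points again by a different random angle. Because the number of points is finite, it is always possible to find a rotation angle that separates the x (or y) coordinates.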
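The rotation strategy can be sketched as below. This is an illustration under stated assumptions, not the original author's code: `rotate_points` and `rect_corners` are hypothetical helpers, the grid data is made up to force duplicate coordinates, and the duplicate check uses exact floating-point comparison.

```r
# Rotate the columns x, y of pts counter-clockwise by theta (radians).
rotate_points <- function(pts, theta) {
  data.frame(x = pts$x * cos(theta) - pts$y * sin(theta),
             y = pts$x * sin(theta) + pts$y * cos(theta))
}

set.seed(42)
# Grid points: every x and every y value is duplicated
pts <- expand.grid(x = 1:4, y = 1:4)

# Try random angles until all rotated x coordinates are distinct
repeat {
  theta <- runif(1, 0, pi)
  rot <- rotate_points(pts, theta)
  if (!anyDuplicated(rot$x)) break
}

# Sort by the rotated x and split into 4 groups of 4 points each
ord <- order(rot$x)
groups <- split(ord, ceiling(seq_along(ord) / 4))

# Bounding box of one group in the rotated frame; its four corners
# are rotated back by -theta into the original frame.
rect_corners <- function(idx) {
  b <- rot[idx, ]
  corners <- data.frame(x = c(min(b$x), max(b$x), max(b$x), min(b$x)),
                        y = c(min(b$y), min(b$y), max(b$y), max(b$y)))
  rotate_points(corners, -theta)
}
rects <- lapply(groups, rect_corners)
```

The rectangles in `rects` are no longer axis-parallel in the original frame, so each one is stored as four corner points instead of as min/max bounds.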
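For data distributed uniformly along the x axis, kmeans clustering works well (unsurprisingly). This is a starting point, but it needs additional development to answer your question. The kmeans code is listed below, after the output of the cover code.

Comments:
Can you confirm that overlaps are allowed? Otherwise 1e3 should be a multiple of N.
Overlaps are allowed.
An R-tree does not guarantee an equal number of points in each rectangle: which version of the algorithm are you considering? Are you looking for equal or almost equal numbers of points?
I am looking for almost equal numbers of points.

Output of the cover code above (contents of cover for num_rect = 7 and num_points = 10):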
$rect1
[1] 1 2

$rect2
[1] 3 4

$rect3
[1] 5 6

$rect4
[1] 7 8

$rect5
[1]  9 10

$rect6
[1] 8 9

$rect7
[1] 6 7

library(dplyr)
library(ggplot2)

set.seed(1)
df <- data.frame(x = 1:1e3, y = rnorm(1e3))

N <- 10
df$cluster <- kmeans(df,N)$cluster

cluster_rectangles <- df %>% group_by(cluster) %>% 
       summarize(xmin = min(x),
                 xmax = max(x),
                 ymin = min(y),
                 ymax = max(y),
                 n = n())  

ggplot() + geom_rect(data = cluster_rectangles, mapping=aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax, fill=cluster)) +
           geom_point(data = df,mapping=aes(x,y),color='white')

With x drawn from a normal distribution instead:

df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))

the cluster sizes are much less even:

> cluster_rectangles %>% select(cluster,n)
# A tibble: 10 x 2
   cluster     n
     <int> <int>
 1       1   137
 2       2    58
 3       3   121
 4       4    61
 5       5    72
 6       6   184
 7       7    78
 8       8    70
 9       9   126
10      10    93
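For comparison, the sort-and-split idea from the first answer can be applied to the same kind of normally distributed data. The sketch below is not kmeans and not part of either answer. It assumes that near-equal counts matter more than compact rectangles, and the names `pts`, `group`, and `rects` are illustrative.

```r
set.seed(1)
pts <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
N <- 10

# Rank the points by x, then map the ranks onto N consecutive groups
pts$group <- ceiling(rank(pts$x, ties.method = "first") * N / nrow(pts))

sizes <- table(pts$group)
print(sizes)

# Bounding rectangle of each group (one row per group, ordered by x)
rects <- do.call(rbind, lapply(split(pts, pts$group), function(g) {
  data.frame(xmin = min(g$x), xmax = max(g$x),
             ymin = min(g$y), ymax = max(g$y),
             n = nrow(g))
}))
print(rects)
```

Every group holds the same number of points here because 1e3 is a multiple of N. The trade-off is that the rectangles become tall, narrow vertical strips whose heights are driven by the outliers in y.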