R library for an R-tree implementation
For example, I have the data frame
df <- data.frame(x = 1:1e3, y = rnorm(1e3))
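If all points have distinct x coordinates, as in your example, sort the points in increasing order of x. Note that in this case the problem of finding a rectangle cover of 2D points (with equal numbers of points per rectangle) reduces to finding a segment cover of 1D points (that is, the heights of the rectangles can be ignored).
Here is how to find the points in each rectangle: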
num_rect <- 7 # In your example 6, 12 or 24
num_points <- 10 # In your example 1e3
# Already ordered according to x
df <- data.frame(x = 1:num_points, y = rnorm(num_points))
# Minimum number of points in the rectangles to cover all of them
points_in_rect <- ceiling(num_points/num_rect)
# Cover the first points using non-overlapping rectangles
breaks <- seq(0,num_points, by=points_in_rect)
cover <- split(seq(num_points), cut(seq(num_points), breaks))
names(cover) <- paste0("rect", seq(length(cover)))
# Cover the last points using overlapping rectangles
cur_num <- length(cover)
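if (num_points < num_rect * points_in_rect) {
  last <- num_points
  # To avoid duplicate rectangles: if the non-overlapping rectangles already
  # end exactly at the last point, start the overlapping ones one point earlier
  if (num_points %% points_in_rect == 0)
    last <- last - 1
  # Add overlapping rectangles, walking backwards from the last point
  while (cur_num < num_rect) {
    cur_num <- cur_num + 1
    new_rect <- list(seq(last - points_in_rect + 1, last))
    names(new_rect) <- paste0("rect", cur_num)
    cover <- c(cover, new_rect)
    last <- last - points_in_rect
  }
}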
The minimum bounding rectangles (parallel to the axes) that enclose these point sets are the rectangles you are looking for.
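The bounding-rectangle step is not spelled out in code, so here is a minimal sketch of one way to do it. The helper name `bounding_rects` and the small hand-written `cover` list are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): axis-parallel minimum
# bounding rectangle for each set of row indices in `cover`.
bounding_rects <- function(df, cover) {
  rects <- lapply(cover, function(idx) {
    data.frame(xmin = min(df$x[idx]), xmax = max(df$x[idx]),
               ymin = min(df$y[idx]), ymax = max(df$y[idx]),
               n = length(idx))
  })
  out <- do.call(rbind, rects)
  out$rect <- names(cover)
  rownames(out) <- NULL
  out
}

# Small illustrative data set, already ordered by x
set.seed(1)
df <- data.frame(x = 1:10, y = rnorm(10))

# Index sets as produced for num_rect = 7 and num_points = 10
cover <- list(rect1 = 1:2, rect2 = 3:4, rect3 = 5:6, rect4 = 7:8,
              rect5 = 9:10, rect6 = 8:9, rect7 = 6:7)

rects <- bounding_rects(df, cover)
print(rects)
```

Each row of `rects` holds the corners of one rectangle and the number of points it contains, so the columns can be passed directly to a plotting function such as `rect()`.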
Duplicate coordinate values on both axes
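Rotate the points by a random angle (save the rotation angle) and check whether duplicate x (or y) coordinates remain. If the coordinates on one axis are now all distinct, apply the strategy above to the rotated coordinates (remember to sort the rotated points by their new x coordinate first), then rotate the resulting rectangles back. If duplicate coordinates remain on both axes, rotate the points again by a different (random) angle. Since the number of points is finite, a rotation angle that separates the x (or y) coordinates can always be found.
For data uniformly distributed along the x axis, kmeans clustering works well (not surprisingly).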
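The rotation strategy for duplicated coordinates, described earlier in this answer, can be sketched roughly as follows. The helper names `rotate_points` and `separate_x`, the retry limit `max_tries`, and the 3 x 3 grid are all assumptions chosen for illustration.

```r
# Rotate points counter-clockwise by `theta` radians about the origin.
rotate_points <- function(df, theta) {
  data.frame(x = df$x * cos(theta) - df$y * sin(theta),
             y = df$x * sin(theta) + df$y * cos(theta))
}

# Hypothetical helper: try random angles until the rotated x coordinates are
# all distinct. Returns the rotated points sorted by their new x coordinate,
# the sort order, and the angle that was used.
separate_x <- function(df, max_tries = 100) {
  for (i in seq_len(max_tries)) {
    theta <- runif(1, 0, pi)
    rot <- rotate_points(df, theta)
    if (!anyDuplicated(rot$x)) {
      ord <- order(rot$x)
      return(list(points = rot[ord, ], order = ord, theta = theta))
    }
  }
  stop("no separating rotation found within max_tries")
}

set.seed(42)
# Grid data: every x value and every y value is repeated
grid <- expand.grid(x = 1:3, y = 1:3)
res <- separate_x(grid)

# Rotating by -theta maps rotated coordinates (for example rectangle corners)
# back to the original frame
back <- rotate_points(res$points, -res$theta)
```

Note that a rectangle that is axis-parallel in the rotated frame is generally no longer axis-parallel once it is rotated back.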
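Can you confirm that overlaps are allowed? Otherwise 1e3 should be a multiple of N.
Overlaps are allowed.
An R-tree does not guarantee an equal number of points in each rectangle: which version of the algorithm are you considering? Are you looking for equal or nearly equal numbers of points?
I am looking for nearly equal numbers of points.
This is a starting point, but it needs further development to answer your question.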
Contents of cover for the first example (num_rect = 7, num_points = 10):
$rect1
[1] 1 2

$rect2
[1] 3 4

$rect3
[1] 5 6

$rect4
[1] 7 8

$rect5
[1]  9 10

$rect6
[1] 8 9

$rect7
[1] 6 7
library(dplyr)
library(ggplot2)
set.seed(1)
df <- data.frame(x = 1:1e3, y = rnorm(1e3))
N <- 10
df$cluster <- kmeans(df,N)$cluster
cluster_rectangles <- df %>%
  group_by(cluster) %>%
  summarize(xmin = min(x),
            xmax = max(x),
            ymin = min(y),
            ymax = max(y),
            n = n())
ggplot() +
  geom_rect(data = cluster_rectangles,
            mapping = aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = cluster)) +
  geom_point(data = df, mapping = aes(x, y), color = 'white')
df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
> cluster_rectangles %>% select(cluster, n)
# A tibble: 10 x 2
   cluster     n
     <int> <int>
 1       1   137
 2       2    58
 3       3   121
 4       4    61
 5       5    72
 6       6   184
 7       7    78
 8       8    70
 9       9   126
10      10    93
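The cluster sizes in this table range from 58 to 184, so kmeans alone does not give nearly equal counts. As a hedged alternative in the spirit of the sort-by-x strategy from the first answer, the sketch below splits the points into N groups by x rank. The `group` column and the rank-based split are assumptions for illustration, not part of either answer, and the resulting rectangles do not overlap along x.

```r
set.seed(1)
df <- data.frame(x = rnorm(1e3), y = rnorm(1e3))
N <- 10

# Rank the points by x (ties broken by order of appearance), then cut the
# ranks into N consecutive groups whose sizes differ by at most one.
r <- rank(df$x, ties.method = "first")
df$group <- ceiling(r * N / nrow(df))

# Axis-parallel bounding rectangle and point count for each group
rects <- do.call(rbind, lapply(split(df, df$group), function(g) {
  data.frame(group = g$group[1],
             xmin = min(g$x), xmax = max(g$x),
             ymin = min(g$y), ymax = max(g$y),
             n = nrow(g))
}))
print(rects)
```

With 1e3 points and N = 10, every group holds exactly 100 points. The trade-off is that the rectangles are tall vertical strips rather than compact clusters.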