
R: Creating groups using data.table


The working dataset looks like this:

library('data.table')
df <- data.table(Name = c("a","a","b","b","c","c","d","d","e","e","f","f"),
                 Y = sample(1:30,12),
                 X = sample(1:30,12))

df
    Name  Y  X
 1:    a 14 23
 2:    a 19 18
 3:    b 10 16
 4:    b 23 11
 5:    c  2 12
 6:    c 12 24
 7:    d  8 14
 8:    d 26  2
 9:    e 16 26
10:    e  6  4
11:    f 29 28
12:    f 28 30
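The real dataset contains many more observations and many more `grp` values (`grp` > 300), so the ggplot I am creating takes too long to render and the final plot is unreadable. I plan to regroup the data so that each group holds a limited number of observations, and to plot the groups separately (for example, 10 groups per plot).

So the final dataset should look like this: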

    Name  Y  X grp level
 1:    a 14 23   1     1
 2:    a 19 18   1     1
 3:    b 10 16   2     1
 4:    b 23 11   2     1
 5:    c  2 12   3     1
 6:    c 12 24   3     1
 7:    d  8 14   4     2
 8:    d 26  2   4     2
 9:    e 16 26   5     2
10:    e  6  4   5     2
11:    f 29 28   6     2
12:    f 28 30   6     2
Then I can plot based on the new grouping variable `level`:

ggplot(df, aes(X, Y)) + geom_point() + facet_grid(. ~ level)
In the table above, I created `grp` simply with:

df[, grp := .GRP, by = Name]
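The question now is how to create the `level` groups automatically from `grp`. (I have to create `grp` instead of using `Name` directly as the basis, because in the original dataset there is no pattern in `Name`.)

I tried something like this: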

setkey(df, grp)
i <- 1
j <- 1
while(i < 4 ) {
  df[levels(factor(grp)) == (i:i+2), level := j]
  i <- i + 2
  j <- j + 1
}
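A note on the attempt above, offered as a likely diagnosis and not something stated in the original thread: in R the `:` operator binds more tightly than `+`, so `i:i+2` is evaluated as `(i:i) + 2`, a single number, and not as the range from `i` to `i + 2`. The sketch below checks this and shows one possible repair of the loop. The `%in%` test on `grp` and the step of 3 are assumptions about what was intended, based on the desired output (groups 1 to 3 in level 1, groups 4 to 6 in level 2).

```r
library(data.table)

# Operator precedence: ':' is evaluated before '+'
k <- 1
k:k + 2      # same as (k:k) + 2, a single value
k:(k + 2)    # the range k, k + 1, k + 2

# Small stand-in for the question's data (X and Y are not needed here)
df <- data.table(Name = rep(letters[1:6], each = 2))
df[, grp := .GRP, by = Name]

# Possible repair (assumed intent): one level per block of 3 consecutive grp values
i <- 1
j <- 1
while (i <= max(df$grp)) {
  df[grp %in% i:(i + 2), level := j]
  i <- i + 3
  j <- j + 1
}
df
```

The loop is kept only to stay close to the original attempt; a vectorised assignment avoids it entirely.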
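If there are only a few groups, the `fct_collapse()` function from the `forcats` package can be used. It makes it easy to collapse factor levels into manually defined groups.

This way, the new variable `level` can be created directly, without going through group numbers and `cut()`. In addition, meaningful labels can be assigned to the levels.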

library('data.table')
df <- data.table(Name = rep(letters[1:6], each = 2),
                 Y = sample(1:30,12),
                 X = sample(1:30,12))
df[, level := forcats::fct_collapse(Name, "a-c" = letters[1:3], "d-f" = letters[4:6])]
df
#    Name  Y  X level
# 1:    a 11 13   a-c
# 2:    a 29 12   a-c
# 3:    b 16  5   a-c
# 4:    b 12  6   a-c
# 5:    c 25 28   a-c
# 6:    c 27 11   a-c
# 7:    d  5  9   d-f
# 8:    d 23 20   d-f
# 9:    e 13 26   d-f
#10:    e 17 19   d-f
#11:    f 19  8   d-f
#12:    f 22  3   d-f
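A second approach uses `cut()` to split the groups into a chosen number of levels, `n_lvls`:

n_lvls <- 6
# convert Name to integer group codes, then cut their range into n_lvls intervals
df[, level := as.numeric(cut(as.integer(factor(Name)), breaks = n_lvls))]
ggplot(df, aes(X, Y)) + geom_point() + facet_grid(. ~ level)

Note that `set.seed()` is used to make the data reproducible.

Now the unique values of `Name` (which correspond to the OP's `grp`) are split into 6 levels and plotted in facets by the `ggplot()` call above.

A variant with the same `n_lvls` assigns the levels from the cumulative row count per `Name`, largest groups first:

# count rows per Name, then cut the cumulative count into n_lvls intervals
lvls <- df[, .N, by = Name][order(-N), level := cut(cumsum(N), n_lvls, labels = FALSE)]
# join the per-Name levels back onto the rows of df
df <- lvls[df, on = "Name"]

ggplot(df, aes(X, Y)) + geom_point() + facet_grid(. ~ level)

Comments:

So `level` is just `grp` cut into 3 pieces, right?

@rawr Yes, I am only giving an example. In the table, `grp` has 6 levels and I want to reduce them to 2 levels, with each `level` containing 1 to 3 levels of `grp`.

So `df[, level := as.numeric(cut(grp, breaks = 2))]`?

That works exactly the way I want! This is the first time I have heard of the `cut` command. If you post it as an answer, I will upvote it. Thanks a lot!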
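The `cut()` suggestion from the comments and the "10 groups per plot" goal from the question can both be sketched on the six-group example. This is a minimal illustration and not code from the thread: the column names `level_cut` and `level_chunk` and the parameter `groups_per_level` are hypothetical, and the integer-division variant is an alternative that the thread itself does not propose.

```r
library(data.table)

# Example data with 6 names, 2 rows each (X and Y omitted for brevity)
df <- data.table(Name = rep(letters[1:6], each = 2))
df[, grp := .GRP, by = Name]

# Option 1: fix the number of levels.
# cut() splits the range of grp into equal-width intervals.
df[, level_cut := as.integer(cut(grp, breaks = 2))]

# Option 2: fix the number of groups per level (hypothetical parameter).
# Integer division maps grp 1..3 to level 1, grp 4..6 to level 2, and so on.
groups_per_level <- 3L
df[, level_chunk := (grp - 1L) %/% groups_per_level + 1L]

df
```

With a fixed chunk size the number of levels grows with the number of groups, so more than 300 groups with `groups_per_level <- 10L` would give more than 30 levels. Each level can then be plotted on its own by subsetting, for example `df[level_chunk == 1]`.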