R - How to avoid repeated filtering & row binding
Because I am working with a very large dataset, I need to slice it by group in order to carry out my computations.
I have a person-period (melted) dataset that looks like this:
group id var time
1 A 1 a 1
2 A 1 b 2
3 A 1 a 3
4 A 2 b 1
5 A 2 b 2
6 A 2 b 3
7 B 1 a 1
8 B 1 a 2
9 B 1 a 3
10 B 2 c 1
11 B 2 c 2
12 B 2 c 3
I need to do this simple transformation:
library(reshape2)
library(dplyr)
dt %>% dcast(group + id ~ time, value.var = 'var')
to get
group id 1 2 3
1 A 1 a b a
2 A 2 b b b
3 B 1 a a a
4 B 2 c c c
So far, so good.
However, because my dataset is too large, I need to do this separately for each group, for example:
a = dt %>% filter(group == 'A') %>% dcast(group + id ~ time, value.var ='var')
b = dt %>% filter(group == 'B') %>% dcast(group + id ~ time, value.var = 'var')
bind_rows(a,b)
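My problem is that I want to avoid doing this by hand, that is, having to store each group separately (a = ..., b = ..., c = ..., and so on).
Do you know how I could write a single pipeline that separates each group, computes the transformation, and puts everything back into one data frame?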
dt = structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), var = structure(c(1L,
2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L), .Label = c("a",
"b", "c"), class = "factor"), time = structure(c(1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor")), .Names = c("group", "id",
"var", "time"), row.names = c(NA, -12L), class = "data.frame")
lapply is your friend here:
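# Cast each group on its own, then stack the results with rbind.
# Note the column is called `group` (lower case), as in the question's data.
do.call(rbind, lapply(unique(dt$group), function(grp, dt){
  dt %>% filter(group == grp) %>% dcast(group + id ~ time, value.var = "var")
}, dt = dt))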
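For what it's worth, the same idea can be written with split() instead of filtering on each unique value. This is only a minimal sketch: it rebuilds the question's data inline so it runs on its own, and cast_one_group is a hypothetical helper name, not something from reshape2.

```r
library(reshape2)

# Stand-in for the question's data (same columns and values, as characters).
dt <- data.frame(
  group = rep(c("A", "B"), each = 6),
  id    = rep(rep(c("1", "2"), each = 3), times = 2),
  var   = c("a", "b", "a", "b", "b", "b", "a", "a", "a", "c", "c", "c"),
  time  = rep(c("1", "2", "3"), times = 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reshape one group's rows from long to wide.
cast_one_group <- function(d) {
  dcast(d, group + id ~ time, value.var = "var")
}

# split() yields one data frame per group; lapply() casts each piece,
# and do.call(rbind, ...) stacks the pieces back into one data frame.
pieces <- lapply(split(dt, dt$group), cast_one_group)
wide <- do.call(rbind, pieces)
rownames(wide) <- NULL  # drop the "A.1", "A.2", ... row names added by rbind
```

One caveat: rbind needs every piece to have the same columns, so if some group is missing a time point this would fail. In that case dplyr's bind_rows (as used in the question) is the safer choice, since it fills the missing columns with NA.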
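The purrr package is very useful for working with lists. First split the dataset by group, then use map_df to apply dcast to each list element while returning everything in a single data.frame: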
library(purrr)
dt %>%
split(.$group) %>%
map_df(~dcast(.x, group + id ~ time, value.var = "var"))
group id 1 2 3
1 A 1 a b a
2 A 2 b b b
3 B 1 a a a
4 B 2 c c c
If you are willing to use an extra package, this is a nice solution :) I will definitely put purrr on my list of packages to check out.
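A side note for anyone on a recent purrr: as far as I know, map_df is marked as superseded from purrr 1.0.0 onward, and the suggested replacement is map() followed by list_rbind(). A minimal sketch of the purrr answer in that style, assuming purrr >= 1.0.0 is installed and rebuilding the question's data inline so it runs on its own:

```r
library(purrr)
library(reshape2)

# Stand-in for the question's data (same columns and values, as characters).
dt <- data.frame(
  group = rep(c("A", "B"), each = 6),
  id    = rep(rep(c("1", "2"), each = 3), times = 2),
  var   = c("a", "b", "a", "b", "b", "b", "a", "a", "a", "c", "c", "c"),
  time  = rep(c("1", "2", "3"), times = 4),
  stringsAsFactors = FALSE
)

# Split by group, cast each piece, then row-bind the list of data frames.
pieces <- map(split(dt, dt$group),
              ~ dcast(.x, group + id ~ time, value.var = "var"))
wide <- list_rbind(pieces)
```

map_df still works, so this is a matter of taste unless you are writing new code against current purrr.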