Frequency table by group: counting values in R
Tags: r, dplyr, data.table, lapply
Suppose this is my dataset (dput output):
dataset<-structure(list(group1 = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L), .Label = c("b", "x"), class = "factor"), group2 = structure(c(2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("g", "y"), class = "factor"),
var1 = c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)), .Names = c("group1",
"group2", "var1"), class = "data.frame", row.names = c(NA, -9L
))
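For the variable var1, I want to count how many times the value 1 and the value 2 occur within each group.
So the desired output is: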
group1 group2 total_count_of_group var1-1 var1-2
x      y      5                    3      2
b      g      4                    2      2
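This output means that the group x+y has a total count of 5 observations, of which the value 1 occurs 3 times and the value 2 occurs 2 times.
Similarly, the group b+g has a total count of 4 observations, of which the value 1 occurs 2 times and the value 2 occurs 2 times.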
How can I get such a table?

You can generate three tables, select the relevant counts, and then combine them into a data frame:
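# Cross-tabulate the two grouping variables to get the group sizes
a <- table(dataset$group1, dataset$group2)
# Counts of each var1 value within group x (b) and within group b (d)
b <- table(dataset$var1[dataset$group1 == 'x'])
d <- table(dataset$var1[dataset$group1 == 'b'])
# One row per group: x+y first, then b+g.
# Each var1 column takes one count from each group's table.
data.frame(total_count_of_group = c(a['x', 'y'], a['b', 'g']),
           var1_1 = c(b[['1']], d[['1']]),
           var1_2 = c(b[['2']], d[['2']]))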
total_count_of_group var1_1 var1_2
1 5 3 2
2 4 2 2
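The three-table approach hardcodes the group names and the values of var1. As a minimal sketch of how the same idea might be generalized in base R, a single contingency table of group by var1 already holds all the counts, and its row sums are the group totals. The names grp, tab, counts, and result are illustrative and do not come from the original answer.

```r
# Same data as the dput() output in the question
dataset <- data.frame(
  group1 = factor(c("x", "x", "x", "x", "x", "b", "b", "b", "b")),
  group2 = factor(c("y", "y", "y", "y", "y", "g", "g", "g", "g")),
  var1   = c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)
)

# One label per observed group1/group2 combination ("b.g", "x.y");
# drop = TRUE discards combinations that never occur in the data
grp <- interaction(dataset$group1, dataset$group2, drop = TRUE)

# Rows are groups, columns are the distinct values of var1
tab <- table(grp, dataset$var1)

# Convert the counts to a data frame and prefix the column names
counts <- as.data.frame.matrix(tab)
names(counts) <- paste0("var1_", names(counts))

# Row sums of the contingency table are the group totals
result <- cbind(total_count_of_group = rowSums(tab), counts)
print(result)
```

Here the groups end up as row names rather than as separate group1 and group2 columns, which keeps the sketch short but may not suit every use.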
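This can be solved in two steps:
1. Aggregate the group totals and add them to the dataset.
2. Reshape from long to wide format.
Note that this works for any number of distinct values in var1 and any number of groups. It can be done with data.table or with base R; here it is with dplyr and tidyr: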
library(tidyverse)
dataset %>%
group_by(group1, group2) %>% # for each combination of groups
mutate(counts = n()) %>% # count number of rows
count(group1, group2, var1, counts) %>% # count unique combinations
spread(var1, n, sep = "_") %>% # reshape dataset
ungroup() # forget the grouping
# # A tibble: 2 x 5
# group1 group2 counts var1_1 var1_2
# <fct> <fct> <int> <int> <int>
# 1 b g 4 2 2
# 2 x y 5 3 2
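In current tidyr, spread() is superseded by pivot_wider(). A minimal sketch of the same two steps with the newer function, assuming dplyr 0.8 or later and tidyr 1.0 or later are installed, might look like this. The name wide is illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as the dput() output in the question
dataset <- data.frame(
  group1 = factor(c("x", "x", "x", "x", "x", "b", "b", "b", "b")),
  group2 = factor(c("y", "y", "y", "y", "y", "g", "g", "g", "g")),
  var1   = c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)
)

wide <- dataset %>%
  # Step 1: add the size of each group1/group2 combination as a column
  add_count(group1, group2, name = "counts") %>%
  # Count rows per group and var1 value; the result column is named n
  count(group1, group2, counts, var1) %>%
  # Step 2: reshape to wide, one column per distinct value of var1;
  # a value that never occurs in a group is filled with 0
  pivot_wider(names_from = var1, values_from = n,
              names_prefix = "var1_", values_fill = list(n = 0L))

print(wide)
```

The values_fill argument only matters when some group lacks one of the var1 values, which does not happen in this dataset.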
Here is a tidyverse solution:
library(tidyverse)
dataset %>%
group_by(group1, group2) %>%
summarize(total = n(), x = list(table(var1) %>% as_tibble %>% spread(var1,n))) %>%
unnest
# # A tibble: 2 x 5
# # Groups: group1 [2]
# group1 group2 total `1` `2`
# <fct> <fct> <int> <int> <int>
# 1 b g 4 2 2
# 2 x y 5 3 2
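The data.table code for the two-step approach did not survive on this page. As a rough sketch of how those two steps might be written, assuming the data.table package is installed: add the group total by reference with .N, then reshape with dcast(). The names dt, var1_label, and wide are illustrative and are not taken from the original answer.

```r
library(data.table)

# Same data as the dput() output in the question
dataset <- data.frame(
  group1 = factor(c("x", "x", "x", "x", "x", "b", "b", "b", "b")),
  group2 = factor(c("y", "y", "y", "y", "y", "g", "g", "g", "g")),
  var1   = c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)
)

# Work on a data.table copy so the original data frame is left untouched
dt <- as.data.table(dataset)

# Step 1: aggregate the group totals and add them as a new column
dt[, total_count_of_group := .N, by = .(group1, group2)]

# Helper column (hypothetical name) that becomes the wide column names
dt[, var1_label := paste0("var1_", var1)]

# Step 2: reshape from long to wide, counting rows per group and var1 value
wide <- dcast(dt, group1 + group2 + total_count_of_group ~ var1_label,
              fun.aggregate = length, value.var = "var1")

print(wide)
```

Because the wide column names are built from the values of var1, nothing in this sketch depends on there being exactly two distinct values or two groups.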
With data.table, the result is:
   group1 group2 total_count_of_group var1_1 var1_2
1:      b      g                    4      2      2
2:      x      y                    5      3      2
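Here is an option using base R:
# Add the group size as a counts column, then count rows per
# group1/group2/var1/counts combination
out <- aggregate(cbind(var = rep(1, nrow(dataset))) ~ .,
    transform(dataset, counts = ave(var1, group1, group2, FUN = length)), length)
# Reshape from long to wide: one column per distinct value of var1
reshape(out, idvar = c('group1', 'group2', 'counts'),
    timevar = 'var1', direction = 'wide')
#  group1 group2 counts var.1 var.2
#1      b      g      4     2     2
#3      x      y      5     3     2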