R: computing frequencies for groups of variables

I want to compute the frequency of patterns according to which species were found. Here is the data frame. I want to count the number of sites (gîtes) of each type, split into sites where only aegypti was found, sites where only albopictus was found, and mixed sites.
type_gite aegypti albopictus total
recipient_abandonne 19 0 19
recipient_stockage 0 2 2
recipient_stockage 8 0 8
recipient_stockage 36 0 36
recipient_stockage 13 0 13
recipient_stockage 1 3 4
autres 0 1 1
autres 0 9 9
recipient_abandonne 3 0 3
This is what the result should look like:
type_gite aegypti albopictus mixed total
recipient_abandonne 2 0 0 2
recipient_stockage 3 1 1 5
autres 0 2 0 2
total 5 3 1 9
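Which code or aggregation formula would be the most suitable?

I think you are looking for something like this. I use some random dummy data as an example: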
library(dplyr)
# Create dummy data
df <- data.frame(matrix(rnorm(10), ncol = 2))
df <- cbind(c("blah", "blah", "meh", "meh", "meh"), df)
colnames(df) <- c("grouping_variable", "some_var", "some_other_var")
# Group by 1 variable & summarise on rest
df %>% group_by(grouping_variable) %>% summarise_all(sum)
Here is my approach:
library(dplyr)    # case_when() and %>%
library(janitor)  # adorn_totals()

# create data
df = data.frame(type_gite = c('recipient_abandonne', 'recipient_stockage', 'recipient_stockage',
                              'recipient_stockage', 'recipient_stockage', 'recipient_stockage',
                              'autres', 'autres', 'recipient_abandonne'),
                aegyti_collected = c(19, 0, 8, 36, 13, 1, 0, 0, 3),
                albopictus_collected = c(0, 2, 0, 0, 0, 3, 1, 9, 0),
                total_collected = c(19, 2, 8, 36, 13, 4, 1, 9, 3))
# classify each site as single-species or mixed using case_when
# (a row with both counts at 0 would be labelled 'Aegyti Only'; none occur in this data)
df$label = case_when(df$albopictus_collected == 0 ~ 'Aegyti Only',
                     df$aegyti_collected == 0 ~ 'Albopictus Only',
                     TRUE ~ 'Mixed')
# frequency table of site type by label
df = data.frame(rbind(table(df$type_gite, df$label)))
# add the site type back as a column
df = df %>% tibble::rownames_to_column(var = 'type_gite')
# add a total column
df = df %>% adorn_totals("col")
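The classify-then-tabulate idea above can also be sketched in base R, with no extra packages. This is only an illustration under stated assumptions: the column names `aegypti` and `albopictus` follow the table in the question rather than any answer's data frame, the `pattern` variable is introduced here, and sites where neither species was found are assumed to be left out of the counts (they become `NA`).

```r
# Question's data, retyped with the column names used in the question's table
df <- data.frame(
  type_gite = c("recipient_abandonne", "recipient_stockage", "recipient_stockage",
                "recipient_stockage", "recipient_stockage", "recipient_stockage",
                "autres", "autres", "recipient_abandonne"),
  aegypti = c(19, 0, 8, 36, 13, 1, 0, 0, 3),
  albopictus = c(0, 2, 0, 0, 0, 3, 1, 9, 0)
)

# Label each site by which species were found; sites with neither become NA
pattern <- with(df, ifelse(aegypti > 0 & albopictus > 0, "mixed",
                    ifelse(aegypti > 0, "aegypti",
                    ifelse(albopictus > 0, "albopictus", NA))))
# Fix the column order of the resulting table
pattern <- factor(pattern, levels = c("aegypti", "albopictus", "mixed"))

# Cross-tabulate site type against pattern, then append row and column sums
freq <- addmargins(table(type_gite = df$type_gite, pattern = pattern))
print(freq)
```

`addmargins()` labels the added row and column `Sum`. For the sample data the counts agree with the desired table in the question: 5 aegypti-only sites, 3 albopictus-only sites, 1 mixed site, 9 in total.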
You can use dplyr, together with janitor for the total row and column, to get the desired result:
#install.packages("janitor")
#install.packages("dplyr")
library(dplyr)
df1 %>% select(-total_collected) %>% group_by(type_gite) %>%
  mutate(mixed = +(aegyti_collected * albopictus_collected > 0)) %>%
  mutate_at(vars(aegyti_collected:albopictus_collected), list(~+(. > 0)*!(mixed))) %>%
  summarise_all(sum) %>% janitor::adorn_totals(c("row", "col"))
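#>            type_gite aegyti_collected albopictus_collected mixed Total
#>               autres                0                    2     0     2
#>  recipient_abandonne                2                    0     0     2
#>   recipient_stockage                3                    1     1     5
#>                Total                5                    3     1     9
Data:
df1 <- structure(list(type_gite = structure(c(2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 2L),
                                            .Label = c("autres", "recipient_abandonne", "recipient_stockage"),
                                            class = "factor"),
                      aegyti_collected = c(19, 0, 8, 36, 13, 1, 0, 0, 3),
                      albopictus_collected = c(0, 2, 0, 0, 0, 3, 1, 9, 0),
                      total_collected = c(19, 2, 8, 36, 13, 4, 1, 9, 3)),
                 class = "data.frame", row.names = c(NA, -9L))
Created on 2019-04-30 by the reprex package (v0.2.1)

Thanks for your solution, but it is only similar to what I am looking for, not exactly it. For example, with your data frame you should count the occurrences of grouping_variable. The code above does not quite fit my data.

You also need a total for each column, so at the end do df = df %>% adorn_totals("row").

Thanks for your answer, but there is a problem with the code: it gives the error message "Error in select(., -total): unused argument (-total)". I don't know what this means. Could you explain? Thanks again.

@ArmelTedjou I suspect you don't actually have a total column, and that it is something you added to the question for illustration. select(-total) only removes that column from the output, so if you don't actually have it, remove the select(-total) %>% step from the pipe and it will work for you. Also, the column names should match your actual dataset! So if you do have that column and it is called total_collected, use that instead of total.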
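
On the `select()` error raised in the comments: one common cause of an "unused argument" error from `select()` is that another loaded package's `select()` (for example `MASS::select`) masks dplyr's. Whether that happened here cannot be told from the thread. The sketch below guards against both that and the missing-column case suggested in the reply. The helper name `drop_total` and the small example data frames are hypothetical, and `any_of()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: drop a totals column only if it is present.
drop_total <- function(data, total_col = "total_collected") {
  # dplyr::select() is namespaced so a masking select() cannot be picked up;
  # any_of() ignores names that do not exist in the data instead of failing.
  dplyr::select(data, -any_of(total_col))
}

# Two small illustrative inputs: one with the totals column, one without
with_total <- data.frame(type_gite = c("autres", "recipient_stockage"),
                         aegyti_collected = c(0, 8),
                         albopictus_collected = c(1, 0),
                         total_collected = c(1, 8))
without_total <- with_total[, 1:3]

names(drop_total(with_total))
names(drop_total(without_total))
```

Both calls return the same three columns, so the rest of the pipeline can be written once regardless of whether the real data carries a totals column.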