R: automatically compute summary statistics for a data frame and create a new table
I have the following data frame:
col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
"chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
"low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)
test_data <- cbind(col1, col2, col3)
test_data <- as.data.frame(test_data)
The percent-resistant column is based on col3 above, where 1 = resistant and 0 = non-resistant. I tried the following:
library(dplyr)
test_data<-test_data %>%
count(col1,col2,col3) %>%
group_by(col1, col2) %>%
mutate(perc_res = prop.table(n)*100)
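For the 95% confidence interval of a group I can use this, where resistant_samples and total_samples are that group's counts:
binom.test(resistant_samples, total_samples)$conf.int*100
However, I don't know how to combine it with the rest of the table.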
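As an illustration of what binom.test() needs for a single group, here is a minimal base-R sketch for the chi/high cell: x is the number of resistant samples and n the total number of samples in that cell. The names sel, x, n and ci are illustrative only.

```r
col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
          "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
          "low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)

# rows belonging to one species/density cell
sel <- col1 == "chi" & col2 == "high"
x <- sum(col3[sel])   # resistant samples (col3 == 1)
n <- sum(sel)         # total samples in the cell

# exact (Clopper-Pearson) 95% confidence interval, rescaled to percent
ci <- binom.test(x, n)$conf.int * 100
c(percent = 100 * x / n, ci_low = ci[1], ci_high = ci[2])
```

For this cell x = 2 and n = 3, so the rate is about 66.7% with an interval of roughly 9.4% to 99.2%, the same numbers as the chi/high row in the data.table output further down.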
Is there a simple and quick way to do this?
This should do it:
library(tidyverse)
library(broom)
test_data %>%
  # label the 0/1 outcome so the counts can be spread into two columns
  mutate(col3 = ifelse(col3 == 0, "NonResistant", "Resistant")) %>%
  count(col1, col2, col3) %>%
  spread(col3, n, fill = 0) %>%
  mutate(PercentResistant = 100 * Resistant / (NonResistant + Resistant)) %>%
  # one exact binomial test per row, tidied by broom into a one-row data frame
  mutate(test = map2(Resistant, NonResistant, ~ binom.test(.x, .x + .y) %>% tidy())) %>%
  unnest(test) %>%
  transmute(Species = col1, Pop.density = col2, PercentResistant,
            CI_low = conf.low * 100, CI_high = conf.high * 100,
            TotalSamples = Resistant + NonResistant)
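To see what each element of the nested test column holds, one can tidy a single test and print it as a plain data frame so the tibble printer does not hide columns. A minimal sketch, assuming broom is installed; td is an illustrative name.

```r
library(broom)

# tidy() turns the htest object returned by binom.test() into a one-row data frame
td <- tidy(binom.test(2, 3))

# as.data.frame() prints every column, even in a narrow console
print(as.data.frame(td))
names(td)
```

Among the columns are estimate, conf.low and conf.high; conf.low and conf.high are on the 0 to 1 scale, which is why the answer multiplies them by 100.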
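tidy() turns each binom.test() result into a one-row data frame stored in the nested column test; unnest(test) then expands these into ordinary columns such as conf.low and conf.high.
A data.table alternative (DT is the example data rebuilt with data.frame(); its definition is at the end of this answer):
library(data.table)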
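setDT(DT)  # convert the data.frame to a data.table in place
DT[, {
  # exact binomial 95% CI for the share of resistant samples, in percent
  bt <- binom.test(sum(resists), .N)$conf.int*100
  .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)
}, keyby=.(species, popdens)]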
species popdens res_rate res_lo res_hi n
1: avi low 0.00000 0.000000 70.75982 3
2: avi med 0.00000 0.000000 97.50000 1
3: bov low 100.00000 15.811388 100.00000 2
4: bov med 50.00000 1.257912 98.74209 2
5: bov high 100.00000 15.811388 100.00000 2
6: chi low 0.00000 0.000000 97.50000 1
7: chi med 50.00000 1.257912 98.74209 2
8: chi high 66.66667 9.429932 99.15962 3
9: fox low 0.00000 0.000000 97.50000 1
10: fox med 50.00000 1.257912 98.74209 2
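I suggest using group_by first and then the summarise function.
Use data.frame(col1,col2,col3) instead of cbind, which coerces every column to character here. The example data has no ("avi","high") pair. Do you want that row to show up somehow (with NAs and a sample count of zero)?
If they don't exist, I don't need them to show.
Great solution! May I ask where conf.low comes from after unnest()? I only see estimate and statistic.
@PoGibas: conf.low comes from tidy() and is then unnested. If you see estimate, it should be there too. Wild guess: your window isn't very wide and there's a "... with X more variables" line below the result?
Aaa, it's a tbl_df; %>% data.frame() shows it. I just can't get used to tibbles.
This is great! Which part comes from the broom package? Is it the final unnest and transmute?
@Haakonkas: broom provides the tidy() methods that convert models into data frames.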
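A minimal sketch of the group_by() + summarise() approach suggested in the comments, assuming dplyr >= 1.0 (for .groups). It skips the reshape step by computing the counts and the exact binomial interval directly for each species/density group, and it rebuilds the data with data.frame() so col3 stays numeric. The result name res is illustrative.

```r
library(dplyr)

test_data <- data.frame(
  col1 = c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
           "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi"),
  col2 = c("low","med","high","high","low","low","med","med","med","high",
           "low","low","high","high","med","med","low","low","med"),
  col3 = c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)
)

res <- test_data %>%
  group_by(Species = col1, Pop.density = col2) %>%
  summarise(
    TotalSamples = n(),
    Resistant = sum(col3),
    PercentResistant = 100 * Resistant / TotalSamples,
    # binom.test(successes, trials): resistant count out of the group size
    CI_low  = 100 * binom.test(Resistant, TotalSamples)$conf.int[1],
    CI_high = 100 * binom.test(Resistant, TotalSamples)$conf.int[2],
    .groups = "drop"
  )
res
```

Calling binom.test() twice per group is redundant but keeps each column a plain number; with this small table the cost is negligible.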
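To also show combinations with no samples (such as avi/high) as rows with NA rates and n = 0, join against the cross join (CJ) of all species and density values:
DT[CJ(species = species, popdens = popdens, unique = TRUE), on=.(species, popdens), {
  # no matching rows (.N == 0): return NA instead of calling binom.test()
  bt <-
    if (.N > 0L) binom.test(sum(resists), .N)$conf.int*100
    else NA_real_
  .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)
}, by=.EACHI]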
species popdens res_rate res_lo res_hi n
1: avi low 0.00000 0.000000 70.75982 3
2: avi med 0.00000 0.000000 97.50000 1
3: avi high NA NA NA 0
4: bov low 100.00000 15.811388 100.00000 2
5: bov med 50.00000 1.257912 98.74209 2
6: bov high 100.00000 15.811388 100.00000 2
7: chi low 0.00000 0.000000 97.50000 1
8: chi med 50.00000 1.257912 98.74209 2
9: chi high 66.66667 9.429932 99.15962 3
10: fox low 0.00000 0.000000 97.50000 1
11: fox med 50.00000 1.257912 98.74209 2
12: fox high NA NA NA 0
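For the dplyr route, a comparable table with the empty combinations can be sketched with tidyr::complete(). This is a swapped-in technique, not part of the data.table answer: summarise the observed groups first, then add every missing col1/col2 pair with zero counts and NA for the rate and interval. It assumes dplyr >= 1.0 and tidyr; res_all is an illustrative name.

```r
library(dplyr)
library(tidyr)

test_data <- data.frame(
  col1 = c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
           "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi"),
  col2 = c("low","med","high","high","low","low","med","med","med","high",
           "low","low","high","high","med","med","low","low","med"),
  col3 = c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)
)

res_all <- test_data %>%
  group_by(col1, col2) %>%
  summarise(
    TotalSamples = n(),
    Resistant = sum(col3),
    PercentResistant = 100 * Resistant / TotalSamples,
    CI_low  = 100 * binom.test(Resistant, TotalSamples)$conf.int[1],
    CI_high = 100 * binom.test(Resistant, TotalSamples)$conf.int[2],
    .groups = "drop"
  ) %>%
  # add the species/density pairs that never occur (e.g. avi/high);
  # TotalSamples and Resistant are filled with 0, the other new cells stay NA
  complete(col1, col2, fill = list(TotalSamples = 0L, Resistant = 0))
res_all
```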
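DT was built from the question's col1, col2 and col3 vectors with data.frame() rather than cbind(), with popdens as a factor so the rows sort low, med, high:
DT = data.frame(
  species = col1,
  popdens = factor(col2, levels=c("low", "med", "high")),
  resists = col3
)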
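The reason for preferring data.frame() over cbind(), as noted in the comments, is that cbind() on plain vectors builds a matrix, and a matrix holds a single type, so the numeric col3 is coerced to character. A small sketch of the difference, with the vectors shortened for illustration:

```r
col1 <- c("avi", "chi", "chi")
col3 <- c(0, 1, 1)

m <- cbind(col1, col3)        # character matrix: col3 becomes "0", "1", "1"
df <- data.frame(col1, col3)  # each column keeps its own type

typeof(m)       # "character"
class(df$col3)  # "numeric"
```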