Applying a vector-returning function to data.frame groups (by multiple factors)
Here is a sample of my data frame:
charact_fraction pure_charact sample replicate identity
0.08348135 clean An006 1 70
0.078947368 clean An006 1 70
0.090277778 clean An006 1 70
0.044399596 clean An006 2 70
0 clean An006 2 70
0.049348869 clean An006 2 70
0.218818381 mixed An011 1 70
0.112068966 mixed An011 1 70
1 pure An011 1 70
0 clean An011 2 70
0.214285714 mixed An011 2 70
0.2180937 mixed An011 2 70
I want to bin charact_fraction and count the bin frequencies within each combination of several factors. The resulting data frame should look like this:
bin_frequency bin sample replicate identity
… 0-0.1 An006 1 70
… … … … …
… 0.9-1.0 An006 1 70
… 0-0.1 An011 1 70
… … … … …
… 0.9-1.0 An011 1 70
… … … … …
I have a function that returns the bin frequencies:
get_freqs <- function(dat_vector, breaks) {
  hist(dat_vector, breaks=breaks, include.lowest=TRUE, plot=FALSE)$counts
}
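Here is a quick check of what get_freqs() returns. It assumes breaks = seq(0, 1, 0.1), because the question does not show its actual breaks. The input is replicate 1 of sample An006 from the table above. The result is one integer count per bin, so ten values here.

```r
# get_freqs() as defined in the question
get_freqs <- function(dat_vector, breaks) {
  hist(dat_vector, breaks = breaks, include.lowest = TRUE, plot = FALSE)$counts
}

breaks <- seq(0, 1, 0.1)  # assumed: ten bins of width 0.1

# charact_fraction values of sample An006, replicate 1
counts <- get_freqs(c(0.08348135, 0.078947368, 0.090277778), breaks)
counts
# [1] 3 0 0 0 0 0 0 0 0 0
```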
I believe this is the closest I have gotten so far, but it is obviously far from the expected output:
with(df, tapply(charact_fraction, list(sample, replicate, identity), get_freqs, breaks=breaks))
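tapply() struggles here because get_freqs() returns a vector rather than a single number. The result is an array of mode list, with one length-10 vector per cell, which is awkward to reshape into the requested data frame.

The following is a base-R sketch of one alternative, not the original poster's method:

1. split() the rows by the three factors.
2. lapply() the vector-returning function to each piece.
3. rbind() the pieces into the long layout shown above.

The data frame is rebuilt from the sample in the question. The choice breaks = seq(0, 1, 0.1), the bin labels and the object names (bin_labels, groups, pieces, res) are assumptions for illustration.

```r
# Sample data rebuilt from the question
df <- data.frame(
  charact_fraction = c(0.08348135, 0.078947368, 0.090277778, 0.044399596, 0,
                       0.049348869, 0.218818381, 0.112068966, 1, 0,
                       0.214285714, 0.2180937),
  pure_charact = c("clean", "clean", "clean", "clean", "clean", "clean",
                   "mixed", "mixed", "pure", "clean", "mixed", "mixed"),
  sample = rep(c("An006", "An011"), each = 6),
  replicate = rep(rep(1:2, each = 3), times = 2),
  identity = 70,
  stringsAsFactors = FALSE
)

get_freqs <- function(dat_vector, breaks) {
  hist(dat_vector, breaks = breaks, include.lowest = TRUE, plot = FALSE)$counts
}

breaks <- seq(0, 1, 0.1)  # assumed bin edges
# One label per bin: "0-0.1", "0.1-0.2", ..., "0.9-1"
bin_labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# Split the rows by every observed sample/replicate/identity combination
groups <- split(df, df[c("sample", "replicate", "identity")], drop = TRUE)

# Apply the vector-returning function to each group and store the counts
# next to the group's factor values (scalars are recycled to length 10)
pieces <- lapply(groups, function(g) {
  data.frame(bin_frequency = get_freqs(g$charact_fraction, breaks),
             bin = bin_labels,
             sample = g$sample[1],
             replicate = g$replicate[1],
             identity = g$identity[1],
             stringsAsFactors = FALSE)
})

# Stack the per-group pieces into one long data frame
res <- do.call(rbind, pieces)
rownames(res) <- NULL
head(res)
```

The same pattern works for any function that returns a fixed-length vector. Only get_freqs() and the columns copied into each piece would change.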
I have some very ugly Python code that does this, but I would like something cleaner and more functional in R. Thanks in advance.
cut might be one way to do it:
x <- gsub("\\[|\\]|\\(", "", cut(df$charact_fraction, seq(0,1, .1), include.lowest=TRUE))
df$range <- gsub(",", "-", x)
df
# charact_fraction pure_charact sample replicate identity range
# 1 0.08348135 clean An006 1 70 0-0.1
# 2 0.07894737 clean An006 1 70 0-0.1
# 3 0.09027778 clean An006 1 70 0-0.1
# 4 0.04439960 clean An006 2 70 0-0.1
# 5 0.00000000 clean An006 2 70 0-0.1
# 6 0.04934887 clean An006 2 70 0-0.1
# 7 0.21881838 mixed An011 1 70 0.2-0.3
# 8 0.11206897 mixed An011 1 70 0.1-0.2
# 9 1.00000000 pure An011 1 70 0.9-1
# 10 0.00000000 clean An011 2 70 0-0.1
# 11 0.21428571 mixed An011 2 70 0.2-0.3
# 12 0.21809370 mixed An011 2 70 0.2-0.3
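The cut() answer labels each row with its bin but stops short of counting. One way to finish in base R, assuming the range column built above, is aggregate() with FUN = length. Only the bin/factor combinations that actually occur appear in the result, so empty bins are not listed. The data frame is rebuilt from the question, and freqs is a hypothetical name.

```r
# Sample data rebuilt from the question
df <- data.frame(
  charact_fraction = c(0.08348135, 0.078947368, 0.090277778, 0.044399596, 0,
                       0.049348869, 0.218818381, 0.112068966, 1, 0,
                       0.214285714, 0.2180937),
  sample = rep(c("An006", "An011"), each = 6),
  replicate = rep(rep(1:2, each = 3), times = 2),
  identity = 70,
  stringsAsFactors = FALSE
)

# Label each row with its bin, as in the answer above
x <- gsub("\\[|\\]|\\(", "", cut(df$charact_fraction, seq(0, 1, 0.1), include.lowest = TRUE))
df$range <- gsub(",", "-", x)

# Count the rows in each bin within every sample/replicate/identity combination
freqs <- aggregate(charact_fraction ~ range + sample + replicate + identity,
                   data = df, FUN = length)
names(freqs)[names(freqs) == "charact_fraction"] <- "bin_frequency"
freqs
```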
Just use table():
with( df, table( cut( charact_fraction, breaks=10, include.lowest=TRUE),
                 sample, replicate, identity) )
You could also use breaks=breaks, but I wanted to demonstrate a different use of that argument; it is also slightly more compact.
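The question asks for a data.frame rather than a multi-way table. as.data.frame() applied to a table returns one row per cell, including zero counts. Its responseName argument names the count column. The sketch below assumes bins of width 0.1 (seq(0, 1, 0.1)) instead of breaks=10, and uses the sample data from the question. The names bins, tab and freqs are illustrative.

```r
# Sample data rebuilt from the question
df <- data.frame(
  charact_fraction = c(0.08348135, 0.078947368, 0.090277778, 0.044399596, 0,
                       0.049348869, 0.218818381, 0.112068966, 1, 0,
                       0.214285714, 0.2180937),
  sample = rep(c("An006", "An011"), each = 6),
  replicate = rep(rep(1:2, each = 3), times = 2),
  identity = 70,
  stringsAsFactors = FALSE
)

# Bin the fractions; the factor keeps all 10 bins, even empty ones
bins <- cut(df$charact_fraction, seq(0, 1, 0.1), include.lowest = TRUE)

# 4-way contingency table with named dimensions
tab <- table(bin = bins, sample = df$sample,
             replicate = df$replicate, identity = df$identity)

# Flatten to a long data frame: one row per bin/sample/replicate/identity cell
freqs <- as.data.frame(tab, responseName = "bin_frequency", stringsAsFactors = FALSE)
head(freqs)
```

Because the first dimension varies fastest, the rows come out with the bins nested inside each sample/replicate/identity combination.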
This is a 4-way cross-classification, but you may want three 2-way tables instead, in which case:
cat_char_func <- cut( df$charact_fraction, breaks=10, include.lowest=TRUE)
lapply( df[ , c('sample', 'replicate', 'identity')],
        function(cat) { table( cat_char_func, cat) }
)
Combining cut() and ddply() from "plyr" should give you a data frame containing the frequencies for the various factor subsets you are interested in. Something like:
library(plyr)
df$bin <- cut(df$charact_fraction, seq(0, 1, 0.1), include.lowest=TRUE)
df$obs <- 1 # Makes counting easy in next step
xtabs <- ddply(df, .(bin, sample, replicate, identity), summarise,
frequency = sum(obs))
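I guess you are looking for cut, although I am not sure this is really the output you want.
I don't know how to convert it into a data.frame like the one shown in my question. There is a lot of cool stuff in your answers. Thanks, everyone.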
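To also list the empty bins with a frequency of 0, build the full grid of bin/factor combinations and merge the ddply() counts into it:
xtabs.grid <- with(df, expand.grid(bin = levels(bin), sample = unique(sample),
                                   replicate = unique(replicate), identity = unique(identity)))
xtabs.full <- merge(xtabs.grid, xtabs, all.x = TRUE)
xtabs.full[is.na(xtabs.full)] <- 0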
With dplyr, the counting and the zero-filling fit in a single pipeline:
library(dplyr)
df2 <- df %>%
  mutate(bin = cut(charact_fraction, seq(0, 1, 0.1), include.lowest = TRUE)) %>%
  count(bin, sample, replicate, identity) %>%
  left_join(with(df, expand.grid(bin = levels(cut(charact_fraction, seq(0, 1, 0.1), include.lowest = TRUE)),
                                 sample = unique(sample), replicate = unique(replicate),
                                 identity = unique(identity))), .) %>%
  mutate(n = ifelse(is.na(n), 0, n))