Applying a vector-returning function to data.frame groups (by multiple factors)


Here is a sample of my data frame:

charact_fraction    pure_charact    sample  replicate   identity
0.08348135  clean   An006   1   70
0.078947368 clean   An006   1   70
0.090277778 clean   An006   1   70
0.044399596 clean   An006   2   70
0   clean   An006   2   70
0.049348869 clean   An006   2   70
0.218818381 mixed   An011   1   70
0.112068966 mixed   An011   1   70
1   pure    An011   1   70
0   clean   An011   2   70
0.214285714 mixed   An011   2   70
0.2180937   mixed   An011   2   70
I want to bin charact_fraction and compute the bin frequencies within groups defined by several factors. The resulting data frame should look like this:

bin_frequency   bin sample  replicate   identity
…   0-0.1   An006   1   70
…   …   …   …   …
…   0.9-1.0 An006   1   70
…   0-0.1   An011   1   70
…   …   …   …   …
…   0.9-1.0 An011   1   70
…   …   …   …   …
I have a function that returns the bin frequencies:

get_freqs <- function(dat_vector, breaks) {
    hist(dat_vector, breaks=breaks, include.lowest=TRUE, plot=FALSE)$counts
}
I believe this is the closest I have gotten so far, but it is obviously far from the expected output:

with(df, tapply(charact_fraction, list(sample, replicate, identity), get_freqs, breaks=breaks))

I have very ugly Python code that does this, but I would like something cleaner and more functional in R. Thanks in advance.

cut may be one way to do it:

x <- gsub("\\[|\\]|\\(", "", cut(df$charact_fraction, seq(0,1, .1), include.lowest=T))
df$range <- gsub(",", "-", x)
df
#    charact_fraction pure_charact sample replicate identity   range
# 1        0.08348135        clean  An006         1       70   0-0.1
# 2        0.07894737        clean  An006         1       70   0-0.1
# 3        0.09027778        clean  An006         1       70   0-0.1
# 4        0.04439960        clean  An006         2       70   0-0.1
# 5        0.00000000        clean  An006         2       70   0-0.1
# 6        0.04934887        clean  An006         2       70   0-0.1
# 7        0.21881838        mixed  An011         1       70 0.2-0.3
# 8        0.11206897        mixed  An011         1       70 0.1-0.2
# 9        1.00000000         pure  An011         1       70   0.9-1
# 10       0.00000000        clean  An011         2       70   0-0.1
# 11       0.21428571        mixed  An011         2       70 0.2-0.3
# 12       0.21809370        mixed  An011         2       70 0.2-0.3
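As a possible alternative to stripping brackets with gsub, cut accepts a labels argument, so the "0-0.1" style labels can be built directly from the break points. The sketch below assumes a handful of values copied from the question as a stand-in for df$charact_fraction; the names breaks, bin_labels and bin are illustrative only.

```r
# A few values copied from the question, standing in for df$charact_fraction
charact_fraction <- c(0.08348135, 0, 0.218818381, 0.112068966, 1)

breaks <- seq(0, 1, 0.1)

# Build labels such as "0-0.1", "0.1-0.2", ..., "0.9-1" from neighbouring break points
bin_labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# include.lowest = TRUE puts the value 0 into the first bin instead of returning NA
bin <- cut(charact_fraction, breaks, labels = bin_labels, include.lowest = TRUE)
as.character(bin)
```

Because the labels are attached as factor levels, bins with no observations still exist as levels, which matters when the counts are tabulated later.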

Just use table:

with( dfrm, table( cut( charact_fraction, breaks=10, include.lowest=TRUE),
       sample, replicate, identity) )
You could also use breaks=breaks, but I just wanted to demonstrate a different, slightly more compact use of that argument.

This is a 4-way classification, but you may want three two-way classifications instead, in which case:

# Bin the column once, then cross-tabulate the bins against each factor in turn
cat_char_func <- cut( dfrm$charact_fraction, breaks=10, include.lowest=TRUE)
sapply( dfrm[ , c('sample', 'replicate', 'identity')],
                    function(cat) { table( cat_char_func, cat) }
        )
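The 4-way table above can be flattened into the long data frame shown in the question with as.data.frame. The sketch below is one way to do it, assuming dfrm holds the twelve sample rows from the question; the names counts and freq_df are illustrative, and explicit breaks of seq(0, 1, 0.1) are used here instead of breaks=10 so that the bins line up with 0-0.1, ..., 0.9-1.0.

```r
# Stand-in for the question's data, rebuilt from the sample rows
dfrm <- data.frame(
  charact_fraction = c(0.08348135, 0.078947368, 0.090277778,
                       0.044399596, 0, 0.049348869,
                       0.218818381, 0.112068966, 1,
                       0, 0.214285714, 0.2180937),
  sample    = rep(c("An006", "An011"), each = 6),
  replicate = rep(c(1, 2, 1, 2), each = 3),
  identity  = 70
)

bin <- cut(dfrm$charact_fraction, breaks = seq(0, 1, 0.1), include.lowest = TRUE)

# Naming the arguments gives the table dimensions readable names
counts <- table(bin = bin, sample = dfrm$sample,
                replicate = dfrm$replicate, identity = dfrm$identity)

# One row per cell of the table, including cells with a count of zero
freq_df <- as.data.frame(counts, responseName = "bin_frequency")
head(freq_df)
```

Since every cell of the table becomes a row, empty bins appear with a frequency of 0 without any extra merging step.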
Combining cut() with ddply() from "plyr" should give you a data frame containing the frequencies for the various subsets of factors you are interested in. Something like:

library(plyr)
df$bin <- cut(df$charact_fraction, seq(0, 1, 0.1), include.lowest=TRUE)
df$obs <- 1  # Makes counting easy in next step
xtabs <- ddply(df, .(bin, sample, replicate, identity), summarise,
    frequency = sum(obs))

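I guess you are looking for cut, although I am not sure whether that really gives the output you want.
I don't know how to convert this into a data.frame like the one shown in my question. There is a lot of cool stuff in your answer, though. Thank you.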
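To also list the combinations that have zero counts, build the full grid of bins and factors and merge the counts into it:

xtabs.grid <- with(df, expand.grid(bin = levels(bin), sample = unique(sample),
  replicate = unique(replicate), identity = unique(identity)))
xtabs.full <- merge(xtabs.grid, xtabs, all.x = TRUE)
xtabs.full$frequency[is.na(xtabs.full$frequency)] <- 0  # combinations with no observations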
The same with dplyr:

library(dplyr)
df2 <- df %>%
  mutate(bin = cut(charact_fraction, seq(0, 1, 0.1), include.lowest=TRUE)) %>%
  count(bin, sample, replicate, identity) %>%
  # Join the counts (the piped `.`) onto the full grid of bins and factors
  left_join(with(df, expand.grid(bin=levels(cut(charact_fraction, seq(0, 1, 0.1), include.lowest=TRUE)),
                                 sample=unique(sample), replicate=unique(replicate),
                                 identity=unique(identity))), .) %>%
  mutate(n = ifelse(is.na(n), 0L, n))  # combinations with no observations get 0
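For completeness, the asker's own get_freqs can also be applied per group in base R with split, lapply and rbind, which is the literal "apply a vector-returning function to each group" pattern. This is only a sketch: df is rebuilt from the sample rows in the question, and groups, bin_labels and result are illustrative names rather than anything from the original posts.

```r
get_freqs <- function(dat_vector, breaks) {
  hist(dat_vector, breaks = breaks, include.lowest = TRUE, plot = FALSE)$counts
}

# Stand-in for the question's data, rebuilt from the sample rows
df <- data.frame(
  charact_fraction = c(0.08348135, 0.078947368, 0.090277778,
                       0.044399596, 0, 0.049348869,
                       0.218818381, 0.112068966, 1,
                       0, 0.214285714, 0.2180937),
  sample    = rep(c("An006", "An011"), each = 6),
  replicate = rep(c(1, 2, 1, 2), each = 3),
  identity  = 70
)

breaks <- seq(0, 1, 0.1)
bin_labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# One data frame per observed combination of the grouping factors
groups <- split(df, df[c("sample", "replicate", "identity")], drop = TRUE)

# Turn each group's vector of counts into ten rows, then stack the pieces
result <- do.call(rbind, lapply(groups, function(g) {
  data.frame(bin_frequency = get_freqs(g$charact_fraction, breaks),
             bin       = bin_labels,
             sample    = g$sample[1],
             replicate = g$replicate[1],
             identity  = g$identity[1])
}))
rownames(result) <- NULL
head(result)
```

Each group contributes one row per bin, so empty bins show up with a frequency of 0, while factor combinations that never occur in the data are skipped because of drop = TRUE.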