Creating a table of individual trials from a frequency table in R (inverse of the table function)



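I have a frequency table of data in a data.frame in R, listing factor levels and counts of successes and failures. I would like to turn it from a frequency table into a list of events, i.e. the opposite of the "table" command. Specifically, I would like to turn this: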

factor.A factor.B success.count fail.count
-------- -------- ------------- ----------
 0        1        0             2
 1        1        2             1

into this:

factor.A factor.B result 
-------- -------- -------
 0        1        0
 0        1        0
 1        1        1
 1        1        1
 1        1        0

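It seems to me that reshape should do the trick, or else some obscure base function I haven't heard of, but I've had no luck. Even repeating individual rows of a data.frame is tricky: how do you pass a variable number of arguments to rbind?

Any tips?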
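On the rbind sub-question: a variable number of arguments can be passed with do.call(), which calls a function with its arguments supplied as a list. A minimal sketch, assuming the frequency table is stored in a data frame named df with the column names shown above, builds one small data frame per row and then binds them all in one call:

```r
# the frequency table from the question (column names assumed as shown above)
df <- data.frame(factor.A = c(0, 1), factor.B = c(1, 1),
                 success.count = c(0, 2), fail.count = c(2, 1))

# one data frame per row of df: one line per success (result = 1),
# followed by one line per failure (result = 0)
pieces <- lapply(seq_len(nrow(df)), function(i) {
  n_s <- df$success.count[i]
  n_f <- df$fail.count[i]
  data.frame(factor.A = rep(df$factor.A[i], n_s + n_f),
             factor.B = rep(df$factor.B[i], n_s + n_f),
             result   = c(rep(1, n_s), rep(0, n_f)))
})

# do.call() passes every element of the list as a separate argument to rbind()
trials <- do.call(rbind, pieces)
trials
```

Building a list and binding once also avoids growing a data frame inside a loop, which gets slow for large tables.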

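Background: why? Because it is easier to cross-validate a logistic fit on a dataset of individual trials than on aggregated binomial data.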

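I am analysing my data with a generalized linear model (a binomial regression in R) and would like to use cross-validation to control the regularisation, since my aim is predictive.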

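However, as far as I can tell, the default cross-validation routines in R do not handle binomial data properly: they leave out entire rows of the frequency table rather than individual trials. This means that lightly and heavily sampled factor combinations get the same weight in my cost function, which is not appropriate for my data.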
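Expanding the table does not change the model itself: a binomial GLM fitted to the aggregated counts (response cbind(successes, failures)) and a logistic regression fitted to the expanded 0/1 trials have the same log-likelihood up to a constant that does not depend on the coefficients, so they give the same estimates. Only the unit that a row-wise cross-validation routine leaves out changes. A sketch with made-up counts (purely illustrative, chosen to avoid complete separation):

```r
# hypothetical aggregated data: one row per level of x
agg <- data.frame(x = c(0, 1, 2), succ = c(3, 5, 8), fail = c(7, 5, 2))

# binomial GLM on the aggregated counts
fit_agg <- glm(cbind(succ, fail) ~ x, family = binomial, data = agg)

# expand to one row per trial: 1 for each success, then 0 for each failure
idx <- rep(seq_len(nrow(agg)), agg$succ + agg$fail)
y <- unlist(lapply(seq_len(nrow(agg)),
                   function(i) c(rep(1, agg$succ[i]), rep(0, agg$fail[i]))))
trials <- data.frame(x = agg$x[idx], y = y)

# logistic regression on the individual trials
fit_trials <- glm(y ~ x, family = binomial, data = trials)

# the coefficient estimates agree up to numerical tolerance
all.equal(coef(fit_agg), coef(fit_trials), tolerance = 1e-6)
```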

You could try this:

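# the frequency table from the question
df <- data.frame(factor.A = c(0, 1), factor.B = c(1, 1),
                 success.count = c(0, 2), fail.count = c(2, 1))
counts <- as.matrix(df[ , c("success.count", "fail.count")])

# create 'result' vector
# repeat 1s and 0s the number of times given in the respective 'count' column;
# t() flattens the counts row by row, so they line up with the repeated rows below
result <- rep(rep(c(1, 0), nrow(df)), as.vector(t(counts)))

# repeat each row in df the number of times given by the sum of the 'count' columns
data.frame(df[rep(seq_len(nrow(df)), rowSums(counts)), c("factor.A", "factor.B")], result)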

#     factor.A factor.B result
# 1          0        1      0
# 1.1        0        1      0
# 2          1        1      1
# 2.1        1        1      1
# 2.2        1        1      0
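The same approach can be wrapped in a small reusable function. The name expand_trials and its arguments are hypothetical, chosen for illustration; the sketch keeps every column other than the two count columns and assumes the counts are non-negative whole numbers:

```r
# hypothetical helper: expand a frequency table into one row per trial
expand_trials <- function(df, success = "success.count", fail = "fail.count") {
  counts <- as.matrix(df[ , c(success, fail)])
  keep <- setdiff(names(df), c(success, fail))
  # row indices repeated once per trial, in the original row order
  idx <- rep(seq_len(nrow(df)), rowSums(counts))
  # per row: 1 repeated 'success' times, then 0 repeated 'fail' times
  result <- rep(rep(c(1, 0), nrow(df)), as.vector(t(counts)))
  out <- df[idx, keep, drop = FALSE]
  out$result <- result
  rownames(out) <- NULL
  out
}

df <- data.frame(factor.A = c(0, 1), factor.B = c(1, 1),
                 success.count = c(0, 2), fail.count = c(2, 1))
expand_trials(df)
```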

Try this:
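  # the frequency table as a matrix: factor.A, factor.B, success.count, fail.count
  x <- matrix(c(0, 1, 1, 1, 0, 2, 2, 1), nrow = 2, ncol = 4)
  r <- c()
  for (i in 1:nrow(x)) {
    # one (factor.A, factor.B, 1) triple per success
    r <- c(r, rep(c(x[i, 1:2], 1), x[i, 3]))
    # one (factor.A, factor.B, 0) triple per failure
    r <- c(r, rep(c(x[i, 1:2], 0), x[i, 4]))
  }
  # each triple becomes one row of the result
  res <- t(matrix(r, nrow = 3))
  colnames(res) <- c("factor.A", "factor.B", "result")
  res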

For a tidyverse-style solution, you could do:
library(tidyverse)

df %>% gather(key = result, value = incidence, success.count, fail.count) %>% 
     mutate(result = if_else(result %>% str_detect("success"), 1, 0)) %>%
     pmap_dfr(function(factor.A, factor.B, result, incidence) 
                   { tibble(factor.A = factor.A,
                            factor.B = factor.B,
                            result = rep(result, times = incidence)
                            )
                   }
               )
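Without the tidyverse, base R's reshape() (package stats) can play the role of the gather() step, after which each long-format row is repeated by its count. A sketch, again assuming the column names from the question; "id" and "count" are names introduced here for illustration:

```r
df <- data.frame(factor.A = c(0, 1), factor.B = c(1, 1),
                 success.count = c(0, 2), fail.count = c(2, 1))

# wide -> long: one row per (original row, outcome), holding that outcome's count
long <- reshape(df, direction = "long",
                varying = c("success.count", "fail.count"),
                v.names = "count", timevar = "result", times = c(1, 0),
                idvar = "id")

# restore the original row order, successes before failures within each row
long <- long[order(long$id, -long$result), ]

# repeat each long-format row 'count' times and keep only the wanted columns
trials <- long[rep(seq_len(nrow(long)), long$count),
               c("factor.A", "factor.B", "result")]
rownames(trials) <- NULL
trials
```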

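Hmm, actually, now that I think about it, there isn't enough statistical content here; this could go straight to Stack Overflow as a plain programming question.

True, but please don't cross-post. We will migrate this for you.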