R: Gather repeated sets of columns into single columns (tags: r, data.table, reshape, reshape2, tidyr)


The problem of gathering multiple sets of columns has already been solved here, but in my case the column names are not unique.

I have the following data:

input
#>   id question points max_points question points max_points
#> 1  1        a      0          3        c      0          5
#> 2  2        b      0          5        d     20         20
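For anyone who wants to reproduce this, a data frame of the same shape can be built roughly as follows. This is a reconstruction from the printed output above, not necessarily the code that produced it; `check.names = FALSE` is what lets `data.frame()` keep the duplicated column names.

```r
# Reconstructed from the printed table; values are taken from that output.
input <- data.frame(
  id = 1:2,
  question = c("a", "b"),
  points = c(0, 0),
  max_points = c(3, 5),
  question = c("c", "d"),
  points = c(0, 20),
  max_points = c(5, 20),
  check.names = FALSE,       # keep duplicated names instead of "question.1" etc.
  stringsAsFactors = FALSE
)

names(input)
```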

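The first column is an id, followed by many repeated sets of columns (the original dataset has 133 columns):

- the question identifier
- the points given
- the maximum points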

I would like to end up with this structure:

expected <- data.frame(
  id = c(1, 2, 1, 2),
  question = letters[1:4],
  points = c(0, 0, 0, 20),
  max_points = c(3, 5, 5, 20),
  stringsAsFactors = F
)
expected
#>   id question points max_points
#> 1  1        a      0          3
#> 2  2        b      0          5
#> 3  1        c      0          5
#> 4  2        d     20         20

What I tried gives an error:

Duplicate identifiers for rows (3, 9), (4, 10), (1, 7), (2, 8)

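This error has already been discussed here, but I don't see why or how I should add another identifier. Most likely that is not the main problem anyway, since I should probably approach the whole thing differently.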

How can I solve my problem, preferably using tidyr or base R? I don't know how to use data.table, but if there is a simple solution I would settle for that as well.

Try this:

# Stack the id column with each consecutive block of three columns.
do.call(rbind,
        lapply(seq(2, ncol(input), 3), function(i) {
          input[, c(1, i:(i + 2))]
        })
)

#   id question points max_points
# 1  1        a      0          3
# 2  2        b      0          5
# 3  1        c      0          5
# 4  2        d     20         20

You may need to clarify how you want the id column to be handled, but perhaps something like this:

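# Return the positions of all columns whose name is exactly `word`.
runme <- function(word, dat) {
  grep(paste0("^", word, "$"), names(dat))
}

# One row of l2 per repeated block; the single id position is recycled.
l <- mapply(runme, unique(names(input)), list(input))
l2 <- as.data.frame(l)

output <- data.frame()
for (i in seq_len(nrow(l2))) output <- rbind(output, input[, as.numeric(l2[i, ])])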
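The same match-by-name idea can be sketched without growing a data frame inside a loop. The helper `stack_by_name` and its `id_col` argument are hypothetical names made up for this illustration, and the sketch assumes that every non-id column name is repeated the same number of times.

```r
input <- data.frame(
  id = 1:2,
  question = c("a", "b"), points = c(0, 0), max_points = c(3, 5),
  question = c("c", "d"), points = c(0, 20), max_points = c(5, 20),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: stack all columns that share a name, matching by name
# rather than by position.
stack_by_name <- function(dat, id_col = "id") {
  id_pos <- match(id_col, names(dat))
  value_pos <- setdiff(seq_along(dat), id_pos)
  value_names <- names(dat)[value_pos]
  # Column positions grouped by name, in order of first appearance.
  groups <- split(value_pos, factor(value_names, levels = unique(value_names)))
  n_sets <- unique(lengths(groups))
  stopifnot(length(n_sets) == 1)  # assumption: every name repeats equally often
  # Repeat the ids once per set, then append one stacked column per name.
  out <- data.frame(rep(dat[[id_pos]], times = n_sets))
  names(out) <- id_col
  for (nm in names(groups)) {
    out[[nm]] <- unlist(lapply(groups[[nm]], function(j) dat[[j]]))
  }
  out
}

by_name <- stack_by_name(input)
by_name
```

Columns are always accessed by position (`dat[[j]]`), which sidesteps the ambiguity of selecting a duplicated name.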

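Another way to achieve the same goal without using lapply:
we first grab all the question, max_points and points columns, then melt each group separately and cbind the results together.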
library(reshape2)

# Keep the id column plus every column with the given name.
questions <- input[, c(1, which(names(input) == "question"))]
points <- input[, c(1, which(names(input) == "points"))]
max_points <- input[, c(1, which(names(input) == "max_points"))]

questions_m <- melt(questions,id.vars=c("id"),value.name = "questions")[,c(1,3)]
points_m <- melt(points,id.vars=c("id"),value.name = "points")[,3,drop=FALSE]
max_points_m <- melt(max_points,id.vars=c("id"),value.name = "max_points")[,3, drop=FALSE]

res <- cbind(questions_m,points_m, max_points_m)
res
  id questions points max_points
1  1         a      0          3
2  2         b      0          5
3  1         c      0          5
4  2         d     20         20

The idiomatic way to do this in data.table is pretty simple:
library(data.table)
setDT(input)

res = melt(
  input, 
  id = "id", 
  meas = patterns("question", "^points$", "max_points"), 
  value.name = c("question", "points", "max_points")
)


   id variable question points max_points
1:  1        1        a      0          3
2:  2        1        b      0          5
3:  1        2        c      0          5
4:  2        2        d     20         20

You get an extra column called "variable", but you can drop it afterwards with res[, variable := NULL] if you like.
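Since the question asked for tidyr, here is a rough sketch of the same reshape with `pivot_longer()`. It is not part of the answer above. It assumes three columns per set in a fixed order after the id column, and the `__` separator and the `set` column name are arbitrary choices made for this illustration. Numbering the column names is one way of adding the extra identifier that the error message in the question was asking for.

```r
library(tidyr)

input <- data.frame(
  id = 1:2,
  question = c("a", "b"), points = c(0, 0), max_points = c(3, 5),
  question = c("c", "d"), points = c(0, 20), max_points = c(5, 20),
  check.names = FALSE, stringsAsFactors = FALSE
)

wide <- input
# Assumption: three columns per set, always in the same order.
n_sets <- (ncol(wide) - 1) / 3
# Make the names unique by appending the set number: question__1, points__1, ...
names(wide)[-1] <- paste(names(wide)[-1], rep(seq_len(n_sets), each = 3), sep = "__")

# ".value" keeps question/points/max_points as separate output columns;
# the part after "__" goes into the new "set" column.
long <- pivot_longer(
  wide,
  cols = -id,
  names_to = c(".value", "set"),
  names_sep = "__"
)

# Sort by set, then id, to get the row order requested in the question.
long$set <- as.integer(long$set)
long <- long[order(long$set, long$id), ]
long
```

The `set` column plays the same role as data.table's `variable` column and can be dropped once it is no longer needed.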
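Do all of your question, points and max_points columns really mean the same thing? Maybe rbind(input[, c(1, 2:4)], input[, c(1, 5:7)])?

@zx8754 As I said, I have 133 columns in total, so I don't want to do this by hand.

Yes, I understand, it was only a hint; the indices can be computed.

@zx8754 Maybe I need a further hint on how to do that ;)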
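The hint in the comments, that the block indices can be computed rather than typed by hand, might look roughly like this. The sketch assumes a single leading id column and that every block repeats the same column names in the same order; `n_id`, `width` and `starts` are names chosen for this illustration.

```r
input <- data.frame(
  id = 1:2,
  question = c("a", "b"), points = c(0, 0), max_points = c(3, 5),
  question = c("c", "d"), points = c(0, 20), max_points = c(5, 20),
  check.names = FALSE, stringsAsFactors = FALSE
)

n_id <- 1                                          # number of leading id columns
width <- length(unique(names(input)[-seq_len(n_id)]))  # columns per block (3 here)

# The non-id columns must split evenly into blocks (133 columns gives 44 blocks of 3).
stopifnot((ncol(input) - n_id) %% width == 0)

# First column position of each block: 2, 5, ...
starts <- seq(n_id + 1, ncol(input), by = width)

# Take the id column plus one block at a time, then stack the pieces.
blocks <- lapply(starts, function(s) input[, c(seq_len(n_id), s:(s + width - 1))])
stacked <- do.call(rbind, blocks)
stacked
```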