R: Gathering repeated columns
Sample data:
df1 <- structure(list(Name = structure(c(3L, 2L, 1L), .Label = c("Bob",
"Joe", "Mike"), class = "factor"), Location = structure(c(1L,
1L, 2L), .Label = c("CA", "WA"), class = "factor"), Title = structure(c(2L,
3L, 1L), .Label = c("CEO", "Manager", "VP"), class = "factor"),
Class = structure(c(1L, 2L, 2L), .Label = c("Class1", "Class2"
), class = "factor"), Month = c(1, 2, 3), Class.1 = structure(c(3L,
2L, 1L), .Label = c("Class1", "Class2", "Class4"), class = "factor"),
Month.1 = c(3, 3, 2), Objective = structure(1:3, .Label = c("Obj1",
"Obj2", "Obj3"), class = "factor"), Month.2 = c(2, 7, 7),
Category = c("x", "y", "z"), Objective.1 = structure(c(3L,
2L, 1L), .Label = c("Obj1", "Obj7", "Obj9"), class = "factor"),
Month.3 = c(4, 5, 5), Category2 = c("z", "r", "q")), .Names = c("Name",
"Location", "Title", "Class", "Month", "Class.1", "Month.1",
"Objective", "Month.2", "Category", "Objective.1", "Month.3",
"Category2"), class = "data.frame", row.names = c(NA, -3L))
  Name Location   Title  Class Month Class.1 Month.1 Objective Month.2 Category Objective.1 Month.3 Category2
1 Mike       CA Manager Class1     1  Class4       3      Obj1       2        x        Obj9       4         z
2  Joe       CA      VP Class2     2  Class2       3      Obj2       7        y        Obj7       5         r
3  Bob       WA     CEO Class2     3  Class1       2      Obj3       7        z        Obj1       5         q
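I tried some similar examples from Stack Overflow using gather, spread, etc., but I can't figure out how to keep the Class/Month groups and the Objective/Month groups together.
In my real data set there are 100 columns and 8 ID columns. Instead of just Class/Month or Objective/Month pairs, the first half of the columns come in groups of four and the second half in groups of eight. An example of a group of four is Class, Month, Cost, Date.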
Example output for Mike:
  Name Location   Title Variable Value Value.2
1 Mike       CA Manager   Class1     1    <NA>
2 Mike       CA Manager   Class4     3    <NA>
3 Mike       CA Manager     Obj1     2       x
4 Mike       CA Manager     Obj9     4       z
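Duplicated values are fine, but you need to specify which columns get combined together (in the example, "Class" and "Objective") to get the OP's output: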
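library(data.table)
# setDT() converts df1 to a data.table by reference.
# Each pattern defines one value column; matching columns are stacked in order.
melt(setDT(df1),
     measure.vars = patterns("Class|Objective", "Month", "Category")
)[order(Name)]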
    Name Location   Title variable value1 value2 value3
 1:  Bob       WA     CEO        1 Class2      3      z
 2:  Bob       WA     CEO        2 Class1      2      q
 3:  Bob       WA     CEO        3   Obj3      7     NA
 4:  Bob       WA     CEO        4   Obj1      5     NA
 5:  Joe       CA      VP        1 Class2      2      y
 6:  Joe       CA      VP        2 Class2      3      r
 7:  Joe       CA      VP        3   Obj2      7     NA
 8:  Joe       CA      VP        4   Obj7      5     NA
 9: Mike       CA Manager        1 Class1      1      x
10: Mike       CA Manager        2 Class4      3      z
11: Mike       CA Manager        3   Obj1      2     NA
12: Mike       CA Manager        4   Obj9      4     NA
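It doesn't matter whether the column names are repeated identically or disambiguated with check.names=TRUE, because patterns only matches patterns in the names. See ?regex for more on how to specify patterns when needed.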
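To check which columns each pattern would pick up before melting, the same regular expressions can be run against the column names with base R's grep. This is only a sketch using the column names from the sample data; the variable names nms, pats and matched are illustrative and not part of the answer.

```r
# Column names from the sample data above
nms <- c("Name", "Location", "Title", "Class", "Month", "Class.1", "Month.1",
         "Objective", "Month.2", "Category", "Objective.1", "Month.3", "Category2")

# Each pattern is an ordinary regular expression matched against the names
pats <- c("Class|Objective", "Month", "Category")
matched <- lapply(pats, grep, x = nms, value = TRUE)
names(matched) <- pats

# Show the columns selected by each pattern, in column order
print(matched)
```

The first two patterns each select four columns while "Category" selects only two. Columns are paired by their position within each pattern, which is consistent with value3 being NA for variable 3 and 4 in the output above.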
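Other arguments to melt (see ?melt.data.table) can be used to give custom names to the columns in the result (instead of "value1", "value2", and so on).
Right, so for Mike you would have 4 rows, with Variable = c(Class1, Class4, Obj1, Obj9) and Value = c(1, 3, 2, 4).
Can you add some sample rows to the expected output? Do you have control over the non-unique column naming, or is that the sole reason for this question?
@r2evans Added example output for Mike. I can certainly set check.names=TRUE when reading the data, but I don't know whether that helps.
If you use check.names=TRUE you will have Class.1, Class.2, and so on, but they still need to end up in one column. Does each "group" of columns always start with Class or Objective?
Awesome, this works. It will take a bit of work to clean it up for the full 100 columns, but it puts everything together correctly. Thanks!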
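In the melted output above, the Category values land on the Class rows, whereas the expected output for Mike shows them next to Obj1 and Obj9. One possible way to reproduce that layout, sketched here under the assumption that Category and Category2 belong to Objective and Objective.1, is to melt each group separately and stack the results with rbindlist(fill = TRUE). The sample data is rebuilt with character columns instead of factors for brevity, and the names ids, class_long, obj_long and res are illustrative.

```r
library(data.table)

# Same values as the sample data, with characters instead of factors (assumption)
df1 <- data.frame(
  Name = c("Mike", "Joe", "Bob"), Location = c("CA", "CA", "WA"),
  Title = c("Manager", "VP", "CEO"),
  Class = c("Class1", "Class2", "Class2"), Month = c(1, 2, 3),
  Class.1 = c("Class4", "Class2", "Class1"), Month.1 = c(3, 3, 2),
  Objective = c("Obj1", "Obj2", "Obj3"), Month.2 = c(2, 7, 7),
  Category = c("x", "y", "z"),
  Objective.1 = c("Obj9", "Obj7", "Obj1"), Month.3 = c(4, 5, 5),
  Category2 = c("z", "r", "q"),
  stringsAsFactors = FALSE
)
dt <- as.data.table(df1)
ids <- c("Name", "Location", "Title")

# Class groups: pairs of (Class, Month); value.name sets the output column names
class_long <- melt(dt, id.vars = ids,
  measure.vars = list(c("Class", "Class.1"), c("Month", "Month.1")),
  value.name = c("Variable", "Value"))

# Objective groups: triples of (Objective, Month, Category)
obj_long <- melt(dt, id.vars = ids,
  measure.vars = list(c("Objective", "Objective.1"),
                      c("Month.2", "Month.3"),
                      c("Category", "Category2")),
  value.name = c("Variable", "Value", "Value.2"))

# Stack both; fill = TRUE adds Value.2 as NA for the Class rows
res <- rbindlist(list(class_long, obj_long), fill = TRUE)
res[, variable := NULL]   # drop the group counter added by melt
res <- res[order(Name)]
print(res[Name == "Mike"])
```

For the real data set with groups of four and eight columns, the same idea would need one melt call per group type, with each list element naming the columns that belong to one output column.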