R: Reshaping a data frame with multiple keys and values
I have a wide data frame that I want to convert into a long one.
This is not the actual wide data frame I'm working with. The real one has many more courses and more "values" per course, so it is far wider than this. Not every course has every value column associated with it (which is why bio1Csem is missing from the data frame below).
I decided to try a solution on a smaller data frame first, because I ran into a lot of problems with the larger one. Unfortunately, I'm still struggling.
The data frame I'm working with:
>X = rbind( c( "1", "2.5","3.7","","2006 Fall","2007 Fall","Smith","Hu",""),
c( "2" ,"3.7", "3.7", "3.5", "2007 Spring", "2007 Fall",
"Smith","Hu","Langdon"), c("3" ,"4", "3.2", "4", "2007 Spring", "2007 Fall",
"Smith","Hu","Langdon"))
> colnames(X) = c('id','bio1Agrade','bio1Bgrade','bio1Cgrade','bio1Asem',
'bio1Bsem','bio1Aprof', 'bio1Bprof','bio1Cprof')
> X
id bio1Agrade bio1Bgrade bio1Cgrade bio1Asem bio1Bsem bio1Aprof bio1Bprof bio1Cprof
[1,] "1" "2.5" "3.7" "" "2006 Fall" "2007 Fall" "Smith" "Hu" ""
[2,] "2" "3.7" "3.7" "3.5" "2007 Spring" "2007 Fall" "Smith" "Hu" "Langdon"
[3,] "3" "4" "3.2" "4" "2007 Spring" "2007 Fall" "Smith" "Hu" "Langdon"
I would like it to look like this:
id course grade semester prof
1 bio1A 2.5 2006 Fall Smith
1 bio1B 3.7 2007 Fall Hu
1 bio1C
2 bio1A 3.7 2007 Spring Smith
2 bio1B 3.7 2007 Fall Hu
2 bio1C 3.5 Langdon
3 bio1A 4 2007 Spring Smith
3 bio1B 3.2 2007 Fall Hu
3 bio1C 4 Langdon
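I don't think reshape will work, because my column names are all character strings without any obvious separator, and not every course has three corresponding columns.
I also thought about a tidyr solution, but I'm struggling with how to use it for multiple values.
Does anyone have suggestions for how to approach this? Would it be easier to rename the columns, add empty columns for the courses that are "missing" them, and then use reshape? Is there another, simpler way?
Hope this helps.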
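library(dplyr)
library(tidyr)
df %>%
  # stack every column except id into name/value pairs
  gather(temp_col, value, -id) %>%
  # split each original column name into its course prefix and its measure suffix
  mutate(course = gsub("(.*)(grade|sem|prof)", "\\1", temp_col),
         col_name = gsub("(.*)(grade|sem|prof)", "\\2", temp_col)) %>%
  select(-temp_col) %>%
  # turn the measure names back into columns: grade, prof, sem
  spread(col_name, value)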
The output is:
id course grade prof sem
1 1 bio1A 2.5 Smith 2006 Fall
2 1 bio1B 3.7 Hu 2007 Fall
3 1 bio1C <NA> <NA>
4 2 bio1A 3.7 Smith 2007 Spring
5 2 bio1B 3.7 Hu 2007 Fall
6 2 bio1C 3.5 Langdon <NA>
7 3 bio1A 4 Smith 2007 Spring
8 3 bio1B 3.2 Hu 2007 Fall
9 3 bio1C 4 Langdon <NA>
Sample data:
df <- structure(list(id = 1:3, bio1Agrade = c(2.5, 3.7, 4), bio1Bgrade = c(3.7,
3.7, 3.2), bio1Cgrade = c(NA, 3.5, 4), bio1Asem = c("2006 Fall",
"2007 Spring", "2007 Spring"), bio1Bsem = c("2007 Fall", "2007 Fall",
"2007 Fall"), bio1Aprof = c("Smith", "Smith", "Smith"), bio1Bprof = c("Hu",
"Hu", "Hu"), bio1Cprof = c("", "Langdon", "Langdon")), .Names = c("id",
"bio1Agrade", "bio1Bgrade", "bio1Cgrade", "bio1Asem", "bio1Bsem",
"bio1Aprof", "bio1Bprof", "bio1Cprof"), class = "data.frame", row.names = c(NA,
-3L))
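On the question of whether base reshape could work once the missing column is added: the following is only a sketch, assuming it is acceptable to create the absent bio1Csem column filled with NA and to list the columns for each measure by hand. The names long_base and the explicit times vector are illustrative choices, not part of the original post.

```r
# Sample data, same values as in the post
df <- data.frame(
  id = 1:3,
  bio1Agrade = c(2.5, 3.7, 4), bio1Bgrade = c(3.7, 3.7, 3.2),
  bio1Cgrade = c(NA, 3.5, 4),
  bio1Asem = c("2006 Fall", "2007 Spring", "2007 Spring"),
  bio1Bsem = c("2007 Fall", "2007 Fall", "2007 Fall"),
  bio1Aprof = c("Smith", "Smith", "Smith"),
  bio1Bprof = c("Hu", "Hu", "Hu"),
  bio1Cprof = c("", "Langdon", "Langdon"),
  stringsAsFactors = FALSE
)

# Assumption: pad the course that lacks a semester column with NA
df$bio1Csem <- NA_character_

long_base <- reshape(
  df,
  direction = "long",
  idvar = "id",
  # one vector per measure, each listing the courses in the same order
  varying = list(
    c("bio1Agrade", "bio1Bgrade", "bio1Cgrade"),
    c("bio1Asem", "bio1Bsem", "bio1Csem"),
    c("bio1Aprof", "bio1Bprof", "bio1Cprof")
  ),
  v.names = c("grade", "sem", "prof"),
  timevar = "course",
  times = c("bio1A", "bio1B", "bio1C")
)

# Sort by student, then course, to match the layout asked for
long_base <- long_base[order(long_base$id, long_base$course), ]
```

Passing varying as a list avoids relying on a separator in the column names, at the cost of typing the names out, which may be tedious for a much wider data frame.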
We can use melt from data.table, which can take multiple measure patterns.
library(data.table)
nm1 <- substr(names(df)[-1], 1, 5)
melt(setDT(df), measure = patterns("grade$", "prof$", "sem$"),
value.name = c("grade", "prof", "sem"),
variable.name = "course")[, course := nm1[course]][order(id)]
# id course grade prof sem
#1: 1 bio1A 2.5 Smith 2006 Fall
#2: 1 bio1B 3.7 Hu 2007 Fall
#3: 1 bio1C NA NA
#4: 2 bio1A 3.7 Smith 2007 Spring
#5: 2 bio1B 3.7 Hu 2007 Fall
#6: 2 bio1C 3.5 Langdon NA
#7: 3 bio1A 4.0 Smith 2007 Spring
#8: 3 bio1B 3.2 Hu 2007 Fall
#9: 3 bio1C 4.0 Langdon NA
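Out of interest, can this result be achieved with tidyr::gather?
@Dom I think the other answer achieves it with gather.
@Dan If your question is whether gather has functionality similar to melt, then not at this time.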
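As a follow-up to the comments above: tidyr 1.0.0 and later provide pivot_longer(), which can split the column names and produce several value columns in one step through the special ".value" name. The following is a sketch under that assumption (a sufficiently recent tidyr); the regular expression and the name long_tidy are illustrative and not taken from the answers.

```r
library(tidyr)

# Sample data, same values as in the post
df <- data.frame(
  id = 1:3,
  bio1Agrade = c(2.5, 3.7, 4), bio1Bgrade = c(3.7, 3.7, 3.2),
  bio1Cgrade = c(NA, 3.5, 4),
  bio1Asem = c("2006 Fall", "2007 Spring", "2007 Spring"),
  bio1Bsem = c("2007 Fall", "2007 Fall", "2007 Fall"),
  bio1Aprof = c("Smith", "Smith", "Smith"),
  bio1Bprof = c("Hu", "Hu", "Hu"),
  bio1Cprof = c("", "Langdon", "Langdon"),
  stringsAsFactors = FALSE
)

long_tidy <- pivot_longer(
  df,
  cols = -id,
  # first capture group becomes the course; second names the output column
  names_to = c("course", ".value"),
  names_pattern = "(.*)(grade|sem|prof)$"
)
```

Because bio1Csem does not exist in the wide data, the sem value for the bio1C rows comes out as NA, with no need to add an empty column beforehand. Unlike the gather/spread approach, grade stays numeric because each measure gets its own column.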