在R中重塑数据帧:从宽到长,但';变化的';列的长度不相等
我的问题在下面的代码中描述。我已经在这里和其他论坛上寻找过类似的问题,但还没有找到一个完全符合我这里要求的解决方案。如果仅仅依靠基本R就可以解决这个问题,那将是更好的选择,但是使用包也可以在R中重塑数据帧:从宽到长,但';变化的';列的长度不相等,r,R,我的问题在下面的代码中描述。我已经在这里和其他论坛上寻找过类似的问题,但还没有找到一个完全符合我这里要求的解决方案。如果仅仅依靠基本R就可以解决这个问题,那将是更好的选择,但是使用包也可以 id1 <- c("A", "A", "A", "B", "B", "C", "C", "C") id2 <- c(10, 20, 30, 10, 30, 10, 20, 30) x.1 <- ceiling(runif(8)*80) + 20 y.1 <- ceiling(runif
id1 <- c("A", "A", "A", "B", "B", "C", "C", "C")
id2 <- c(10, 20, 30, 10, 30, 10, 20, 30)
x.1 <- ceiling(runif(8)*80) + 20
y.1 <- ceiling(runif(8)*15) + 200
x.2 <- ceiling(runif(8)*90) + 20
y.2 <- ceiling(runif(8)*20) + 200
x.3 <- ceiling(runif(8)*80) + 40
# The data frame contains to kinds of data values, x and y, repeated by a suffix number. In my example both
# the id-part and the data-part are not structured in a completely uniform manner.
mywidedata <- data.frame(id1, id2, x.1, y.1, x.2, y.2, x.3)
# If I wanted to make the data frame even wider, this would work. It generates NAs for the missing combination (B,20).
reshape(mywidedata, idvar = "id1", timevar = "id2", direction = "wide")
# What I want is "long", and this fails.
reshape(mywidedata, varying = c(3:7), direction = "long")
# I could introduce the needed column. This works.
mywidecopy <- mywidedata
mywidecopy$y.3 <- NA
mylongdata <- reshape(mywidecopy, idvar=c(1,2), varying = c(3:8), direction = "long", sep = ".")
# (sep-argument not needed in this case - the function can figure out the system)
names(mylongdata)[(names(mylongdata)=="time")] <- "id3"
# I want to reach the same outcome without manual manipulation. Is it possible with the just the
# built-in 'reshape'?
# Trying 'melt'. Not what I want.
reshape::melt(mywidedata, id.vars = c(1,2))
id1您可以从tidyr
使用pivot\u更长的时间
:
tidyr::pivot_longer(mywidedata,
cols = -c(id1, id2),
names_to = c('.value', 'id3'),
names_sep = '\\.')
# A tibble: 24 x 5
# id1 id2 id3 x y
# <chr> <dbl> <chr> <dbl> <dbl>
# 1 A 10 1 66 208
# 2 A 10 2 95 220
# 3 A 10 3 89 NA
# 4 A 20 1 34 208
# 5 A 20 2 81 219
# 6 A 20 3 82 NA
# 7 A 30 1 23 201
# 8 A 30 2 80 204
# 9 A 30 3 75 NA
#10 B 10 1 52 210
# … with 14 more rows
tidyr::pivot_更长(mywidedata,
cols=-c(id1,id2),
名称_to=c('.value',id3'),
名称(sep=“\\”)
#一个tibble:24x5
#id1 id2 id3 x y
#
#1A10166208
#2 A 10 2 95 220
#3 A 10 3 89 NA
#4 A 20 1 34 208
#5 A 20 2 81 219
#6 A 20 3 82 NA
#7 A 30 1 23 201
#8 A 30 2 80 204
#9 A 30 3 75 NA
#10B1152210
#…还有14行
仅cbind
缺少的级别为NA
reshape(cbind(mywidedata, y.2=NA), varying=3:8, direction="long")
# id1 id2 time x y id
# 1.1 A 10 1 98 215 1
# 2.1 A 20 1 38 208 2
# 3.1 A 30 1 97 205 3
# 4.1 B 10 1 61 207 4
# 5.1 B 30 1 73 201 5
# 6.1 C 10 1 96 202 6
# 7.1 C 20 1 100 202 7
# 8.1 C 30 1 94 202 8
# 1.2 A 10 2 73 208 1
# 2.2 A 20 2 69 218 2
# 3.2 A 30 2 64 219 3
# 4.2 B 10 2 104 213 4
# 5.2 B 30 2 99 203 5
# 6.2 C 10 2 92 206 6
# 7.2 C 20 2 49 206 7
# 8.2 C 30 2 59 209 8
# 1.3 A 10 3 63 208 1
# 2.3 A 20 3 91 218 2
# 3.3 A 30 3 42 219 3
# 4.3 B 10 3 67 213 4
# 5.3 B 30 3 90 203 5
# 6.3 C 10 3 74 206 6
# 7.3 C 20 3 86 206 7
# 8.3 C 30 3 83 209 8
我们可以从数据表中使用melt
library(data.table)
melt(setDT(mywidedata), measure = patterns("^x", "^y"), value.name = c('x', 'y'))
# id1 id2 variable x y
# 1: A 10 1 97 215
# 2: A 20 1 75 202
# 3: A 30 1 87 213
# 4: B 10 1 51 206
# 5: B 30 1 75 203
# 6: C 10 1 41 210
# 7: C 20 1 58 211
# 8: C 30 1 50 207
# 9: A 10 2 92 204
#10: A 20 2 60 207
#11: A 30 2 35 201
#12: B 10 2 83 202
#13: B 30 2 81 202
#14: C 10 2 55 216
#15: C 20 2 68 204
#16: C 30 2 70 218
#17: A 10 3 89 NA
#18: A 20 3 108 NA
#19: A 30 3 47 NA
#20: B 10 3 78 NA
#21: B 30 3 43 NA
#22: C 10 3 106 NA
#23: C 20 3 92 NA
#24: C 30 3 96 NA
重塑
功能对宽变长时的不平衡数据有限制。您唯一的选择是使用另一个软件包,如tidyr或添加缺少的列。@Edward适时地指出,谢谢,很好。而且似乎没有办法使用软件包。答案已通过。我忘了单击。完成。