R: Expanding data with data.table
I have this data.table:
library(data.table)
dt <- data.table(
id = c(rep(1, 3), rep(2, 2)),
begin = c(1, 4, 8, 1, 11),
end = c(3, 7, 12, 10, 12),
state = c("A", "B", "A", "B", "A")
)
I would like to get the following output:
data.table(
id = c(1, 2),
m1 = c("A", "B"),
m2 = c("A", "B"),
m3 = c("A", "B"),
m4 = c("B", "B"),
m5 = c("B", "B"),
m6 = c("B", "B"),
m7 = c("B", "B"),
m8 = c("A", "B"),
m9 = c("A", "B"),
m10 = c("A", "B"),
m11 = c("A", "A"),
m12 = c("A", "A")
)
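Those who have done sequence analysis may recognize that I am trying to do what seqformat in the TraMineR package does, but with better performance thanks to data.table. Solutions using the tidyverse would also be of interest.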
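One option with data.table is to melt the dataset after creating a sequence column, then, grouped by 'i1', 'id' and 'state', build the sequence from the first to the last 'value', and finally dcast from 'long' to 'wide':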
dt1 <- melt(dt[, i1 := seq_len(.N)], id.vars = c("i1", "id", "state"))[,
paste0("m", seq(first(value), last(value))), .(i1, id, state)]
dcast(dt1, id ~ V1, value.var = "state")[]
# id m1 m10 m11 m12 m2 m3 m4 m5 m6 m7 m8 m9
#1: 1 A A A A A A B B B B A A
#2: 2 B B A A B B B B B B B B
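Note that the columns above come out in alphabetical order (m1, m10, m11, ...). A minimal sketch of one way to restore the numeric order, assuming the same example data; the names `long`, `wide` and `pos` are only illustrative and not part of the original answer:

```r
library(data.table)

dt <- data.table(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)

# Expand every [begin, end] interval into one row per time position
long <- dt[, .(pos = unlist(Map(`:`, begin, end))), by = .(id, state)]

# Cast to wide; dcast sorts the new column names alphabetically
wide <- dcast(long, id ~ paste0("m", pos), value.var = "state")

# Reorder the columns by reference so that m1, m2, ..., m12 follow id
setcolorder(wide, c("id", paste0("m", sort(unique(long$pos)))))
wide[]
```

`setcolorder` works by reference, so no copy of the wide table is made.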
Another solution:
dt[, unlist(Map(`:`, begin, end)), by = .(id, state)
][, dcast(.SD, id ~ sprintf("m%02d", V1), value.var = "state")]
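which gives the wide table with zero-padded column names m01 through m12, so the columns sort in the expected order.
It may be better to keep the data in long format, though. Long format is usually easier to work with in later data processing and analysis.
You can achieve that with: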
dt[, unlist(Map(`:`, begin, end)), by = .(id, state)][order(id, V1)]
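which gives one row per id and time position, with columns id, state and V1 (the full 24-row result is printed further below).
The [order(id, V1)] part is optional; it only sorts the rows by id and position.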
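To illustrate why the long format can be convenient for later analysis, here is a small sketch that counts how many time positions each id spends in each state. It assumes the same example data; `long`, `pos` and `time_in_state` are hypothetical names introduced for this example:

```r
library(data.table)

dt <- data.table(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)

# Long format: one row per id and time position
long <- dt[, .(pos = unlist(Map(`:`, begin, end))), by = .(id, state)][order(id, pos)]

# Number of time positions per id and state
time_in_state <- long[, .N, by = .(id, state)]
time_in_state
```

With the data in long form, this kind of summary is a single grouped aggregation, whereas the wide table would first have to be melted back.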
Data used:
dt <- data.table(
id = c(rep(1, 3), rep(2, 2)),
begin = c(1, 4, 8, 1, 11),
end = c(3, 7, 12, 10, 12),
state = c("A", "B", "A", "B", "A")
)
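Comment: I doubt that casting the table to wide format can be made fast without a specialized package. However, if you are willing and able to work with long-format data, there is DT[, .(seq = seq(first(begin), last(end)), v = inverse.rle(list(values = state, lengths = end - begin + 1L))), by = id] as an alternative to @Frank's option: DT[, unlist(Map(`:`, begin, end)), by = .(id, state)]. See also my answer.
For reference, the long-format result of dt[, unlist(Map(`:`, begin, end)), by = .(id, state)][order(id, V1)]: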
id state V1
1: 1 A 1
2: 1 A 2
3: 1 A 3
4: 1 B 4
5: 1 B 5
6: 1 B 6
7: 1 B 7
8: 1 A 8
9: 1 A 9
10: 1 A 10
11: 1 A 11
12: 1 A 12
13: 2 B 1
14: 2 B 2
15: 2 B 3
16: 2 B 4
17: 2 B 5
18: 2 B 6
19: 2 B 7
20: 2 B 8
21: 2 B 9
22: 2 B 10
23: 2 A 11
24: 2 A 12
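The inverse.rle variant mentioned in the comment can be sketched as follows. This is a reconstruction under two assumptions: the intervals of each id are sorted and contiguous (no gaps or overlaps, as in the example data), and the run lengths are passed under the name `lengths`, which is the element name that `inverse.rle` looks up. The column names `pos` and `state` in the result are illustrative:

```r
library(data.table)

DT <- data.table(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)

# Treat each row as a run: `state` is the value, end - begin + 1 the run length.
# inverse.rle() repeats each state for the length of its interval.
long <- DT[, .(
  pos = seq(first(begin), last(end)),
  state = inverse.rle(list(values = state, lengths = end - begin + 1L))
), by = id]
long
```

If an id had gaps between its intervals, the positions and the expanded states would have different lengths and no longer line up, so the Map-based version is the safer choice in that case.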