R: Expanding data with data.table


I have this data table:

library(data.table)

data.table(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)
I would like to get the following output:

data.table(
  id = c(1, 2),
  m1 = c("A", "B"),
  m2 = c("A", "B"),
  m3 = c("A", "B"),
  m4 = c("B", "B"),
  m5 = c("B", "B"),
  m6 = c("B", "B"),
  m7 = c("B", "B"),
  m8 = c("A", "B"),
  m9 = c("A", "B"),
  m10 = c("A", "B"),
  m11 = c("A", "A"),
  m12 = c("A", "A")
)
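Those who have done sequence analysis before may recognize that I am trying to do what seqformat from the TraMineR package does, but with better performance thanks to data.table.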

Solutions using the tidyverse

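A tidyverse approach could look roughly like the following. This is a sketch rather than code from the thread, and it assumes the dplyr and tidyr packages are installed; the names df, month and wide are illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, as a tibble
df <- tibble(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)

wide <- df %>%
  # One list element per spell, holding every position from begin to end
  mutate(month = Map(seq, begin, end)) %>%
  # Expand the list column: one row per id and position
  unnest(month) %>%
  select(id, month, state) %>%
  # Spread positions into columns m1, m2, ..., in order of first appearance
  pivot_wider(names_from = month, values_from = state, names_prefix = "m")
```

Dropping the final pivot_wider() call leaves the data in long format, with one row per id and position.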
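One option with data.table is to melt the dataset after creating a sequence column, then, grouped by "i1", "id" and "state", build the sequence from the first to the last "value", and finally dcast the result from long to wide: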

dt1 <- melt(dt[, i1 := seq_len(.N)], id.vars = c("i1", "id", "state"))[,
      paste0("m", seq(first(value), last(value))), .(i1, id, state)]
dcast(dt1, id ~ V1, value.var = "state")[]
#    id m1 m10 m11 m12 m2 m3 m4 m5 m6 m7 m8 m9
#1:  1  A   A   A   A  A  A  B  B  B  B  A  A
#2:  2  B   B   A   A  B  B  B  B  B  B  B  B
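The columns in the output above come out in alphabetical order (m1, m10, m11, ...) because the column names are character strings. If chronological order matters, one way to restore it is to reorder the columns by their numeric suffix with setcolorder. A sketch under that assumption, using the same example data; the names long, wide, col and month_cols are illustrative.

```r
library(data.table)

dt <- data.table(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)

# Row index so that every spell forms its own group after melting
dt[, i1 := seq_len(.N)]

# begin and end become the two "value" rows of each spell; expand them
# into one column label per position
long <- melt(dt, id.vars = c("i1", "id", "state"))[
  , .(col = paste0("m", seq(first(value), last(value)))),
  by = .(i1, id, state)
]

wide <- dcast(long, id ~ col, value.var = "state")

# Sort the m-columns by their numeric suffix and reorder by reference
suffix <- as.integer(sub("^m", "", setdiff(names(wide), "id")))
month_cols <- paste0("m", sort(suffix))
setcolorder(wide, c("id", month_cols))
```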

Another solution:

dt[, unlist(Map(`:`, begin, end)), by = .(id, state)
   ][, dcast(.SD, id ~ sprintf("m%02d", V1), value.var = "state")]
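This yields one row per id with columns m01 to m12; the zero-padding from sprintf keeps the columns in numeric order.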
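If the unpadded names m1 to m12 from the question are wanted, one possible variation is to cast on the numeric position itself and add the prefix afterwards. This is a sketch rather than part of the original answer; the names long, wide and pos_cols are illustrative.

```r
library(data.table)

dt <- data.table(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)

# Long format: one row per id, state and position (V1 is an integer column)
long <- dt[, unlist(Map(`:`, begin, end)), by = .(id, state)]

# Cast on the integer position, so the new columns are sorted numerically
wide <- dcast(long, id ~ V1, value.var = "state")

# Add the "m" prefix to every column except id
pos_cols <- setdiff(names(wide), "id")
setnames(wide, pos_cols, paste0("m", pos_cols))
```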

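It may be better to keep the data in long format. Long format is usually easier to work with in later data processing and analysis.

You can achieve this with: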

dt[, unlist(Map(`:`, begin, end)), by = .(id, state)][order(id, V1)]
which gives:

    id state V1
 1:  1     A  1
 2:  1     A  2
 3:  1     A  3
 4:  1     B  4
 5:  1     B  5
 6:  1     B  6
 7:  1     B  7
 8:  1     A  8
 9:  1     A  9
10:  1     A 10
11:  1     A 11
12:  1     A 12
13:  2     B  1
14:  2     B  2
15:  2     B  3
16:  2     B  4
17:  2     B  5
18:  2     B  6
19:  2     B  7
20:  2     B  8
21:  2     B  9
22:  2     B 10
23:  2     A 11
24:  2     A 12

The [order(id, V1)] part is optional; it only sorts the rows.

Data used:

dt <- data.table(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)

I doubt that reshaping the table to wide format can be made fast without a specialized package. However, if you are willing and able to work with long-format data, there is:

DT[, .(seq = seq(first(begin), last(end)), v = inverse.rle(list(values = state, lengths = end - begin + 1L))), by = id]

An alternative to @Frank's option:

DT[, unlist(Map(`:`, begin, end)), by = .(id, state)]

See also my answer.
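The run-length idea from the comment can be sketched as follows. It treats each spell as a run whose length is end - begin + 1 and expands the runs with inverse.rle. This sketch assumes that the spells within each id are sorted and contiguous, with no gaps or overlaps, as in the example data; the names long, pos and state in the result are illustrative.

```r
library(data.table)

DT <- data.table(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)

long <- DT[, .(
  # Every position from the first begin to the last end of this id
  pos = seq(first(begin), last(end)),
  # Repeat each state as many times as its spell is long
  state = inverse.rle(list(values = state, lengths = end - begin + 1L))
), by = id]
```

Because both columns are built per id, the result is already ordered by id and position, so no extra sorting step is needed under the stated assumption.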