R: Structuring transactional data for a Sankey diagram

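There are many packages for Sankey diagrams. However, these packages assume the data is already structured. I am looking at a transactional dataset from which I want to extract the first sequence of products in the time series. Assume the time series is already sorted.

Here is the dataset: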

structure(list(date = structure(c(1546300800, 1546646400, 1547510400, 1547596800, 1546387200, 1546646400, 1546732800), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
               client = c("a", "a", "a", "a", "b", "b", "b"),
               product = c("butter", "cheese", "cheese", "butter", "milk", "garbage bag", "candy"),
               qty = c(2, 3, 4, 1, 3, 4, 6)), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame")) 

Here is the desired output:

Here is my suggestion:

dt <- structure(list(date = structure(c(1546300800, 1546646400, 1547510400, 1547596800, 1546387200, 1546646400, 1546732800), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
               client = c("a", "a", "a", "a", "b", "b", "b"),
               product = c("butter", "cheese", "cheese", "butter", "milk", "garbage bag", "candy"),
               qty = c(2, 3, 4, 1, 3, 4, 6)), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))

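library(data.table)
dt <- as.data.table(dt)

# For each client, take the previous purchase as the source of the flow
dt[, From := shift(product, type = "lag"), by = client]
# The first purchase of each client has no predecessor
dt <- dt[!is.na(From)]

# The current purchase is the target of the flow
setnames(dt, "product", "To")
# Drop repeated purchases of the same product (no flow)
dt <- dt[From != To]
setcolorder(dt, c("client", "From", "To", "qty"))
# Direction-independent key for each product pair, e.g. "butter_cheese"
dt[, comp := paste0(sort(c(From, To)), collapse = "_"), by = seq_len(nrow(dt))]
# Keep only the first occurrence of each product pair
dt <- unique(dt, by = "comp")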

dt[, date:=NULL]
dt[, comp:=NULL]

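dt
#  client        From          To qty
#      a      butter      cheese   3
#      b        milk garbage bag   4
#      b garbage bag       candy   6

Thanks.

This correctly finds the sequence of distinct products, which should feed nicely into a Sankey diagram.

Find the unique flows for each client. In the new example, "butter to cheese" should be added for client b:

structure(list(date = structure(c(1546300800, 1546646400, 1547510400, 1547596800, 1546387200, 1546646400, 1546732800, 1546819200, 1546992000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
               client = c("a", "a", "a", "a", "b", "b", "b", "b", "b"),
               product = c("butter", "cheese", "cheese", "butter", "milk", "garbage bag", "candy", "butter", "cheese"),
               qty = c(2, 3, 4, 1, 3, 4, 6, 2, 3)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"))

I changed the unique statement to achieve this by adding client to it.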
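
The last comment does not show the modified statement, so the following is only a sketch of what such a change might look like: it assumes the deduplication is done with `unique(dt, by = c("client", "comp"))` so that the same product pair can appear once per client. The name `flows` is introduced here for illustration, and the nine-row data set is the extended example from the comment above.

```r
library(data.table)

# Extended example data: two extra purchases (butter, cheese) for client b
dt <- data.table(
  date = as.POSIXct(c(1546300800, 1546646400, 1547510400, 1547596800,
                      1546387200, 1546646400, 1546732800, 1546819200,
                      1546992000), origin = "1970-01-01", tz = "UTC"),
  client  = c("a", "a", "a", "a", "b", "b", "b", "b", "b"),
  product = c("butter", "cheese", "cheese", "butter", "milk",
              "garbage bag", "candy", "butter", "cheese"),
  qty     = c(2, 3, 4, 1, 3, 4, 6, 2, 3)
)

# Previous purchase per client becomes the source of each flow
dt[, From := shift(product, type = "lag"), by = client]
dt <- dt[!is.na(From)]
setnames(dt, "product", "To")

# Drop repeated purchases of the same product
dt <- dt[From != To]

# Direction-independent key for each product pair
dt[, comp := paste0(sort(c(From, To)), collapse = "_"), by = seq_len(nrow(dt))]

# Assumed change: deduplicate within each client instead of globally
flows <- unique(dt, by = c("client", "comp"))
flows[, c("date", "comp") := NULL]
setcolorder(flows, c("client", "From", "To", "qty"))
```

Because the key is now the combination of client and product pair, client b keeps its own butter to cheese flow even though client a already has one. Within a client, a reversed pair (cheese to butter after butter to cheese) is still treated as a duplicate, as in the original answer.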