Need to format data in R

This is a follow-up to my only other question, but hopefully more direct. I have data that looks like this:

     custID   custChannel            custDate
1     151        Direct 2015-10-10 00:15:32
2     151    GooglePaid 2015-10-10 00:16:45
3     151     Converted 2015-10-10 00:17:01
4    5655      BingPaid 2015-10-11 00:20:12
5    7855 GoogleOrganic 2015-10-12 00:05:32
6    7862  YahooOrganic 2015-10-13 00:18:20
7    9655    GooglePaid 2015-10-13 00:08:35
8    9655    GooglePaid 2015-10-13 00:11:11
9    9655     Converted 2015-10-13 00:11:35
10   9888    GooglePaid 2015-10-14 00:08:35
11   9888    GooglePaid 2015-10-14 00:11:11
12   9888     Converted 2015-10-14 00:11:35

It needs to be sorted so that the output looks like this:

Path                               Path Count
BingPaid                                    1
Direct>GooglePaid>Converted                 1
GoogleOrganic                               1
GooglePaid>GooglePaid>Converted             2
YahooOrganic                                1

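The idea is to capture each customer's path (a customer is identified by custID) and count how many customers in the whole dataset took that exact path (Path Count). I need to do this on a dataset of 5 million rows.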
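To make the goal concrete, here is a minimal base R sketch of the two steps (one path string per customer, then a count per path). The data frame name `dat` and the column name `PathCount` are assumptions for illustration; the rows are copied from the sample above and are assumed to already be in chronological order within each customer.

```r
# Sample data copied from the question; `dat` is an assumed name
dat <- data.frame(
  custID = c(151, 151, 151, 5655, 7855, 7862,
             9655, 9655, 9655, 9888, 9888, 9888),
  custChannel = c("Direct", "GooglePaid", "Converted", "BingPaid",
                  "GoogleOrganic", "YahooOrganic",
                  "GooglePaid", "GooglePaid", "Converted",
                  "GooglePaid", "GooglePaid", "Converted"),
  custDate = as.POSIXct(c(
    "2015-10-10 00:15:32", "2015-10-10 00:16:45", "2015-10-10 00:17:01",
    "2015-10-11 00:20:12", "2015-10-12 00:05:32", "2015-10-13 00:18:20",
    "2015-10-13 00:08:35", "2015-10-13 00:11:11", "2015-10-13 00:11:35",
    "2015-10-14 00:08:35", "2015-10-14 00:11:11", "2015-10-14 00:11:35"
  ), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Step 1: collapse each customer's channels into one path string
paths <- tapply(dat$custChannel, dat$custID, paste, collapse = ">")

# Step 2: count how many customers share each exact path
path_counts <- as.data.frame(table(Path = paths),
                             responseName = "PathCount",
                             stringsAsFactors = FALSE)
print(path_counts)
```

This is only meant to pin down the expected result on the sample; it says nothing about how well the approach scales to 5 million rows.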

Using data.table you can do this as follows:

library(data.table)
setDT(dat)[, paste(custChannel, collapse = ">"), by = custID][, .("path length" = .N), by = .(path = V1)]

Result:

                              path path length
1:     Direct>GooglePaid>Converted           1
2:                        BingPaid           1
3:                   GoogleOrganic           1
4:                    YahooOrganic           1
5: GooglePaid>GooglePaid>Converted           2

Step by step:

setDT(dat) # make dat a data.table
# get path by custID
dat_path <- dat[,paste(custChannel, collapse = ">"), custID] 
#get length by path created in the previous step
res <- dat_path[,.("path length"=.N), by=.(path=V1)] 
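
One thing the code above relies on is that the rows are already in chronological order within each custID, because `paste(..., collapse = ">")` simply follows row order. If that is not guaranteed, a hedged variant is to sort first with `setorder()`. The sketch below uses a small, deliberately shuffled subset of the sample data; the column names `path` and `path_count` are my own choices, not the answer's.

```r
library(data.table)

# Shuffled subset of the sample rows (illustrative only)
dat <- data.table(
  custID = c(9655, 151, 9888, 9655, 151, 9888, 9655, 151, 9888),
  custChannel = c("Converted", "GooglePaid", "GooglePaid",
                  "GooglePaid", "Direct", "Converted",
                  "GooglePaid", "Converted", "GooglePaid"),
  custDate = as.POSIXct(c(
    "2015-10-13 00:11:35", "2015-10-10 00:16:45", "2015-10-14 00:08:35",
    "2015-10-13 00:08:35", "2015-10-10 00:15:32", "2015-10-14 00:11:35",
    "2015-10-13 00:11:11", "2015-10-10 00:17:01", "2015-10-14 00:11:11"
  ), tz = "UTC")
)

# Sort by customer, then by timestamp, so each path follows event order
setorder(dat, custID, custDate)

# One path string per customer
dat_path <- dat[, .(path = paste(custChannel, collapse = ">")), by = custID]

# Number of customers per exact path
res <- dat_path[, .(path_count = .N), by = path]
print(res)
```

`setorder()` reorders the table by reference, so no copy of the data is made, which matters when the table has millions of rows.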