Need to format R data
This is a follow-up to my only other question, but hopefully more direct. I need data that looks like this:
   custID   custChannel            custDate
 1    151        Direct 2015-10-10 00:15:32
 2    151    GooglePaid 2015-10-10 00:16:45
 3    151     Converted 2015-10-10 00:17:01
 4   5655      BingPaid 2015-10-11 00:20:12
 5   7855 GoogleOrganic 2015-10-12 00:05:32
 6   7862  YahooOrganic 2015-10-13 00:18:20
 7   9655    GooglePaid 2015-10-13 00:08:35
 8   9655    GooglePaid 2015-10-13 00:11:11
 9   9655     Converted 2015-10-13 00:11:35
10   9888    GooglePaid 2015-10-14 00:08:35
11   9888    GooglePaid 2015-10-14 00:11:11
12   9888     Converted 2015-10-14 00:11:35
to be sorted so that the output looks like this:
Path                               Path Count
BingPaid                           1
Direct>GooglePaid>Converted        1
GoogleOrganic                      1
GooglePaid>GooglePaid>Converted    2
YahooOrganic                       1
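The idea is to capture each customer's path (identified by custID) and count how many customers in the whole dataset took that exact path (the path count). I need to do this on a dataset of 5 million rows.
Using data.table, you can do this as follows: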
require(data.table)
setDT(dat)[,paste(custChannel, collapse = ">"), custID][,.("path length"=.N), .(path=V1)]
Result:
                              path path length
1:     Direct>GooglePaid>Converted           1
2:                        BingPaid           1
3:                   GoogleOrganic           1
4:                    YahooOrganic           1
5: GooglePaid>GooglePaid>Converted           2
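To reproduce this result, the sample data from the question can be rebuilt by hand. The following is only a sketch: the column types are assumptions (custID as numeric, custDate parsed as POSIXct in UTC), since the question does not show how `dat` was created.

```r
library(data.table)

# Sample data copied from the question; the column types are assumptions.
dat <- data.table(
  custID = c(151, 151, 151, 5655, 7855, 7862, 9655, 9655, 9655, 9888, 9888, 9888),
  custChannel = c("Direct", "GooglePaid", "Converted", "BingPaid",
                  "GoogleOrganic", "YahooOrganic",
                  "GooglePaid", "GooglePaid", "Converted",
                  "GooglePaid", "GooglePaid", "Converted"),
  custDate = as.POSIXct(c(
    "2015-10-10 00:15:32", "2015-10-10 00:16:45", "2015-10-10 00:17:01",
    "2015-10-11 00:20:12", "2015-10-12 00:05:32", "2015-10-13 00:18:20",
    "2015-10-13 00:08:35", "2015-10-13 00:11:11", "2015-10-13 00:11:35",
    "2015-10-14 00:08:35", "2015-10-14 00:11:11", "2015-10-14 00:11:35"
  ), tz = "UTC")
)

# Collapse each customer's channels into one path string (default column V1),
# then count how many customers share each path.
res <- dat[, paste(custChannel, collapse = ">"), custID][
  , .("path length" = .N), .(path = V1)]
print(res)
```

Despite its name, the "path length" column holds the number of customers per path, which is the "Path Count" the question asks for.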
Step by step:
setDT(dat) # make dat a data.table
# get path by custID
dat_path <- dat[,paste(custChannel, collapse = ">"), custID]
#get length by path created in the previous step
res <- dat_path[,.("path length"=.N), by=.(path=V1)]
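The steps above build each path in the order the rows appear, so they rely on the data already being in chronological order within each custID. If that is not guaranteed, one possible variation is to sort first. The sketch below uses a small hypothetical data set with deliberately shuffled rows; the column names `path` and `path_count` are illustrative choices, not part of the original answer.

```r
library(data.table)

# Hypothetical mini data set; rows are deliberately out of chronological order.
dat <- data.table(
  custID = c(151, 9655, 151, 9655, 151, 9655),
  custChannel = c("Converted", "GooglePaid", "Direct",
                  "Converted", "GooglePaid", "GooglePaid"),
  custDate = as.POSIXct(c(
    "2015-10-10 00:17:01", "2015-10-13 00:11:11", "2015-10-10 00:15:32",
    "2015-10-13 00:11:35", "2015-10-10 00:16:45", "2015-10-13 00:08:35"
  ), tz = "UTC")
)

# Sort by customer and timestamp (by reference, no copy) so that each
# path is assembled in visit order.
setorder(dat, custID, custDate)

# One row per customer holding that customer's path.
dat_path <- dat[, .(path = paste(custChannel, collapse = ">")), by = custID]

# One row per distinct path with the number of customers who took it,
# most common paths first.
res <- dat_path[, .(path_count = .N), by = path][order(-path_count)]
print(res)
```

Because `setorder` reorders the table by reference, it avoids copying the data, which may matter at the 5 million row scale mentioned in the question.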