R: Determine the number of users who drop off after a given screen
I have a database like the one below:

userId Screen Platform Version
01     first  IOS      1.0.1
01     main   IOS      1.0.1
02     first  Android  1.0.2
03     first  IOS      1.0.2
03     main   IOS      1.0.2
03     detail IOS      1.0.2

Basically, I want to know how many people "drop off" after the first screen. My idea was to create a new column that gives, for each userId, the number of screens that user went through. The ideal database would look like this:

userId DifferentScreen Platform Version
01     2               IOS      1.0.1
02     1               Android  1.0.2
03     3               IOS      1.0.2

I tried:

setDT(database)[order(userId), .(DifferentScreen = uniqueN(Screen), Version = Version[1L], Platform = Platform[1L], by = userId)]
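But it doesn't work. The problem I found is that it does not group by userId, because the number of rows stays the same. I used uniqueN because I couldn't find a command that just does .N(). That's basically it.

You're basically there. There's just a small issue with a misplaced closing parenthesis. Try: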
dt[order(userId), .(DifferentScreen = uniqueN(Screen), Version = Version[1L], Platform = Platform[1L]), by = userId]
userId DifferentScreen Version Platform
1: 1 2 1.0.1 IOS
2: 2 1 1.0.2 Android
3: 3 3 1.0.2 IOS
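You need to close the .( ) parenthesis before by = userId, not after it. That way, data.table reads by = ... as the grouping argument instead of as a new variable named by. As it stands, your output is not grouped at all: data.table thinks you want to create a column called by.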
You can see this in the result of your original code:
dt[order(userId), .(DifferentScreen = uniqueN(Screen), Version = Version[1L], Platform = Platform[1L], by = userId)]
#See how this creates a variable "by"?
DifferentScreen Version Platform by
1: 3 1.0.1 IOS 1
2: 3 1.0.1 IOS 1
3: 3 1.0.1 IOS 2
4: 3 1.0.1 IOS 3
5: 3 1.0.1 IOS 3
6: 3 1.0.1 IOS 3
Data:
library(data.table)
# dput() output; the ".internal.selfref = <pointer: ...>" entry was dropped because it cannot be parsed
dt <- structure(list(userId = c(1L, 1L, 2L, 3L, 3L, 3L), Screen = structure(c(2L,
3L, 2L, 2L, 3L, 1L), .Label = c("detail", "first", "main"), class = "factor"),
Platform = structure(c(2L, 2L, 1L, 2L, 2L, 2L), .Label = c("Android",
"IOS"), class = "factor"), Version = structure(c(1L, 1L,
2L, 2L, 2L, 2L), .Label = c("1.0.1", "1.0.2"), class = "factor")), .Names = c("userId",
"Screen", "Platform", "Version"), class = c("data.table", "data.frame"
), row.names = c(NA, -6L))
dt
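Since the original goal was to count how many users drop off after the first screen, the grouped result can be taken one step further. The sketch below rebuilds the same sample data with data.table() and puts .N (rows per user) next to uniqueN(Screen) (distinct screens per user); the two only differ when a user visits the same screen more than once. It assumes every session starts on the "first" screen, so a user with exactly one distinct screen is counted as having dropped off. The names per_user and dropped are illustrative only.

```r
library(data.table)

# Same sample data as above, built directly instead of via dput()
dt <- data.table(
  userId   = c(1L, 1L, 2L, 3L, 3L, 3L),
  Screen   = c("first", "main", "first", "first", "main", "detail"),
  Platform = c("IOS", "IOS", "Android", "IOS", "IOS", "IOS"),
  Version  = c("1.0.1", "1.0.1", "1.0.2", "1.0.2", "1.0.2", "1.0.2")
)

# One row per user: distinct screens (uniqueN) and total screen rows (.N)
per_user <- dt[, .(DifferentScreen = uniqueN(Screen),
                   Rows            = .N,
                   Platform        = Platform[1L],
                   Version         = Version[1L]),
               by = userId]

# Assumption: a user with a single distinct screen never got past "first"
dropped <- per_user[DifferentScreen == 1L, .N]

print(per_user)
print(dropped)
```

With this sample data only userId 2 stops after the first screen, so dropped is 1.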
Thanks for the question. We can try to work with this, but it would be better to dput your data or otherwise make it easy to reproduce, so that we can use the sample data in R.
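A minimal sketch of the dput() route, assuming the asker's table is a data.table called database (here a small hypothetical stand-in). As the data in the answer shows, dput() on a data.table also prints a .internal.selfref = <pointer: ...> entry, which does not parse when pasted back into R; converting to a plain data.frame first avoids that, and as.data.table() restores the data.table on the receiving end.

```r
library(data.table)

# Hypothetical stand-in for the asker's `database` table
database <- data.table(userId = c(1L, 1L, 2L),
                       Screen = c("first", "main", "first"))

# Convert to data.frame so dput() emits no unparseable pointer entry
txt <- capture.output(dput(as.data.frame(head(database))))
cat(txt, sep = "\n")

# Anyone can now paste the printed text back into R and rebuild the table
rebuilt <- as.data.table(eval(parse(text = txt)))
print(rebuilt)
```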