How to implement a multi-column countifs() in R (counting text values)
I want to group by user and count the number of orders whose order-time type is "Daily" and "Night" (daytime and evening in the data below), shown as two separate columns with one row per user.
user_id order_hour_type order_day_type
1 daytime weekend
1 daytime weekday
1 daytime weekday
1 daytime weekend
2 evening weekday
2 evening weekday
2 evening weekend
2 daytime weekday
3 daytime weekday
3 evening weekday
3 daytime weekday
The result should look like this:
user_id daytime evening weekend weekday
1 4 0 2 2
2 1 3 1 3
3 2 1 0 3
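I have tried using the dplyr package, taking the "daytime" column as a first example.
How can I produce the expected result? Many thanks.
One option is to gather the data into 'long' format, count the values per user, and then spread the counts back to 'wide' format: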
library(dplyr)
library(tidyr)
gather(df1, key, val, -user_id) %>%
count(user_id, val) %>%
spread(val, n, fill = 0)
# A tibble: 3 x 5
# user_id daytime evening weekday weekend
# <int> <dbl> <dbl> <dbl> <dbl>
#1 1 4 0 2 2
#2 2 1 3 3 1
#3 3 2 1 3 0
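In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same long-count-wide idea with the newer functions, assuming tidyr 1.1.0 or later; the names res_wide, key and val are illustrative only, and the sample data is rebuilt inline so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (same values as df1 in the Data section)
df1 <- data.frame(
  user_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  order_hour_type = c("daytime", "daytime", "daytime", "daytime", "evening",
                      "evening", "evening", "daytime", "daytime", "evening",
                      "daytime"),
  order_day_type = c("weekend", "weekday", "weekday", "weekend", "weekday",
                     "weekday", "weekend", "weekday", "weekday", "weekday",
                     "weekday"),
  stringsAsFactors = FALSE
)

res_wide <- df1 %>%
  # stack the two type columns into a single 'val' column
  pivot_longer(-user_id, names_to = "key", values_to = "val") %>%
  # count how often each value occurs for each user
  count(user_id, val) %>%
  # one column per distinct value; missing combinations become 0
  pivot_wider(names_from = val, values_from = n, values_fill = 0L)

print(res_wide)
```

The counts should match the spread() result, although the column order may differ because pivot_wider() orders the new columns by first appearance rather than alphabetically.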
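Another option uses data.table: melt to 'long' format, then dcast back to 'wide', counting with length
library(data.table)
dcast(melt(setDT(df1), id.var = 'user_id'), user_id ~ value, length)
A base R option is to replicate the first column once for each of the other columns, unlist the other columns, and use table
# run this on the plain data.frame: setDT() above converts df1 to a data.table in place,
# which changes how df1[,1] and df1[-1] behave
table(rep(df1[,1], 2), unlist(df1[-1]))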
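For readers coming from a spreadsheet, a more literal translation of COUNTIFS() is one conditional sum per output column. This is only a sketch of that alternative, not part of the answer above: it assumes the four values are known in advance, and res_countif is an illustrative name.

```r
library(dplyr)

# Sample data from the question (same values as df1 in the Data section)
df1 <- data.frame(
  user_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  order_hour_type = c("daytime", "daytime", "daytime", "daytime", "evening",
                      "evening", "evening", "daytime", "daytime", "evening",
                      "daytime"),
  order_day_type = c("weekend", "weekday", "weekday", "weekend", "weekday",
                     "weekday", "weekend", "weekday", "weekday", "weekday",
                     "weekday"),
  stringsAsFactors = FALSE
)

res_countif <- df1 %>%
  group_by(user_id) %>%
  summarise(
    # summing a logical vector counts its TRUE values
    daytime = sum(order_hour_type == "daytime"),
    evening = sum(order_hour_type == "evening"),
    weekend = sum(order_day_type == "weekend"),
    weekday = sum(order_day_type == "weekday")
  )

print(res_countif)
```

This form keeps the columns in the order they are written, but every value has to be listed by hand. If the columns may contain NA, adding na.rm = TRUE to each sum() keeps the counts from becoming NA.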
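Thank you very much! But my actual data source is a csv file with more than 100,000 rows (60,000 users). Will the code still work in that case?
@fridaguo If memory is not a constraint for you, it should work.
Warning message: In melt.data.table(setDT(df), id.var = "user_id") : 'measure.vars' [order_hour_type, order_hour_of_day] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'character'. All measure variables not of type 'character' will be coerced too. Check DETAILS in ?melt.data.table for more on coercion. Do I need to convert the values in df to character?
@fridaguo That is not an error. It is a warning message indicating that the two columns have different types, and it is not a problem. If you want to remove the warning, convert the columns to character, i.e. columns 2 and 3, df[2:3].
Data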
df1 <- structure(list(user_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L), order_hour_type = c("daytime", "daytime", "daytime",
"daytime", "evening", "evening", "evening", "daytime", "daytime",
"evening", "daytime"), order_day_type = c("weekend", "weekday",
"weekday", "weekend", "weekday", "weekday", "weekend", "weekday",
"weekday", "weekday", "weekday")), class = "data.frame",
row.names = c(NA,
-11L))
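The conversion suggested in the comments (turning columns 2 and 3 into character before melting) could look roughly like the sketch below. The data frame here is hypothetical: it only borrows the column names from the warning message, with one character column and one integer column, and the values are made up for illustration.

```r
library(data.table)

# Hypothetical data: the two measure columns have different types
df <- data.frame(
  user_id = c(1L, 1L, 2L),
  order_hour_type = c("daytime", "evening", "daytime"),
  order_hour_of_day = c(9L, 20L, 10L),
  stringsAsFactors = FALSE
)

# Convert columns 2 and 3 to character so melt() has nothing to coerce
df[2:3] <- lapply(df[2:3], as.character)

# Same melt/dcast pattern as in the answer, now on same-typed columns
long <- melt(setDT(df), id.vars = "user_id")
wide <- dcast(long, user_id ~ value, fun.aggregate = length)

print(wide)
```

The conversion is done while df is still a plain data.frame, because df[2:3] selects columns there; after setDT() the same expression would select rows.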