R: group by two variables with additional rows
I have a data frame that looks similar to this:
date uid duration
1 29.03.2020 0zOs6ZS9 1
2 29.03.2020 0zOs6ZS9 5
3 29.03.2020 0zOs6ZS9 2
4 31.03.2020 0zOs6ZS9 6
5 01.04.2020 0zOs6ZS9 7
6 01.04.2020 0zOs6ZS9 4
7 29.03.2020 0zOs6ZS9 3
8 29.03.2020 3jtMiD 2
9 30.03.2020 3jtMiD 7
10 30.03.2020 3jtMiD 5
11 31.03.2020 3jtMiD 1
12 02.04.2020 3jtMiD 2
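My goal is to sum duration by date and uid. However, if one user has a measurement on a given date and another user does not, I still want a row for that other user on that date, with a duration of 0.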
Currently I get the following result, which has no rows for the missing date/uid combinations:
df2 <- df1 %>%
  group_by(date, uid) %>%
  summarise(duration = sum(duration, na.rm = TRUE))
date uid duration
<date> <chr> <dbl>
1 2020-03-29 0zOs6ZS9 11
2 2020-03-29 3jtMiD 2
3 2020-03-30 3jtMiD 12
4 2020-03-31 0zOs6ZS9 6
5 2020-03-31 3jtMiD 1
6 2020-04-01 0zOs6ZS9 11
7 2020-04-02 3jtMiD 2
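How can I achieve this?
We can use complete from tidyr, which adds a row for every missing combination of date and uid and fills duration with 0: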
library(tidyr)
library(dplyr)
df2 %>%
ungroup %>%
complete(date, uid, fill = list(duration = 0))
-output
# A tibble: 10 x 3
# date uid duration
# <date> <chr> <dbl>
# 1 2020-03-29 0zOs6ZS9 11
# 2 2020-03-29 3jtMiD 2
# 3 2020-03-30 0zOs6ZS9 0
# 4 2020-03-30 3jtMiD 12
# 5 2020-03-31 0zOs6ZS9 6
# 6 2020-03-31 3jtMiD 1
# 7 2020-04-01 0zOs6ZS9 11
# 8 2020-04-01 3jtMiD 0
# 9 2020-04-02 0zOs6ZS9 0
#10 2020-04-02 3jtMiD 2
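For reference, the whole pipeline can be sketched end to end starting from the raw data in the question. This is a minimal sketch under a few assumptions: the raw date column is a character vector in dd.mm.yyyy format (the question does not show how it was converted), `res` is just an illustrative name, and `.groups = "drop"` assumes dplyr 1.0.0 or later, where it makes the separate ungroup step unnecessary.

```r
library(dplyr)
library(tidyr)

# Raw data as shown in the question; dates are assumed to be character strings
df1 <- data.frame(
  date = c("29.03.2020", "29.03.2020", "29.03.2020", "31.03.2020",
           "01.04.2020", "01.04.2020", "29.03.2020", "29.03.2020",
           "30.03.2020", "30.03.2020", "31.03.2020", "02.04.2020"),
  uid = c(rep("0zOs6ZS9", 7), rep("3jtMiD", 5)),
  duration = c(1, 5, 2, 6, 7, 4, 3, 2, 7, 5, 1, 2)
)

res <- df1 %>%
  # parse dd.mm.yyyy strings into Date objects
  mutate(date = as.Date(date, format = "%d.%m.%Y")) %>%
  group_by(date, uid) %>%
  # sum per date and uid, and drop the grouping afterwards
  summarise(duration = sum(duration, na.rm = TRUE), .groups = "drop") %>%
  # add every date/uid combination that is missing, with duration 0
  complete(date, uid, fill = list(duration = 0))

print(res)
```

Note that complete only combines values that occur in the data. A date on which no user at all has a measurement would still be absent from the result.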
data
df2 <- structure(list(date = structure(c(18350, 18350, 18351, 18352,
18352, 18353, 18354), class = "Date"), uid = c("0zOs6ZS9", "3jtMiD",
"3jtMiD", "0zOs6ZS9", "3jtMiD", "0zOs6ZS9", "3jtMiD"), duration = c(11L,
2L, 12L, 6L, 1L, 11L, 2L)), row.names = c("1", "2", "3", "4",
"5", "6", "7"), class = "data.frame")
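If tidyr is not available, the same result can probably be reached in base R. This is a different technique from the answer above, not a replacement for it: build every date/uid combination with expand.grid, left-join the summarised data with merge, and set the unmatched durations to 0. The names `grid` and `out` are illustrative, and df2 is rebuilt here with data.frame only so that the sketch runs on its own.

```r
# Summarised data, equivalent to the df2 given above
df2 <- data.frame(
  date = as.Date(c("2020-03-29", "2020-03-29", "2020-03-30", "2020-03-31",
                   "2020-03-31", "2020-04-01", "2020-04-02")),
  uid = c("0zOs6ZS9", "3jtMiD", "3jtMiD", "0zOs6ZS9",
          "3jtMiD", "0zOs6ZS9", "3jtMiD"),
  duration = c(11, 2, 12, 6, 1, 11, 2)
)

# All combinations of the dates and uids that occur in the data
grid <- expand.grid(date = unique(df2$date), uid = unique(df2$uid),
                    stringsAsFactors = FALSE)

# Left join: keep every combination; unmatched ones get NA as duration
out <- merge(grid, df2, by = c("date", "uid"), all.x = TRUE)

# Replace the NAs introduced by the join with 0 and sort like the tidyr result
out$duration[is.na(out$duration)] <- 0
out <- out[order(out$date, out$uid), ]

print(out)
```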