R: count rows within a group, but restart the count for non-consecutive dates
I have data like the following:
sample <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  date = c(as.Date("2014-12-31"),
           as.Date("2015-01-31"),
           as.Date("2015-02-28"),
           as.Date("2015-01-31"),
           as.Date("2015-03-31"),
           as.Date("2015-04-30")),
  obs = c(100, 200, 300, 50, 100, 150)
)
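I want to create a fourth column that counts the observations within each group. However, if a month does not immediately follow the previous month, I want the count to restart. This is what I would like the result to look like: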
group date obs num
1 A 2014-12-31 100 1
2 A 2015-01-31 200 2
3 A 2015-02-28 300 3
4 B 2015-01-31 50 1
5 B 2015-03-31 100 1
6 B 2015-04-30 150 2
So far, I have only been able to get the following:
library(tidyverse)
sample <- sample %>%
  arrange(date) %>%
  group_by(group) %>%
  mutate(num = row_number())
group date obs num
1 A 2014-12-31 100 1
2 A 2015-01-31 200 2
3 A 2015-02-28 300 3
4 B 2015-01-31 50 1
5 B 2015-03-31 100 2
6 B 2015-04-30 150 3
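Any help would be greatly appreciated. I would also like to be able to do the same thing with quarterly data instead of monthly data.

We can create a grouping variable from the difference between the months of consecutive dates, starting a new group whenever that difference is not equal to 1 (that is, not exactly one month):
library(dplyr)
library(lubridate)
sample %>%
  arrange(group, date) %>%
  # start a new run whenever the month number does not increase by exactly 1
  group_by(group, mth = cumsum(c(TRUE, diff(month(date)) != 1))) %>%
  mutate(num = row_number()) %>%
  ungroup() %>%
  select(-mth)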
# A tibble: 6 x 4
# group date obs num
# <fct> <date> <dbl> <int>
#1 A 2015-01-31 100 1
#2 A 2015-02-28 200 2
#3 A 2015-03-31 300 3
#4 B 2015-01-31 50 1
#5 B 2015-03-31 100 1
#6 B 2015-04-30 150 2
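If the year also needs to be taken into account (the outputs shown in this answer were produced from the original sample data, in which every date fell in 2015):
library(dplyr)
library(zoo)
sample %>%
  arrange(group, date) %>%
  mutate(yearmon = as.yearmon(date)) %>%
  group_by(group) %>%
  # diff() of yearmon values is in fractions of a year, so multiply by 12 to
  # get months; round() avoids floating-point truncation (e.g. 1.9999 -> 1)
  group_by(grp = cumsum(c(TRUE, round(as.numeric(diff(yearmon)) * 12) > 1)),
           .add = TRUE) %>%  # .add replaces the deprecated add argument (dplyr >= 1.0.0)
  mutate(num = row_number()) %>%
  ungroup() %>%
  select(-grp, -yearmon)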
# A tibble: 6 x 4
# group date obs num
# <fct> <date> <dbl> <int>
#1 A 2015-01-31 100 1
#2 A 2015-02-28 200 2
#3 A 2015-03-31 300 3
#4 B 2015-01-31 50 1
#5 B 2015-03-31 100 1
#6 B 2015-04-30 150 2
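The question also asks about quarterly data, which the code above does not cover. One possible way to handle both cases, sketched here in base R, is to turn each date into a running month index (year * 12 + month) and start a new run whenever the gap differs from the expected step: 1 for monthly data, 3 for quarterly data. The helper name count_consecutive, its step parameter, and the sample_df and quarterly_df objects are hypothetical names introduced for this illustration; they do not come from the answers.

```r
# Sample data from the question (named sample_df to avoid masking base::sample)
sample_df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  date  = as.Date(c("2014-12-31", "2015-01-31", "2015-02-28",
                    "2015-01-31", "2015-03-31", "2015-04-30")),
  obs   = c(100, 200, 300, 50, 100, 150)
)

# Hypothetical helper: number rows within each run of evenly spaced periods.
# Assumes df has columns `group` and `date` and at least one row.
count_consecutive <- function(df, step = 1) {
  df <- df[order(df$group, df$date), ]
  n  <- nrow(df)
  lt <- as.POSIXlt(df$date)
  # Running month index, so December -> January is a gap of exactly 1
  month_index <- (lt$year + 1900) * 12 + lt$mon
  grp <- as.character(df$group)
  # A new run starts at the first row, when the group changes,
  # or when the gap to the previous row is not exactly `step` months
  new_run <- c(TRUE, grp[-1] != grp[-n] | diff(month_index) != step)
  run_id  <- cumsum(new_run)
  # Row number within each run
  df$num <- ave(seq_len(n), run_id, FUN = seq_along)
  rownames(df) <- NULL
  df
}

monthly <- count_consecutive(sample_df)            # monthly data, step = 1

# Hypothetical quarterly series with one missing quarter (2015 Q2)
quarterly_df <- data.frame(
  group = "A",
  date  = as.Date(c("2014-12-31", "2015-03-31", "2015-09-30", "2015-12-31"))
)
quarterly <- count_consecutive(quarterly_df, step = 3)
```

Because only the year and month enter the index, the day of the month is ignored, so this sketch does not depend on the dates being month ends.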
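We can use lubridate::days_in_month to get the number of days in each date's month and compare it with the difference in days between the current date and the previous one; a mismatch starts a new group. We can then assign row numbers within each group:
library(dplyr)
sample %>%
  group_by(group) %>%
  # a new run starts when the gap in days differs from the length of the current month
  mutate(diff_days = cumsum(as.numeric(date - lag(date, default = first(date))) !=
                              lubridate::days_in_month(date))) %>%
  group_by(diff_days, .add = TRUE) %>%  # .add replaces the deprecated add argument (dplyr >= 1.0.0)
  mutate(num = row_number()) %>%
  ungroup() %>%
  select(-diff_days)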
# group date obs num
# <fct> <date> <dbl> <int>
#1 A 2014-12-31 100 1
#2 A 2015-01-31 200 2
#3 A 2015-02-28 300 3
#4 B 2015-01-31 50 1
#5 B 2015-03-31 100 1
#6 B 2015-04-30 150 2
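One caveat with the day-difference comparison is that it appears to rely on every date being a month end, as in the sample data. The base R sketch below illustrates this under that assumption: for month-end dates the gap to the previous date equals the length of the current month, but for mid-month dates it equals the length of the previous month, so consecutive months can be flagged as breaks. The helpers days_in_month_base and is_break are hypothetical names written for this illustration (the first stands in for lubridate::days_in_month so that the example needs no packages).

```r
# Hypothetical base R stand-in for lubridate::days_in_month
days_in_month_base <- function(d) {
  vapply(seq_along(d), function(i) {
    first      <- as.Date(format(d[i], "%Y-%m-01"))
    next_first <- seq(first, by = "month", length.out = 2)[2]
    as.numeric(next_first - first)
  }, numeric(1))
}

# TRUE where a new run would start under the day-difference rule
is_break <- function(d) {
  gap <- c(NA, as.numeric(diff(d)))  # days since the previous date
  is.na(gap) | gap != days_in_month_base(d)
}

month_end <- as.Date(c("2014-12-31", "2015-01-31", "2015-02-28"))
mid_month <- as.Date(c("2014-12-15", "2015-01-15", "2015-02-15"))

breaks_month_end <- is_break(month_end)  # TRUE FALSE FALSE
breaks_mid_month <- is_break(mid_month)  # TRUE FALSE TRUE
```

In the mid-month series, 15 January to 15 February is 31 days while February 2015 has 28, so the third row is treated as a restart even though the months are consecutive. Comparing year and month only avoids this.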
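Thank you very much. I realized that my sample data was oversimplified: not all of the dates fall within the same year. I have edited the original post to reflect this.
@H.Z When you say the month difference, is that the absolute difference in days, or does it only take the year and month into account?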