R: Assign a year ID when the start month is not January
I have a data.frame, df, consisting of seven years of daily values (May 2001 to April 2008) for three sites:
# 2557 daily dates (2001-05-01 to 2008-04-30), repeated once per site
date <- rep(seq(as.Date("2001-05-01"), as.Date("2008-04-30"), by = 1), 3)
site <- c(rep("Site_1", 2557), rep("Site_2", 2557), rep("Site_3", 2557))
# random daily values with a different range per site
value <- c(as.numeric(sample(90:271, 2557, replace = TRUE)),
           as.numeric(sample(125:340, 2557, replace = TRUE)),
           as.numeric(sample(70:173, 2557, replace = TRUE)))
df <- data.frame(date, site, value)
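The hard-coded 2557 only fits this exact date range. A sketch of the same test data where the length follows from the chosen dates, assuming the same date range and per-site value ranges as above (the name dates and the seed are only for illustration):

```r
# Sketch: rebuild the example data without hard-coding 2557 (assumes the same
# date range and per-site value ranges as in the question)
set.seed(1)  # only to make the random values reproducible
dates <- seq(as.Date("2001-05-01"), as.Date("2008-04-30"), by = "day")
n <- length(dates)  # 2557 = 5 * 365 + 2 * 366 (leap days in 2004 and 2008)
df <- data.frame(
  date  = rep(dates, times = 3),
  site  = rep(c("Site_1", "Site_2", "Site_3"), each = n),
  value = as.numeric(c(sample(90:271, n, replace = TRUE),
                       sample(125:340, n, replace = TRUE),
                       sample(70:173, n, replace = TRUE)))
)
```

For 30-50 years of data only the two dates in seq() need to change.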
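My current approach (the lubridate/ifelse() code further down) gives me what I want. However, with 30-50 years of data it takes time. Also, if each new data.frame has a different start month, I have to modify the ifelse() every time to assign the year ID, so that I can group by year and run different calculations.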
Is there a direct way to assign a year ID when the start month is something other than January?

How about this:
library(dplyr)
# cut() the dates into yearly bins starting on 1 May, then summarise per bin and site
df %>%
  group_by(year = cut(date, seq(as.Date("2001-05-01"), as.Date("2008-05-01"), "1 year"),
                      include.lowest = TRUE),
           site) %>%
  summarise(sd = sd(value), mean = mean(value))
# Source: local data frame [21 x 4]
# Groups: year [?]
#
# year site sd mean
# (fctr) (fctr) (dbl) (dbl)
# 1 2001-05-01 Site_1 51.82622 182.5890
# 2 2001-05-01 Site_2 63.33385 241.1260
# 3 2001-05-01 Site_3 30.04042 120.1233
# 4 2002-05-01 Site_1 51.66325 182.6658
# 5 2002-05-01 Site_2 62.87470 236.4192
# 6 2002-05-01 Site_3 28.54769 122.2329
# 7 2003-05-01 Site_1 50.97588 179.0874
# 8 2003-05-01 Site_2 63.48810 227.1230
# 9 2003-05-01 Site_3 30.87933 120.4918
# 10 2004-05-01 Site_1 53.19898 176.5589
# .. ... ... ... ...
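The break dates in seq() above are still hard-coded. A sketch that derives them from the data instead, so each year starts on the month and day of the first observation, whatever month that is. yearly_breaks() and the example data ex are hypothetical, and the example assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical helper: yearly cut points starting on the first date in the
# data, extended until one break lies after the last date.
yearly_breaks <- function(dates) {
  start <- min(dates)
  span  <- as.integer(max(dates) - start)   # days covered by the data
  # span %/% 365 + 2 breaks always reach past max(dates)
  b <- seq(start, by = "1 year", length.out = span %/% 365 + 2)
  b[seq_len(which(b > max(dates))[1])]      # drop surplus breaks
}

# Small example: two years of daily values starting in October
ex <- data.frame(date = seq(as.Date("2010-10-01"), as.Date("2012-09-30"), by = "day"))
ex$value <- seq_len(nrow(ex))

ex %>%
  group_by(year = cut(date, yearly_breaks(date), include.lowest = TRUE)) %>%
  summarise(n = n(), mean = mean(value))
```

With the question's df this would be group_by(year = cut(date, yearly_breaks(date), include.lowest = TRUE), site); for the May 2001 to April 2008 data it gives the same eight breaks (2001-05-01 to 2008-05-01) as the hard-coded seq().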
Using the lubridate package, I can first add a year column as follows:
library(lubridate)
library(dplyr)
# Label each date with its May-April year, e.g. "2001-2002";
# the hard-coded "< 5" is what has to change for every new start month
yr <- year(df$date)
df$year <- ifelse(month(df$date) < 5,
                  paste(yr - 1, yr, sep = "-"),
                  paste(yr, yr + 1, sep = "-"))
df %>%
  dplyr::select(site, year, value) %>%
  dplyr::group_by(site, year) %>%
  dplyr::summarise(mean = mean(value), sd = sd(value))
Source: local data frame [6 x 4]
Groups: site [1]
site year mean sd
(fctr) (chr) (dbl) (dbl)
1 Site_1 2001-2002 178.2055 54.58277
2 Site_1 2002-2003 176.9342 49.64435
3 Site_1 2003-2004 177.4153 52.20447
4 Site_1 2004-2005 179.5370 52.77848
5 Site_1 2005-2006 180.3671 51.41292
6 Site_1 2006-2007 179.3616 53.02291
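The hard-coded < 5 in the ifelse() can be turned into a parameter. A minimal sketch in base R (so it does not depend on lubridate); season_label() and start_month are hypothetical names, and the year is assumed to start on the first day of start_month:

```r
# Hypothetical helper: label each date with the "YYYY-YYYY" year it belongs
# to, for a year that starts on the first day of `start_month`.
season_label <- function(date, start_month) {
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  first <- yr - (mo < start_month)   # year in which the period starts
  if (start_month == 1) return(as.character(first))  # plain calendar year
  paste(first, first + 1L, sep = "-")
}

d <- as.Date(c("2001-05-01", "2002-04-30", "2002-05-01"))
season_label(d, start_month = 5)
# "2001-2002" "2001-2002" "2002-2003"
```

For the question's data, df$year <- season_label(df$date, start_month = as.integer(format(min(df$date), "%m"))) should produce the same labels as the ifelse() call, without editing the code for each new start month. The arithmetic is vectorized, so it scales to 30-50 years of daily data.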
Thanks, Luke, for your time and help.