Grouping a data frame in R by month and year using yearmon()
Edit: I figured it out:
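library(zoo)    # as.yearmon()
library(dplyr)  # group_by(), summarize(), %>%
# Store the month-year of each row as its own column, then group by it
df_CloseDelta$YearMonth <- as.yearmon(df_CloseDelta$date)
df_CloseDelta %>%
  group_by(stock, YearMonth) %>%
  summarize(minCloseDelta = min(closeDelta),
            meanCloseDelta = mean(closeDelta),
            maxCloseDelta = max(closeDelta)) -> df_summary_CloseDelta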
Calling as.yearmon(df_CloseDelta$date) returns:
[1] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"
[8] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"
[15] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"
[22] "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014" "Jan 2014"
and so on.
Then I tried to group by it:
df_summary_CloseDelta <- df_CloseDelta %>%
group_by(as.yearmon(df_CloseDelta$date))
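I know there are 1006 dates but 5030 entries, because there are five stocks. I am trying to group them and then find the mean, minimum, and maximum for each stock for each month and year. Can anyone point me in the right direction?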
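group_by expects you to give it variable names, or a vector with the same number of rows as the data, which it treats as a factor to group by. See the example below: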
> btest <- data.frame(a = LETTERS[1:10],
+ b = c(1,1,2,2,3,3,4,4,5,5),
+ c = c(rep('e',5), rep('f',5)))
> btest
a b c
1 A 1 e
2 B 1 e
3 C 2 e
4 D 2 e
5 E 3 e
6 F 3 f
7 G 4 f
8 H 4 f
9 I 5 f
10 J 5 f
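What your code is actually doing, however, is passing group_by a vector of row-by-row values, which it then uses to form the groups: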
> btest %>%
+ group_by(c(1,1,1,1,1,2,2,2,2,2)) %>%
+ summarise(ex = mean(b))
# A tibble: 2 x 2
`c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)` ex
<dbl> <dbl>
1 1.00 1.80
2 2.00 4.20
The fix is to add the column you want to group by to the data frame first, and then group by it:
> df_CloseDelta[['date_yearmon']] <- as.yearmon(df_CloseDelta[['date']])
>
> df_CloseDelta %>%
+ group_by(date_yearmon, stock) %>%
+ summarise(mean_closedelta = mean(closeDelta))
# A tibble: 240 x 3
# Groups: date_yearmon [?]
date_yearmon stock mean_closedelta
<S3: yearmon> <chr> <dbl>
1 Jan 2014 AAPL -0.474
2 Jan 2014 AMZN -0.472
3 Jan 2014 FB 0.746
4 Jan 2014 GOOG 0.310
5 Jan 2014 MSFT 0.104
6 Feb 2014 AAPL 0.269
7 Feb 2014 AMZN 0.0631
8 Feb 2014 FB 0.491
9 Feb 2014 GOOG 0.159
10 Feb 2014 MSFT 0.0713
# ... with 230 more rows
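The same pattern extends to the minimum, mean, and maximum asked for in the question. The following is a minimal sketch on a made-up data frame: `toy` and its values are hypothetical stand-ins for `df_CloseDelta`, and the `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(zoo)

# Hypothetical toy data: two stocks, two months, made-up closeDelta values
toy <- data.frame(
  date = as.Date(c("2014-01-02", "2014-01-03", "2014-02-03", "2014-02-04",
                   "2014-01-02", "2014-01-03", "2014-02-03", "2014-02-04")),
  stock = rep(c("AAPL", "MSFT"), each = 4),
  closeDelta = c(1, 3, -2, 4, 0.5, 1.5, 2, 6),
  stringsAsFactors = FALSE
)

toy_summary <- toy %>%
  mutate(date_yearmon = as.yearmon(date)) %>%   # add the grouping column first
  group_by(stock, date_yearmon) %>%             # one group per stock and month
  summarise(
    minCloseDelta  = min(closeDelta),
    meanCloseDelta = mean(closeDelta),
    maxCloseDelta  = max(closeDelta),
    .groups = "drop"                            # assumes dplyr >= 1.0.0
  )

print(toy_summary)
```

With two stocks and two months, the result should have one row per stock and month, so four rows in total.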
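xts has to.monthly, which converts directly to monthly data. Assuming the input OHLCV data is held in a set of xts objects in environment e, as shown in the Note at the end, we apply a conversion function to each such object in e (it converts to monthly, converts to a data frame, and adds the symbol) and then rbind the resulting data frames into a single data frame: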
sym2df <- function(x, env) cbind(Symbol = x, fortify.zoo(to.monthly(env[[x]], name = "")))
do.call("rbind", lapply(ls(e), sym2df, env = e))
Grouping btest by its existing column c:
> btest %>%
+ group_by(c) %>%
+ summarise(ex = mean(b))
# A tibble: 2 x 2
c ex
<fct> <dbl>
1 e 1.80
2 f 4.20
The two group means can be checked by hand:
> mean(c(1,1,2,2,3))
[1] 1.8
> mean(c(3,4,4,5,5))
[1] 4.2
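Alternatively, the grouping column can be created inside the pipeline with mutate, here stored as a character label:
df_CloseDelta %>%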
mutate(date_yearmon = as.character(as.yearmon(date))) %>%
group_by(date_yearmon, stock) %>%
summarise(mean_closedelta = mean(closeDelta))
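One thing to keep in mind with the as.character() variant is that the labels then sort as text rather than chronologically. The sketch below compares the two on a hypothetical three-row data frame (`toy` and its values are made up); it also relies on group_by() accepting a named expression, which creates the grouping column in the same call.

```r
library(dplyr)
library(zoo)

# Hypothetical toy data: one observation in each of three months
toy <- data.frame(
  date = as.Date(c("2014-01-15", "2014-02-15", "2014-04-15")),
  closeDelta = c(1, 2, 3)
)

# Grouping on the yearmon value keeps the months in chronological order
by_yearmon <- toy %>%
  group_by(date_yearmon = as.yearmon(date)) %>%
  summarise(mean_closedelta = mean(closeDelta))

# Grouping on the character label orders the groups as text instead
by_label <- toy %>%
  group_by(date_yearmon = as.character(as.yearmon(date))) %>%
  summarise(mean_closedelta = mean(closeDelta))

print(by_yearmon)
print(by_label)
```

In an English locale the month labels are "Jan 2014", "Feb 2014", and "Apr 2014", so the character version would be expected to list April first. Keeping the yearmon class avoids that if the order of the rows matters.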
Note: the input for the xts approach, downloaded into environment e:
library(quantmod)
start <- "2014-01-01"
end <- "2017-12-31"
syms <- c("AAPL", "AMZN", "FB", "GOOG", "MSFT")
getSymbols(syms, from = start, to = end, env = e <- new.env())
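The to.monthly step of the xts approach can also be tried without downloading anything. This is a minimal sketch on a made-up daily series: `dates` and `prices` are hypothetical stand-ins for one downloaded symbol, and the values are simply 1, 2, 3, and so on.

```r
library(xts)  # also attaches zoo, which provides fortify.zoo()

# Hypothetical daily series covering January and February 2014
dates  <- seq(as.Date("2014-01-01"), as.Date("2014-02-28"), by = "day")
prices <- xts(as.numeric(seq_along(dates)), order.by = dates)

# Collapse the daily series to one open/high/low/close row per month
monthly <- to.monthly(prices)

# Convert to a data frame; the time index becomes the first column
monthly_df <- fortify.zoo(monthly)

print(monthly_df)
```

Because the series is a single price column, each monthly row holds four values (open, high, low, close) computed from that month's days. Applying the same conversion to every symbol and rbinding the results is what sym2df and do.call do in the answer.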