How to calculate a two-year moving average in R
I have a large data frame (900k rows) on mergers and acquisitions (M&A). df has four columns: date (when the M&A was completed), target_nation (the country of the company that was merged or acquired), acquiror_nation (the country of the acquiring company), and big_corp_TF (whether the acquiror is a big corporation; TRUE means it is). Here is a sample of my df:
> df <- structure(list(date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L,
2002L, 2002L), target_nation = c("Uganda", "Uganda", "Uganda",
"Uganda", "Uganda", "Uganda", "Uganda", "Uganda"), acquiror_nation = c("France",
"Germany", "France", "France", "Germany", "France", "France",
"Germany"), big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, FALSE,
TRUE, TRUE, TRUE)), row.names = c(NA, -8L), class = "data.frame")
> df
date target_nation acquiror_nation big_corp_TF
1: 2000 Uganda France TRUE
2: 2000 Uganda Germany FALSE
3: 2001 Uganda France TRUE
4: 2001 Uganda France FALSE
5: 2001 Uganda Germany FALSE
6: 2002 Uganda France TRUE
7: 2002 Uganda France TRUE
8: 2002 Uganda Germany TRUE
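Note that the share for 2000 stays unchanged, because there is no previous year to form a two-year average with; 2001 becomes 0.4 (since (1+1)/(2+3) = 0.4); 2002 becomes 0.5 (since (1+2)/(3+3) = 0.5).
Do you have any ideas on how to write code that calculates the two-year average share? I think I need a for loop here, but I don't know how to write it. Any suggestions would be much appreciated.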
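To make the arithmetic above concrete, here is a minimal base R sketch that reproduces 0.5, 0.4 and 0.5 from the sample data without a for loop. It assumes the numerator counts deals by big French acquirors and the denominator counts all deals, each summed over the current and the previous year; the names years, n1, n2, prev and share are illustrative only, and the sketch assumes no year is missing.

```r
# Sample data from the question (the target nation is Uganda throughout).
df <- data.frame(
  date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 2002L, 2002L),
  acquiror_nation = c("France", "Germany", "France", "France", "Germany",
                      "France", "France", "Germany"),
  big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

years <- sort(unique(df$date))

# n1: all deals per year; n2: deals by big French acquirors per year.
n1 <- sapply(years, function(y) sum(df$date == y))
n2 <- sapply(years, function(y)
  sum(df$date == y & df$acquiror_nation == "France" & df$big_corp_TF))

# Assumption: years are consecutive, so "previous year" is the previous element.
# The first year has no predecessor and contributes 0 from the past.
prev <- function(x) c(0, head(x, -1))

share <- (n2 + prev(n2)) / (n1 + prev(n1))
names(share) <- years
share
```

Because the counts are shifted as whole vectors, the same idea should scale to many rows, although this has not been timed on a 900k-row data frame.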
--
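Edit: AnilGoyal's code works perfectly with the sample data, but my actual data is evidently messier, so I am wondering whether there is a way around the problems I am running into.
My actual dataset sometimes skips a year, and sometimes lacks an acquiror nation that appeared in earlier rows. Please see this more accurate sample of my actual data: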
> df_new <- structure(list(date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L,
2002L, 2002L, 2003L, 2003L, 2004L, 2004L, 2004L, 2006L, 2006L
), target_nation = c("Uganda", "Uganda", "Uganda", "Uganda",
"Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Uganda",
"Uganda", "Uganda", "Uganda", "Uganda"), acquiror_nation = c("France",
"Germany", "France", "France", "Germany", "France", "France",
"Germany", "Germany", "Germany", "France", "France", "Germany",
"France", "France"), big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, FALSE,
TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)), row.names = c(NA,
-15L), class = "data.frame")
> df_new
date target_nation acquiror_nation big_corp_TF
1: 2000 Uganda France TRUE
2: 2000 Uganda Germany FALSE
3: 2001 Uganda France TRUE
4: 2001 Uganda France FALSE
5: 2001 Uganda Germany FALSE
6: 2002 Uganda France TRUE
7: 2002 Uganda France TRUE
8: 2002 Uganda Germany TRUE
9: 2003 Uganda Germany TRUE
10: 2003 Uganda Germany FALSE
11: 2004 Uganda France TRUE
12: 2004 Uganda France FALSE
13: 2004 Uganda Germany TRUE
14: 2006 Uganda France TRUE
15: 2006 Uganda France TRUE
Note: there are no rows for France in 2003, and there are no rows at all for 2005.
If I run Anil's first code, I get the following result:
date target_nation acquiror_nation n1 n2 share
<int> <chr> <chr> <dbl> <int> <dbl>
1 2000 Uganda France 2 1 0.5
2 2001 Uganda France 3 1 0.4
3 2002 Uganda France 3 2 0.5
4 2004 Uganda France 3 1 0.5
5 2006 Uganda France 2 2 0.6
Note: there are no results for France in 2003 and 2005. I would like to have results for those years (since we are computing a two-year average, we should be able to get them). Also, the share for 2006 is actually incorrect: it should be 1, because the average should be taken with the 2005 value (0) rather than with the 2004 value.
I would like to obtain the following tibble:
date target_nation acquiror_nation n1 n2 share
<int> <chr> <chr> <dbl> <int> <dbl>
1 2000 Uganda France 2 1 0.5
2 2001 Uganda France 3 1 0.4
3 2002 Uganda France 3 2 0.5
4 2003 Uganda France 2 0 0.4
5 2004 Uganda France 3 1 0.2
6 2005 Uganda France 0 0 0.33
7 2006 Uganda France 2 2 1.0
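The shares in the tibble above follow from counting every calendar year between the first and the last, with zeros for years that have no rows. The base R sketch below illustrates that idea for France on the df_new sample; it is not Anil's code, and the names years, is_fr_big, prev and out are illustrative assumptions.

```r
df_new <- data.frame(
  date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 2002L, 2002L,
           2003L, 2003L, 2004L, 2004L, 2004L, 2006L, 2006L),
  acquiror_nation = c("France", "Germany", "France", "France", "Germany",
                      "France", "France", "Germany", "Germany", "Germany",
                      "France", "France", "Germany", "France", "France"),
  big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE,
                  TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Every year in the range, including 2005, which has no rows at all.
years <- seq(min(df_new$date), max(df_new$date))

# Tabulating against a fixed set of levels yields 0 for absent years.
n1 <- as.vector(table(factor(df_new$date, levels = years)))
is_fr_big <- df_new$acquiror_nation == "France" & df_new$big_corp_TF
n2 <- as.vector(table(factor(df_new$date[is_fr_big], levels = years)))

# With no gaps left in `years`, the previous element is the previous calendar year.
prev <- function(x) c(0, head(x, -1))

out <- data.frame(date = years, n1 = n1, n2 = n2,
                  share = (n2 + prev(n2)) / (n1 + prev(n1)))
out
```

If two consecutive years had no deals at all, the denominator would be 0 and the share would come out as NaN, so real data may need an explicit rule for that case.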
Note: the result for 2006 is also different (because the two-year average is now taken with 2005 rather than 2004).
Do you think there is a way to output the desired tibble? I understand that this is a problem with the raw data: certain data points are simply missing. Adding them to the original dataset seems very inconvenient, though; it would probably be better to add them midway, e.g. after n1 and n2 have been computed. What would be the most convenient way to do this?
EDIT2: Anil's new code works well with the data sample above, but it runs into an unwanted problem with more complex data samples (ones that include more than one target_nation). Here is a shorter but more complex data sample:
> df_new_complex <- structure(list(date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2003L,
2003L, 1999L, 2001L, 2002L, 2002L), target_nation = c("Uganda",
"Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Mozambique",
"Mozambique", "Mozambique", "Mozambique"), acquiror_nation = c("France",
"Germany", "France", "France", "Germany", "Germany", "Germany",
"Germany", "France", "France", "Germany"), big_corp_TF = c(TRUE,
FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE
)), row.names = c(NA, -11L), class = "data.frame")
> df_new_complex
date target_nation acquiror_nation big_corp_TF
1: 2000 Uganda France TRUE
2: 2000 Uganda Germany FALSE
3: 2001 Uganda France TRUE
4: 2001 Uganda France FALSE
5: 2001 Uganda Germany FALSE
6: 2003 Uganda Germany TRUE
7: 2003 Uganda Germany FALSE
8: 1999 Mozambique Germany FALSE
9: 2001 Mozambique France TRUE
10: 2002 Mozambique France FALSE
11: 2002 Mozambique Germany TRUE
As you can see, this data sample includes two target nations. Anil's code, where param [...], produces the following tibble:
date target_nation acquiror_nation n1 n2 share
<dbl> <chr> <chr> <dbl> <int> <dbl>
1 1999 Mozambique France 1 0 0
2 1999 Mozambique Germany 1 0 0
3 1999 Uganda France 0 0 0
4 1999 Uganda Germany 0 0 0
5 2000 Mozambique France 0 0 0
6 2000 Mozambique Germany 0 0 0
7 2000 Uganda France 2 1 0.25
8 2000 Uganda Germany 2 0 0.167
9 2001 Mozambique France 1 1 0.4
10 2001 Mozambique Germany 1 0 0.333
11 2001 Uganda France 3 1 0.333
12 2001 Uganda Germany 3 0 0.25
13 2002 Mozambique France 2 0 0.2
14 2002 Mozambique Germany 2 1 0.25
15 2002 Uganda France 0 0 0.25
16 2002 Uganda Germany 0 0 0.25
17 2003 Mozambique France 0 0 0.25
18 2003 Mozambique Germany 0 0 0.25
19 2003 Uganda France 2 0 0.167
20 2003 Uganda Germany 2 1 0.25
What I want is the result of running the code on each target nation separately and binding the pieces together:
correct1 <- df_new_complex %>%
filter(target_nation == "Mozambique") %>%
mutate(d = 1) %>% ...
#I do the same for another target_nation
correct2 <- df_new_complex %>%
filter(target_nation == "Uganda") %>%
mutate(d = 1) %>% ...
#I then use rbind
correct <- rbind(correct1, correct2)
#which produces the desired tibble (without a year 2003 for Mozambique and 1999 for Uganda).
> correct
date target_nation acquiror_nation n1 n2 share
<dbl> <chr> <chr> <dbl> <int> <dbl>
1 1999 Mozambique France 1 0 0
2 1999 Mozambique Germany 1 0 0
3 2000 Mozambique France 0 0 0
4 2000 Mozambique Germany 0 0 0
5 2001 Mozambique France 1 1 1
6 2001 Mozambique Germany 1 0 0
7 2002 Mozambique France 2 0 0.33
8 2002 Mozambique Germany 2 1 0.333
9 2000 Uganda France 2 1 0.5
10 2000 Uganda Germany 2 0 0.25
11 2001 Uganda France 3 1 0.286
12 2001 Uganda Germany 3 0 0.2
13 2002 Uganda France 0 0 0.167
14 2002 Uganda Germany 0 0 0.167
15 2003 Uganda France 2 0 0
16 2003 Uganda Germany 2 1 0.25
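library(dplyr)   # mutate, group_by, summarise, filter, %>%
library(tidyr)   # complete, nesting
library(runner)  # sum_run

param <- 'France'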
df_new %>%
mutate(d = 1) %>%
complete(date = seq(min(date), max(date), 1), nesting(target_nation, acquiror_nation),
fill = list(d =0, big_corp_TF = FALSE)) %>%
group_by(date, target_nation) %>%
mutate(n1 = sum(d)) %>%
group_by(date, target_nation, acquiror_nation) %>%
summarise(n1 = mean(n1),
n2 = sum(big_corp_TF), .groups = 'drop') %>%
filter(acquiror_nation == param) %>%
mutate(share = sum_run(n2, k=2, idx = date)/sum_run(n1, k=2, idx = date))
# A tibble: 7 x 6
date target_nation acquiror_nation n1 n2 share
<dbl> <chr> <chr> <dbl> <int> <dbl>
1 2000 Uganda France 2 1 0.5
2 2001 Uganda France 3 1 0.4
3 2002 Uganda France 3 2 0.5
4 2003 Uganda France 2 0 0.4
5 2004 Uganda France 3 1 0.2
6 2005 Uganda France 0 0 0.333
7 2006 Uganda France 2 2 1
df_new %>%
mutate(d = 1) %>%
complete(date = seq(min(date), max(date), 1), nesting(target_nation, acquiror_nation),
fill = list(d =0, big_corp_TF = FALSE)) %>%
group_by(date, target_nation) %>%
mutate(n1 = sum(d)) %>%
group_by(date, target_nation, acquiror_nation) %>%
summarise(n1 = mean(n1),
n2 = sum(big_corp_TF), .groups = 'drop') %>%
group_by(acquiror_nation) %>%
mutate(share = sum_run(n2, k=2, idx = date)/sum_run(n1, k=2, idx = date))
# A tibble: 14 x 6
# Groups: acquiror_nation [2]
date target_nation acquiror_nation n1 n2 share
<dbl> <chr> <chr> <dbl> <int> <dbl>
1 2000 Uganda France 2 1 0.5
2 2000 Uganda Germany 2 0 0
3 2001 Uganda France 3 1 0.4
4 2001 Uganda Germany 3 0 0
5 2002 Uganda France 3 2 0.5
6 2002 Uganda Germany 3 1 0.167
7 2003 Uganda France 2 0 0.4
8 2003 Uganda Germany 2 1 0.4
9 2004 Uganda France 3 1 0.2
10 2004 Uganda Germany 3 1 0.4
11 2005 Uganda France 0 0 0.333
12 2005 Uganda Germany 0 0 0.333
13 2006 Uganda France 2 2 1
14 2006 Uganda Germany 2 0 0
df_new_complex %>%
mutate(d = 1) %>%
group_by(target_nation) %>%
complete(date = seq(min(date), max(date), 1), nesting(acquiror_nation),
fill = list(d =0, big_corp_TF = FALSE)) %>%
group_by(date, target_nation) %>%
mutate(n1 = sum(d)) %>%
group_by(date, target_nation, acquiror_nation) %>%
summarise(n1 = mean(n1),
n2 = sum(big_corp_TF), .groups = 'drop') %>%
group_by(acquiror_nation) %>%
mutate(share = sum_run(n2, k=2)/sum_run(n1, k=2))
# A tibble: 16 x 6
# Groups: acquiror_nation [2]
date target_nation acquiror_nation n1 n2 share
<dbl> <chr> <chr> <dbl> <int> <dbl>
1 1999 Mozambique France 1 0 0
2 1999 Mozambique Germany 1 0 0
3 2000 Mozambique France 0 0 0
4 2000 Mozambique Germany 0 0 0
5 2000 Uganda France 2 1 0.5
6 2000 Uganda Germany 2 0 0
7 2001 Mozambique France 1 1 0.667
8 2001 Mozambique Germany 1 0 0
9 2001 Uganda France 3 1 0.5
10 2001 Uganda Germany 3 0 0
11 2002 Mozambique France 2 0 0.2
12 2002 Mozambique Germany 2 1 0.2
13 2002 Uganda France 0 0 0
14 2002 Uganda Germany 0 0 0.5
15 2003 Uganda France 2 0 0
16 2003 Uganda Germany 2 1 0.5
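For comparison, here is a base R sketch that keeps each two-year window inside a single target nation and inside that nation's own year range, so there is no 1999 row for Uganda and no 2003 row for Mozambique. It follows the (1+1)/(2+3) definition from the question, applied per target nation for one acquiror, so individual shares need not match the tibbles above, which were produced with different groupings. The helper two_year_share and the names by_target and result are hypothetical, not part of any package.

```r
df_new_complex <- data.frame(
  date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2003L, 2003L,
           1999L, 2001L, 2002L, 2002L),
  target_nation = c(rep("Uganda", 7), rep("Mozambique", 4)),
  acquiror_nation = c("France", "Germany", "France", "France", "Germany",
                      "Germany", "Germany", "Germany", "France", "France",
                      "Germany"),
  big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,
                  FALSE, TRUE, FALSE, TRUE)
)

# Hypothetical helper: two-year share of one acquiror within ONE target nation.
two_year_share <- function(d, acquiror) {
  years <- seq(min(d$date), max(d$date))   # this target's own year range
  n1 <- as.vector(table(factor(d$date, levels = years)))
  hit <- d$acquiror_nation == acquiror & d$big_corp_TF
  n2 <- as.vector(table(factor(d$date[hit], levels = years)))
  prev <- function(x) c(0, head(x, -1))    # value from the previous calendar year
  data.frame(date = years,
             target_nation = d$target_nation[1],
             acquiror_nation = acquiror,
             n1 = n1, n2 = n2,
             share = (n2 + prev(n2)) / (n1 + prev(n1)))
}

# Split by target nation, compute each piece, then bind the pieces together.
by_target <- split(df_new_complex, df_new_complex$target_nation)
result <- do.call(rbind, lapply(by_target, two_year_share, acquiror = "France"))
rownames(result) <- NULL
result
```

Splitting first is what stops a window from reaching into another target nation's rows; it mirrors the filter-then-rbind workaround from the question, just without hard-coding the nation names.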