How to calculate a two-year moving average in R


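I have a large data frame (900k rows) about mergers and acquisitions (M&A).

The df has four columns: date (the year the M&A was completed), target_nation (the country of the company that was merged or acquired), acquiror_nation (the country of the acquiring company), and big_corp_TF (whether the acquiror was a big corporation, where TRUE means it was).

Here is a sample of my df: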

df <- structure(list(date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 
2002L, 2002L), target_nation = c("Uganda", "Uganda", "Uganda", 
"Uganda", "Uganda", "Uganda", "Uganda", "Uganda"), acquiror_nation = c("France", 
"Germany", "France", "France", "Germany", "France", "France", 
"Germany"), big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, FALSE, 
TRUE, TRUE, TRUE)), row.names = c(NA, -8L), class = "data.frame")

> df 

   date target_nation acquiror_nation big_corp_TF
1: 2000        Uganda          France        TRUE
2: 2000        Uganda         Germany       FALSE
3: 2001        Uganda          France        TRUE
4: 2001        Uganda          France       FALSE
5: 2001        Uganda         Germany       FALSE
6: 2002        Uganda          France        TRUE
7: 2002        Uganda          France        TRUE
8: 2002        Uganda         Germany        TRUE

Note that the share for 2000 stays the same, because there is no previous year to average it with; 2001 becomes 0.4 (because (1+1)/(2+3) = 0.4); 2002 becomes 0.5 (because (1+2)/(3+3) = 0.5).
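The arithmetic in the note above can be checked by hand. The following is a minimal base-R sketch, not the code from any answer. It assumes the share for a year is the number of deals by French big corporations (n2) divided by all deals in that year (n1), which is what the numbers in the note imply. The helper name roll2 is made up for this illustration.

```r
# Sample data from the question, rebuilt as a plain data frame (base R only)
df <- data.frame(
  date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 2002L, 2002L),
  target_nation = "Uganda",
  acquiror_nation = c("France", "Germany", "France", "France", "Germany",
                      "France", "France", "Germany"),
  big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

years <- sort(unique(df$date))

# n1: all deals per year; n2: deals per year by French big corporations
n1 <- sapply(years, function(y) sum(df$date == y))
n2 <- sapply(years, function(y)
  sum(df$date == y & df$acquiror_nation == "France" & df$big_corp_TF))

# Hypothetical helper: current value plus the previous one
# (the first year has no predecessor, so it keeps its own value)
roll2 <- function(x) x + c(0, head(x, -1))

share <- roll2(n2) / roll2(n1)
print(data.frame(date = years, n1, n2, share))
```

The ratio is taken over two-year sums rather than by averaging two yearly shares, which is why 2001 gives (1+1)/(2+3) and not the mean of 0.5 and 0.333.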

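Do you have any ideas on how to write code that computes the two-year average share? I think I need a for loop here, but I don't know how to write it. Any suggestions would be appreciated.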

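EDIT: AnilGoyal's code works perfectly with the sample data, but my actual data is evidently messier, so I wonder whether there is a way to solve the problems I ran into.

My actual dataset sometimes skips a year, and sometimes a year does not include an acquiror_nation that appeared in earlier rows. Please see this more accurate sample of my actual data: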

df_new <- structure(list(date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 
2002L, 2002L, 2003L, 2003L, 2004L, 2004L, 2004L, 2006L, 2006L
), target_nation = c("Uganda", "Uganda", "Uganda", "Uganda", 
"Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Uganda", 
"Uganda", "Uganda", "Uganda", "Uganda"), acquiror_nation = c("France", 
"Germany", "France", "France", "Germany", "France", "France", 
"Germany", "Germany", "Germany", "France", "France", "Germany", 
"France", "France"), big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, FALSE, 
TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)), row.names = c(NA, 
-15L), class = "data.frame")

> df_new 

    date target_nation acquiror_nation big_corp_TF
 1: 2000        Uganda          France     TRUE
 2: 2000        Uganda         Germany    FALSE
 3: 2001        Uganda          France     TRUE
 4: 2001        Uganda          France    FALSE
 5: 2001        Uganda         Germany    FALSE
 6: 2002        Uganda          France     TRUE
 7: 2002        Uganda          France     TRUE
 8: 2002        Uganda         Germany     TRUE
 9: 2003        Uganda         Germany     TRUE
10: 2003        Uganda         Germany    FALSE
11: 2004        Uganda          France     TRUE
12: 2004        Uganda          France    FALSE
13: 2004        Uganda         Germany     TRUE
14: 2006        Uganda          France     TRUE
15: 2006        Uganda          France     TRUE

Note: there are no rows for France in 2003, and there are no rows at all for 2005.

If I run Anil's first code, the result is as follows:

   date target_nation acquiror_nation    n1    n2 share
  <int> <chr>         <chr>           <dbl> <int> <dbl>
1  2000 Uganda        France              2     1   0.5
2  2001 Uganda        France              3     1   0.4
3  2002 Uganda        France              3     2   0.5
4  2004 Uganda        France              3     1   0.5
5  2006 Uganda        France              2     2   0.6


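Note: there are no results for France for 2003 and 2005. I would like to have results for those years (since we are computing a two-year average, we should be able to get them). In addition, the share for 2006 is actually incorrect: it should be 1, because the average should use the 2005 value (0) rather than the 2004 values.

I would like to receive the following tibble: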

       date target_nation acquiror_nation    n1    n2 share
      <int> <chr>         <chr>           <dbl> <int> <dbl>
    1  2000 Uganda        France              2     1   0.5
    2  2001 Uganda        France              3     1   0.4
    3  2002 Uganda        France              3     2   0.5
    4  2003 Uganda        France              2     0   0.4
    5  2004 Uganda        France              3     1   0.2
    6  2005 Uganda        France              0     0   0.33
    7  2006 Uganda        France              2     2   1.0

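Note: the result for 2006 is also different (because the two-year average now uses 2005 instead of 2004).

Do you think there is a way to output the desired tibble? I understand that this is a problem with the original data: it is simply missing certain data points. However, adding them to the original dataset seems very inconvenient; it is probably best to add them midway, for example after computing n1 and n2. What would be the most convenient way to do that?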
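One way to picture "adding the missing points midway" is to count deals against a full grid of calendar years, so that a year with no rows gets a count of 0 instead of being dropped. The base-R sketch below is only an illustration under that assumption. It is not the code from any answer, and roll2 is a made-up helper name.

```r
# df_new from the question, rebuilt as a plain data frame (base R only)
df_new <- data.frame(
  date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2002L, 2002L, 2002L,
           2003L, 2003L, 2004L, 2004L, 2004L, 2006L, 2006L),
  target_nation = "Uganda",
  acquiror_nation = c("France", "Germany", "France", "France", "Germany",
                      "France", "France", "Germany", "Germany", "Germany",
                      "France", "France", "Germany", "France", "France"),
  big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE,
                  TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Every calendar year in the range, including years without rows (2005)
years <- seq(min(df_new$date), max(df_new$date))

# Tabulating against the full year grid yields 0 for missing years
is_hit <- df_new$acquiror_nation == "France" & df_new$big_corp_TF
n1 <- as.vector(table(factor(df_new$date, levels = years)))
n2 <- as.vector(table(factor(df_new$date[is_hit], levels = years)))

# Hypothetical helper: current value plus the previous one
roll2 <- function(x) x + c(0, head(x, -1))

res <- data.frame(date = years, n1, n2, share = roll2(n2) / roll2(n1))
print(res)
```

With this sample the shares come out as 0.5, 0.4, 0.5, 0.4, 0.2, 0.333 and 1, matching the desired tibble. If two consecutive years both had no deals, the ratio would be 0/0 and R would return NaN for that year.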

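EDIT 2: Anil's new code works well with the data sample above, but it runs into an undesired issue with a more complex data sample (one that includes more than one target_nation). Here is a shorter but more complex data sample: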

df_new_complex <- structure(list(date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2003L, 
2003L, 1999L, 2001L, 2002L, 2002L), target_nation = c("Uganda", 
"Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Mozambique", 
"Mozambique", "Mozambique", "Mozambique"), acquiror_nation = c("France", 
"Germany", "France", "France", "Germany", "Germany", "Germany", 
"Germany", "France", "France", "Germany"), big_corp_TF = c(TRUE, 
FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE
)), row.names = c(NA, -11L), class = "data.frame")

> df_new_complex 

date target_nation acquiror_nation big_corp_TF
 1: 2000        Uganda          France        TRUE
 2: 2000        Uganda         Germany       FALSE
 3: 2001        Uganda          France        TRUE
 4: 2001        Uganda          France       FALSE
 5: 2001        Uganda         Germany       FALSE
 6: 2003        Uganda         Germany        TRUE
 7: 2003        Uganda         Germany       FALSE
 8: 1999    Mozambique         Germany       FALSE
 9: 2001    Mozambique          France        TRUE
10: 2002    Mozambique          France       FALSE
11: 2002    Mozambique         Germany        TRUE

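As you can see, this data sample includes two target nations. Running Anil's code on it without the param filter produces the following tibble, which contains unwanted rows (a year 1999 for Uganda and a year 2003 for Mozambique):

    date target_nation acquiror_nation    n1    n2 share
   <dbl> <chr>         <chr>           <dbl> <int> <dbl>
 1  1999 Mozambique    France              1     0 0    
 2  1999 Mozambique    Germany             1     0 0    
 3  1999 Uganda        France              0     0 0    
 4  1999 Uganda        Germany             0     0 0    
 5  2000 Mozambique    France              0     0 0    
 6  2000 Mozambique    Germany             0     0 0    
 7  2000 Uganda        France              2     1 0.25 
 8  2000 Uganda        Germany             2     0 0.167
 9  2001 Mozambique    France              1     1 0.4  
10  2001 Mozambique    Germany             1     0 0.333
11  2001 Uganda        France              3     1 0.333
12  2001 Uganda        Germany             3     0 0.25 
13  2002 Mozambique    France              2     0 0.2  
14  2002 Mozambique    Germany             2     1 0.25 
15  2002 Uganda        France              0     0 0.25 
16  2002 Uganda        Germany             0     0 0.25 
17  2003 Mozambique    France              0     0 0.25 
18  2003 Mozambique    Germany             0     0 0.25 
19  2003 Uganda        France              2     0 0.167
20  2003 Uganda        Germany             2     1 0.25

To get around this, I filter one target_nation at a time:
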
correct1 <- df_new_complex %>% 
  filter(target_nation == "Mozambique") %>%
  mutate(d = 1) %>% ...

#I do the same for another target_nation

correct2 <- df_new_complex %>% 
  filter(target_nation == "Uganda") %>%
  mutate(d = 1) %>% ...

#I then use rbind

correct <- rbind(correct1, correct2)

#which produces the desired tibble (without a year 2003 for Mozambique and 1999 for Uganda).

> correct 

date target_nation acquiror_nation    n1    n2 share
   <dbl> <chr>         <chr>           <dbl> <int> <dbl>
 1  1999 Mozambique    France              1     0 0    
 2  1999 Mozambique    Germany             1     0 0    
 3  2000 Mozambique    France              0     0 0    
 4  2000 Mozambique    Germany             0     0 0    
 5  2001 Mozambique    France              1     1 1    
 6  2001 Mozambique    Germany             1     0 0 
 7  2002 Mozambique    France              2     0 0.33 
 8  2002 Mozambique    Germany             2     1 0.333
 9  2000 Uganda        France              2     1 0.5  
10  2000 Uganda        Germany             2     0 0.25 
11  2001 Uganda        France              3     1 0.286
12  2001 Uganda        Germany             3     0 0.2  
13  2002 Uganda        France              0     0 0.167
14  2002 Uganda        Germany             0     0 0.167
15  2003 Uganda        France              2     0 0    
16  2003 Uganda        Germany             2     1 0.25 

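# Packages used by the pipelines below:
# dplyr (mutate, group_by, summarise, filter), tidyr (complete, nesting), runner (sum_run)
library(dplyr)
library(tidyr)
library(runner)

# Acquiror nation whose share is computed
param <- 'France'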
df_new %>% 
  mutate(d = 1) %>%
  complete(date = seq(min(date), max(date), 1), nesting(target_nation, acquiror_nation),
           fill = list(d =0, big_corp_TF = FALSE)) %>%
  group_by(date, target_nation) %>%
  mutate(n1 = sum(d)) %>%
  group_by(date, target_nation, acquiror_nation) %>%
  summarise(n1 = mean(n1),
            n2 = sum(big_corp_TF), .groups = 'drop') %>%
  filter(acquiror_nation == param) %>%
  mutate(share = sum_run(n2, k=2, idx = date)/sum_run(n1, k=2, idx = date))

# A tibble: 7 x 6
   date target_nation acquiror_nation    n1    n2 share
  <dbl> <chr>         <chr>           <dbl> <int> <dbl>
1  2000 Uganda        France              2     1 0.5  
2  2001 Uganda        France              3     1 0.4  
3  2002 Uganda        France              3     2 0.5  
4  2003 Uganda        France              2     0 0.4  
5  2004 Uganda        France              3     1 0.2  
6  2005 Uganda        France              0     0 0.333
7  2006 Uganda        France              2     2 1

df_new %>% 
  mutate(d = 1) %>%
  complete(date = seq(min(date), max(date), 1), nesting(target_nation, acquiror_nation),
           fill = list(d =0, big_corp_TF = FALSE)) %>%
  group_by(date, target_nation) %>%
  mutate(n1 = sum(d)) %>%
  group_by(date, target_nation, acquiror_nation) %>%
  summarise(n1 = mean(n1),
            n2 = sum(big_corp_TF), .groups = 'drop') %>%
  group_by(acquiror_nation) %>%
  mutate(share = sum_run(n2, k=2, idx = date)/sum_run(n1, k=2, idx = date))

# A tibble: 14 x 6
# Groups:   acquiror_nation [2]
    date target_nation acquiror_nation    n1    n2 share
   <dbl> <chr>         <chr>           <dbl> <int> <dbl>
 1  2000 Uganda        France              2     1 0.5  
 2  2000 Uganda        Germany             2     0 0    
 3  2001 Uganda        France              3     1 0.4  
 4  2001 Uganda        Germany             3     0 0    
 5  2002 Uganda        France              3     2 0.5  
 6  2002 Uganda        Germany             3     1 0.167
 7  2003 Uganda        France              2     0 0.4  
 8  2003 Uganda        Germany             2     1 0.4  
 9  2004 Uganda        France              3     1 0.2  
10  2004 Uganda        Germany             3     1 0.4  
11  2005 Uganda        France              0     0 0.333
12  2005 Uganda        Germany             0     0 0.333
13  2006 Uganda        France              2     2 1    
14  2006 Uganda        Germany             2     0 0

df_new_complex %>%
  mutate(d = 1) %>%
  group_by(target_nation) %>%
  complete(date = seq(min(date), max(date), 1), nesting(acquiror_nation),
           fill = list(d =0, big_corp_TF = FALSE)) %>%
  group_by(date, target_nation) %>%
  mutate(n1 = sum(d)) %>%
  group_by(date, target_nation, acquiror_nation) %>%
  summarise(n1 = mean(n1),
            n2 = sum(big_corp_TF), .groups = 'drop') %>%
  group_by(acquiror_nation) %>%
  mutate(share = sum_run(n2, k=2)/sum_run(n1, k=2))

# A tibble: 16 x 6
# Groups:   acquiror_nation [2]
    date target_nation acquiror_nation    n1    n2 share
   <dbl> <chr>         <chr>           <dbl> <int> <dbl>
 1  1999 Mozambique    France              1     0 0    
 2  1999 Mozambique    Germany             1     0 0    
 3  2000 Mozambique    France              0     0 0    
 4  2000 Mozambique    Germany             0     0 0    
 5  2000 Uganda        France              2     1 0.5  
 6  2000 Uganda        Germany             2     0 0    
 7  2001 Mozambique    France              1     1 0.667
 8  2001 Mozambique    Germany             1     0 0    
 9  2001 Uganda        France              3     1 0.5  
10  2001 Uganda        Germany             3     0 0    
11  2002 Mozambique    France              2     0 0.2  
12  2002 Mozambique    Germany             2     1 0.2  
13  2002 Uganda        France              0     0 0    
14  2002 Uganda        Germany             0     0 0.5  
15  2003 Uganda        France              2     0 0    
16  2003 Uganda        Germany             2     1 0.5
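
One thing to watch in the output above: Mozambique/France in 2001 shows 0.667, which is (1+1)/(2+1). The window picked up the 2000 Uganda row, because the rows are grouped by acquiror_nation only and the two target nations are interleaved within each group. In the pipeline above, grouping by both target_nation and acquiror_nation before the final mutate would presumably avoid this. The base-R sketch below illustrates the same idea without any packages; it is not Anil's code, and share_for_target and roll2 are hypothetical helper names.

```r
df_new_complex <- data.frame(
  date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2003L, 2003L,
           1999L, 2001L, 2002L, 2002L),
  target_nation = c(rep("Uganda", 7), rep("Mozambique", 4)),
  acquiror_nation = c("France", "Germany", "France", "France", "Germany",
                      "Germany", "Germany", "Germany", "France", "France",
                      "Germany"),
  big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,
                  FALSE, TRUE, FALSE, TRUE)
)

# Hypothetical helper: current value plus the previous one
roll2 <- function(x) x + c(0, head(x, -1))

# Hypothetical helper: two-year share table for a single target nation
share_for_target <- function(d) {
  years <- seq(min(d$date), max(d$date))   # this nation's own year range
  n1 <- as.vector(table(factor(d$date, levels = years)))
  per_acq <- lapply(sort(unique(d$acquiror_nation)), function(a) {
    hit <- d$acquiror_nation == a & d$big_corp_TF
    n2 <- as.vector(table(factor(d$date[hit], levels = years)))
    data.frame(date = years, target_nation = d$target_nation[1],
               acquiror_nation = a, n1 = n1, n2 = n2,
               share = roll2(n2) / roll2(n1))
  })
  do.call(rbind, per_acq)
}

# Split by target nation so that no window spans two nations
res <- do.call(rbind, lapply(split(df_new_complex, df_new_complex$target_nation),
                             share_for_target))
rownames(res) <- NULL
print(res)
```

Here Mozambique/France in 2001 comes out as 1 and Uganda/France in 2001 as 0.4, and each target nation keeps its own year range (1999 to 2002 for Mozambique, 2000 to 2003 for Uganda).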