R: merging the maximum cumulative sum by group and year into a new table [r, ggplot2, dplyr]


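I want to generate a cumulative sum by group and year, and then copy the cumulative sum for each group and year into a new data frame that will be used to draw a stacked area chart with ggplot.

For the source data frame, I have so far used cumsum on a column of 1s (one row per new individual) to build a running total per model over the years: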

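library(dplyr)
df$count <- 1  # one row per new individual
# running total of individuals within each model (rows sorted by year)
df <- df %>% group_by(model) %>% mutate(cs_model = cumsum(count))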

# My source data are thus:

# Groups:   model [4]
    year model count cs_model
   <dbl> <dbl> <dbl>    <dbl>
 1  1876     1     1        1
 2  1885     3     1        1
 3  1937     2     1        1
 4  1939     1     1        2
 5  1950     1     1        3
 6  1960     1     1        4
 7  1969     1     1        5
 8  1971     3     1        2
 9  1973     2     1        2
10  1974     3     1        3
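
For anyone who wants to reproduce the output above, here is a minimal sketch. It assumes the ten printed rows are the whole data set (the real df is larger) and re-enters them by hand; the explicit arrange() is an addition, since cumsum depends on row order.

```r
library(dplyr)

# Hypothetical reconstruction of the ten rows printed above:
# one row per new individual, identified by year and model.
df <- tibble(
  year  = c(1876, 1885, 1937, 1939, 1950, 1960, 1969, 1971, 1973, 1974),
  model = c(1, 3, 2, 1, 1, 1, 1, 3, 2, 3)
)

df <- df %>%
  arrange(year) %>%                     # cumsum depends on row order
  mutate(count = 1) %>%                 # each row counts as one individual
  group_by(model) %>%
  mutate(cs_model = cumsum(count)) %>%  # running total within each model
  ungroup()
```

The resulting cs_model column should match the printed values, for example 3 for model 3 in 1974.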

As you can see, the cumulative sum for model=3 at year=1974 is 3: 1 (1885) + 1 (1971) + 1 (1974) = 3. For model=2 at year=1973 the cumulative sum is 1 (1937) + 1 (1973) = 2.

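This works well, but in some years several individuals of the same model were added within a single year, so the cumulative sum for each model and year should be the maximum value reached in that year: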

# Groups:   model [3]
    year model count cs_model
   <dbl> <dbl> <dbl>    <dbl>
 1  2003     1     1       51
 2  2003     1     1       52
 3  2003     2     1       17
 4  2003     3     1       12
 5  2004     1     1       53
 6  2004     1     1       54
 7  2004     3     1       13
 8  2006     1     1       55
 9  2006     1     1       56
10  2006     1     1       57
11  2006     1     1       58
12  2006     3     1       14
13  2006     3     1       15
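
One way to keep only the highest running total per year and model is to group by both columns and take the maximum. This is a minimal sketch that re-enters the thirteen rows printed above by hand; the name yearly_max is made up for illustration.

```r
library(dplyr)

# The thirteen rows printed above, re-entered by hand for illustration.
df <- tibble(
  year     = c(2003, 2003, 2003, 2003, 2004, 2004, 2004,
               2006, 2006, 2006, 2006, 2006, 2006),
  model    = c(1, 1, 2, 3, 1, 1, 3, 1, 1, 1, 1, 3, 3),
  cs_model = c(51, 52, 17, 12, 53, 54, 13, 55, 56, 57, 58, 14, 15)
)

# Keep one row per year and model: the largest running total reached that year.
yearly_max <- df %>%
  group_by(year, model) %>%
  summarise(cs_model = max(cs_model), .groups = "drop")
```

With these rows, model 1 ends 2003 at 52, 2004 at 54 and 2006 at 58, leaving seven rows in total.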

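I want to create a new column in years_area that holds, for each model and each year, the cumulative total from df. I am running into trouble because:

Not every year adds a new individual to every model.

In some years several new individuals are added to the same model, so the cumulative sum for that year must be the highest value for that year and model.

The original time series has many gaps: df only includes the years in which a new individual was created, whereas years_area includes every year from 1876 to 2019.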

Edit for clarity:

The expected output for years_area is shown below. Random numbers are used to give a more illustrative example:
   year model cs_model
1   1876     1       1
2   1876     2       0
3   1876     3       3

In 1876 there is 1 new individual of model 1, 0 of model 2, and 3 of model 3.

   year model cs_model
4   1877     1       2
5   1877     2       1
6   1877     3       6

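In 1877 there is 1 new individual of model 1 (1+1=2), 1 of model 2 (0+1=1), and 3 of model 3 (3+3=6).
In 1878 there are 2 new individuals of model 1 (2+2=4), 0 of model 2 (1+0=1), and 4 of model 3 (6+4=10).
In 1879 there are 2 new individuals of model 1 (4+2=6), 2 of model 2 (1+2=3), and 4 of model 3 (10+4=14), and so on.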
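
The arithmetic above can be sketched as a short pipeline: count the new individuals per year and model, add the missing year and model combinations with a count of 0, then accumulate within each model. The table new_per_year and the column n_new are hypothetical names. With a df holding one row per individual, something like count(df, year, model, name = "n_new") would presumably produce the same starting point.

```r
library(dplyr)
library(tidyr)

# Hypothetical counts of new individuals, matching the example numbers above.
# Combinations with no new individual (model 2 in 1876 and 1878) are absent.
new_per_year <- tribble(
  ~year, ~model, ~n_new,
  1876, 1, 1,
  1876, 3, 3,
  1877, 1, 1,
  1877, 2, 1,
  1877, 3, 3,
  1878, 1, 2,
  1878, 3, 4,
  1879, 1, 2,
  1879, 2, 2,
  1879, 3, 4
)

years_area <- new_per_year %>%
  # Add every missing year/model pair with zero new individuals.
  complete(year = full_seq(year, 1), model, fill = list(n_new = 0)) %>%
  arrange(model, year) %>%
  group_by(model) %>%
  mutate(cs_model = cumsum(n_new)) %>%  # running total within each model
  ungroup() %>%
  arrange(year, model) %>%
  select(year, model, cs_model)
```

For the full series, full_seq(c(1876, 2019), 1) could replace full_seq(year, 1) so that years before the first or after the last new individual are also included.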
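Can you share df using dput? Try a left_join, i.e. left_join(years_area, df %>% select(year, model, cs_model))

I think the problem here is that some years have multiple (different) values of cs_model for the same year and model. In addition, years that are in years_area but not in df will be NA; I want them to repeat the previous value of cs_model until a new individual is added and the value increases. The different structures of the two data sets are what trip me up...

You can use left_join(years_area, df %>% select(year, model, cs_model)) followed by fill.

You can also add a group_by: left_join(years_area, df %>% select(year, model, cs_model)) %>% group_by(year) %>% fill(cs_model)

If the code above produces duplicates, those duplicates must come from duplicated values in the two data sets. You may need to join on the distinct data.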
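
The left_join and fill suggestion from the comments can be sketched as follows. This is an illustration under assumptions, not the commenter's exact code: it joins on a table that already holds one maximum per year and model (yearly_max, a hypothetical name), assumes years_area is a full grid of years and models, and groups by model rather than year so that each model carries its own last value forward.

```r
library(dplyr)
library(tidyr)

# Hypothetical yearly maxima: one row per year and model with a new individual.
yearly_max <- tribble(
  ~year, ~model, ~cs_model,
  2003, 1, 52,
  2003, 2, 17,
  2003, 3, 12,
  2004, 1, 54,
  2004, 3, 13,
  2006, 1, 58,
  2006, 3, 15
)

# Assumed shape of years_area: every year crossed with every model.
years_area <- expand_grid(year = as.numeric(2003:2006), model = c(1, 2, 3))

filled <- years_area %>%
  left_join(yearly_max, by = c("year", "model")) %>%  # gaps become NA
  arrange(model, year) %>%
  group_by(model) %>%
  fill(cs_model, .direction = "down") %>%    # repeat the last known total
  ungroup() %>%
  mutate(cs_model = replace_na(cs_model, 0)) %>%  # years before the first individual
  arrange(year, model)
```

Here 2005 has no new individuals, so each model repeats its 2004 total (54, 17 and 13).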

Expected output for 1878 and 1879, continuing the example:

   year model cs_model
7   1878     1       4
8   1878     2       1
9   1878     3      10

   year model cs_model
10  1879     1       6
11  1879     2       3
12  1879     3      14
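
Since the stated goal is a stacked area chart, here is a minimal sketch of the plotting step. It assumes years_area has the year, model and cs_model columns of the expected output for 1876 to 1879; the axis labels are placeholders.

```r
library(ggplot2)

# Expected output for 1876-1879, entered by hand for illustration.
years_area <- data.frame(
  year     = rep(1876:1879, each = 3),
  model    = rep(1:3, times = 4),
  cs_model = c(1, 0, 3, 2, 1, 6, 4, 1, 10, 6, 3, 14)
)

# geom_area stacks the groups by default; model is converted to a factor
# so that each model gets its own fill colour.
p <- ggplot(years_area, aes(x = year, y = cs_model, fill = factor(model))) +
  geom_area() +
  labs(x = "Year", y = "Cumulative individuals", fill = "Model")
```

Stacking only looks right when every model has a value for every year, which is why the gaps need to be filled before plotting.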