Creating new columns from conditional group sums using another data frame in R

Let me illustrate my question with an example.

Sample data:

df<-data.frame(BirthYear = c(1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005), Number= c(1,1,1,1,1,1,1,1,1,1,1), Group = c("g", "g", "g", "g", "g", "g","t","t","t","t","t"))

df 
 BirthYear Number  Group 
1  1995     1       g
2  1996     1       g 
3  1997     1       g
4  1998     1       g
5  1999     1       g
6  2000     1       g
7  2001     1       t
8  2002     1       t
9  2003     1       t
10 2004     1       t
11 2005     1       t 
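I know I could run a for loop over df1 to create the new columns, but I don't know how to specify the conditions so that I get the correct group sums for each year. I hope this example makes clear what I'm trying to achieve.
I'd really appreciate any help, as I'm truly stuck at this point.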

If you only want to compare each year in 2015:2020 against the birth years, you don't have to create a separate data frame. Perhaps just:

library(tidyr)
library(dplyr)
df %>% 
  expand(Year = 2015:2020, nesting(BirthYear, Number, Group)) %>% 
  group_by(Year, Group) %>% 
  summarise(
    `1` = sum(between(Year - BirthYear, 19, 20) * Number), 
    `2` = sum((Year - BirthYear < 19) * Number)
  ) %>% 
  pivot_wider(names_from = "Group", values_from = c("1", "2"), names_glue = "{Group}{.value}")
Output

`summarise()` regrouping output by 'Year' (override with `.groups` argument)
# A tibble: 6 x 5
# Groups:   Year [6]
   Year    g1    t1    g2    t2
  <int> <dbl> <dbl> <dbl> <dbl>
1  2015     2     0     4     5
2  2016     2     0     3     5
3  2017     2     0     2     5
4  2018     2     0     1     5
5  2019     2     0     0     5
6  2020     1     1     0     4
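
Since the question asks about a for loop, here is a minimal base R sketch of the same calculation, for comparison only. It assumes the conditions used in the answer above (age 19 to 20 for the "1" columns, age under 19 for the "2" columns); the names `years`, `result`, `grp` and `sub` are illustrative and do not come from the original post.

```r
# Sample data from the question
df <- data.frame(
  BirthYear = c(1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005),
  Number    = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  Group     = c("g", "g", "g", "g", "g", "g", "t", "t", "t", "t", "t")
)

years <- 2015:2020

# One row per year; the loop below adds two columns per group
result <- data.frame(Year = years)

for (grp in unique(df$Group)) {
  sub <- df[df$Group == grp, ]

  # "<group>1": sum of Number for rows aged 19 or 20 in the given year
  result[[paste0(grp, "1")]] <- sapply(years, function(y) {
    age <- y - sub$BirthYear
    sum(sub$Number[age >= 19 & age <= 20])
  })

  # "<group>2": sum of Number for rows younger than 19 in the given year
  result[[paste0(grp, "2")]] <- sapply(years, function(y) {
    age <- y - sub$BirthYear
    sum(sub$Number[age < 19])
  })
}

result
```

This should give the same counts as the output above, with the columns ordered g1, g2, t1, t2 rather than g1, t1, g2, t2, because the loop fills in one group at a time.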

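Is df$Year the birth year and df1$Year the current year? It might be less confusing if the columns had different names.
Yes, that is what they represent. I'll rename them in the question to make that clear.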