Creating new columns in R from conditional group sums based on another data frame
Let me illustrate my problem with an example. Sample data:
df<-data.frame(BirthYear = c(1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005), Number= c(1,1,1,1,1,1,1,1,1,1,1), Group = c("g", "g", "g", "g", "g", "g","t","t","t","t","t"))
df
BirthYear Number Group
1 1995 1 g
2 1996 1 g
3 1997 1 g
4 1998 1 g
5 1999 1 g
6 2000 1 g
7 2001 1 t
8 2002 1 t
9 2003 1 t
10 2004 1 t
11 2005 1 t
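I know I can run a for loop over df1 to create the new columns, but I don't know how to specify the condition so that I get the correct group sum for each year.
I hope this example makes clear what I am trying to achieve.
I would really appreciate any help, as I am truly stuck at this point.

If you only want to compute the difference between each year in 2015:2020 and the birth year, you don't have to create a separate data frame. Perhaps just: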
library(tidyr)
library(dplyr)
df %>%
expand(Year = 2015:2020, nesting(BirthYear, Number, Group)) %>%
group_by(Year, Group) %>%
summarise(
`1` = sum(between(Year - BirthYear, 19, 20) * Number),
`2` = sum((Year - BirthYear < 19) * Number)
) %>%
pivot_wider(names_from = "Group", values_from = c("1", "2"), names_glue = "{Group}{.value}")
Output:
`summarise()` regrouping output by 'Year' (override with `.groups` argument)
# A tibble: 6 x 5
# Groups: Year [6]
Year g1 t1 g2 t2
<int> <dbl> <dbl> <dbl> <dbl>
1 2015 2 0 4 5
2 2016 2 0 3 5
3 2017 2 0 2 5
4 2018 2 0 1 5
5 2019 2 0 0 5
6 2020 1 1 0 4
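
Since the question mentions looping over years, here is a minimal base R sketch of the same two conditions, applied one year at a time. The helper name `sums_for_year` is hypothetical, and the sketch assumes the only groups are "g" and "t", as in the sample data. It is meant to make the conditions visible, not to replace the pipeline above.

```r
# Sample data from the question
df <- data.frame(
  BirthYear = c(1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005),
  Number = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  Group = c("g", "g", "g", "g", "g", "g", "t", "t", "t", "t", "t")
)

# Hypothetical helper: conditional group sums for a single year.
# Assumes the groups are exactly "g" and "t".
sums_for_year <- function(year, data) {
  age <- year - data$BirthYear
  in_19_20 <- age >= 19 & age <= 20  # same test as between(age, 19, 20)
  under_19 <- age < 19
  # Multiplying the logical vector by Number keeps only matching rows in the sum
  s1 <- tapply(in_19_20 * data$Number, data$Group, sum)
  s2 <- tapply(under_19 * data$Number, data$Group, sum)
  data.frame(
    Year = year,
    g1 = s1[["g"]], t1 = s1[["t"]],
    g2 = s2[["g"]], t2 = s2[["t"]]
  )
}

# One row per year, stacked into a single data frame
result <- do.call(rbind, lapply(2015:2020, sums_for_year, data = df))
result
```

For 2015, for example, the ages in group g are 20, 19, 18, 17, 16 and 15, so two rows fall in the 19 to 20 range and four are under 19, which matches the first row of the output above.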
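Is df$Year the birth year and df1$Year the current year? It might be less confusing if the columns had different names.
Yes, that is what they represent. I will change the names in the question to make this clear.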