R - Select and assign a value to a group based on a condition in a column
I have a data frame like the one shown below:
d
year pos days sal
1 2009 A 31 2000
2 2009 B 60 4000
3 2009 C 10 600
4 2010 B 10 1000
5 2010 D 90 7000
I want to group the data by year, sum days and sal, and select the pos with the most days in each group.
The result should be:
year pos days sal
1 2009 B 101 6600
2 2010 D 100 8000
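I can use functions such as tapply(d$days, d$year, sum) to handle numeric values like days and sal.
However, I don't know how to select the pos whose days value satisfies the condition and assign it to the group.
Any advice would be greatly appreciated.

We can use dplyr. After grouping by 'year', take the 'pos' where 'days' is at its maximum (which.max(days)), along with the sums of 'days' and 'sal'.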
library(dplyr)
d %>%
  group_by(year) %>%
  # pos is computed first, while days still refers to the original column
  summarise(pos = pos[which.max(days)], days = sum(days), sal = sum(sal))
# # A tibble: 2 × 4
# year pos days sal
# <int> <chr> <int> <int>
#1 2009 B 101 6600
#2 2010 D 100 8000
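
To make the pos[which.max(days)] step concrete, here is a minimal base R sketch of what happens inside a single group. The data frame is rebuilt from the table in the question; the column types (integer and character) are assumed from the tibble header above, and the names g, idx, best_pos and tie_idx are only illustrative.

```r
# Example data rebuilt from the question (column types are an assumption)
d <- data.frame(
  year = c(2009L, 2009L, 2009L, 2010L, 2010L),
  pos  = c("A", "B", "C", "B", "D"),
  days = c(31L, 60L, 10L, 10L, 90L),
  sal  = c(2000L, 4000L, 600L, 1000L, 7000L),
  stringsAsFactors = FALSE
)

# Look at one group only: the rows for 2009
g <- d[d$year == 2009, ]

# which.max() returns the position of the first maximum in the vector
idx <- which.max(g$days)   # 2, because days is 31, 60, 10

# Indexing pos with that position picks the matching label
best_pos <- g$pos[idx]     # "B"

# With tied maxima, only the first position is returned
tie_idx <- which.max(c(10, 90, 90))   # 2
```

Because which.max() keeps only the first maximum, a year with two equally long stays would report whichever pos comes first in the data.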
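A solution with base R:
# Keep the row with the maximum days within each year
m1 <- d[as.logical(with(d, ave(days, year, FUN = function(x) seq_along(x) == which.max(x)))), c('year', 'pos')]
# Sum days and sal per year
m2 <- aggregate(cbind(days, sal) ~ year, d, sum)
# Combine the selected pos with the per-year totals
merge(m1, m2, by = 'year')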
This gives the following data.frame/data.table:
year pos days sal
1 2009 B 101 6600
2 2010 D 100 8000
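
The same result can also be reached in base R without a merge, by splitting the data by year and summarising each piece. This is only a sketch of an alternative, not the approach used in the answer above. The data frame is rebuilt from the question, with column types assumed, and per_year and res are illustrative names.

```r
# Example data rebuilt from the question (column types are an assumption)
d <- data.frame(
  year = c(2009L, 2009L, 2009L, 2010L, 2010L),
  pos  = c("A", "B", "C", "B", "D"),
  days = c(31L, 60L, 10L, 10L, 90L),
  sal  = c(2000L, 4000L, 600L, 1000L, 7000L),
  stringsAsFactors = FALSE
)

# Split into one data frame per year and build a one-row summary for each
per_year <- lapply(split(d, d$year), function(g) {
  data.frame(
    year = g$year[1],
    pos  = g$pos[which.max(g$days)],  # pos of the row with the most days
    days = sum(g$days),
    sal  = sum(g$sal),
    stringsAsFactors = FALSE
  )
})

# Stack the one-row summaries and drop the row names added by split()
res <- do.call(rbind, per_year)
rownames(res) <- NULL
res
```

Doing both the selection and the sums inside one function avoids the separate m1 and m2 steps, at the cost of being slower on data with many groups.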
Using sqldf:
library(sqldf)
cbind.data.frame(sqldf('select year, sum(days) as days, sum(sal) as sal
from d group by year'),
sqldf('select pos from d group by year having days=max(days)'))
year days sal pos
1 2009 101 6600 B
2 2010 100 8000 D
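Use dput(nameofdataframe) and post the output in your question.
Thank you very much! This is what I was looking for.
This also works: sqldf('select year, pos, max(days) max_days, sum(days), sum(sal) from d group by year')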
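
For reference, a minimal sketch of how the example data could be shared with dput(). The data frame is rebuilt by hand from the table in the question, so the column types are an assumption.

```r
# Example data rebuilt from the question (column types are an assumption)
d <- data.frame(
  year = c(2009L, 2009L, 2009L, 2010L, 2010L),
  pos  = c("A", "B", "C", "B", "D"),
  days = c(31L, 60L, 10L, 10L, 90L),
  sal  = c(2000L, 4000L, 600L, 1000L, 7000L),
  stringsAsFactors = FALSE
)

# dput() prints a structure(...) call that recreates d exactly,
# which can be pasted into a question so others can run the code
dput(d)
```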