R: Filter grouped data


I have a data.frame that looks like this:

 DATE                  MEAN    SUM  MAX MIN SAISON JAHR
1 1995-09-01 00:00:00 2.370833 56.9 7.4   0      S 1995
2 1995-09-01 01:00:00 2.225000 53.4 7.4   0      S 1995
3 1995-09-01 02:00:00 2.091667 50.2 7.4   0      S 1995
4 1995-09-01 03:00:00 1.929167 46.3 7.4   0      S 1995
5 1995-09-01 04:00:00 1.745833 41.9 7.4   0      S 1995
6 1995-09-01 05:00:00 1.558333 37.4 7.4   0      S 1995
....
With the dplyr package I was able to extract the highest SUM for each SAISON and JAHR:

library(dplyr)
gJahrSAISON_24 <- group_by(.data = dataframe, JAHR, SAISON)
# one row per JAHR/SAISON group with its maximum SUM
summarise(gJahrSAISON_24, hoechsterNiederschlag = max(SUM))

Any idea how to extract the ten (!) highest SUM values for each JAHR and SAISON?

You can use slice() together with arrange():

library(dplyr)
df1 %>%
  group_by(JAHR, SAISON) %>%
  arrange(desc(SUM)) %>%
  slice(1:10)
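With dplyr 1.0.0 or later, slice_max() does the sorting and the slicing in one call. A minimal sketch on a small made-up data frame (the toy values are illustrative, not the question's data), using n = 3 instead of 10 so the effect is visible; with_ties = FALSE keeps exactly n rows per group, as slice(1:10) does:

```r
library(dplyr)

# Hypothetical toy data: two JAHR/SAISON groups
toy <- data.frame(
  JAHR   = c(1995, 1995, 1995, 1995, 1996, 1996, 1996),
  SAISON = c("S", "S", "S", "S", "W", "W", "W"),
  SUM    = c(56.9, 53.4, 50.2, 46.3, 10.0, 30.0, 20.0)
)

top <- toy %>%
  group_by(JAHR, SAISON) %>%
  slice_max(SUM, n = 3, with_ties = FALSE) %>%  # 3 largest SUM values per group
  ungroup()

print(top)
```

For the question's data this would be slice_max(SUM, n = 10, with_ties = FALSE) on df1; leaving with_ties at its default (TRUE) keeps every row tied with the 10th value instead.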
Or use min_rank()/dense_rank(), ranking SUM in descending order:

df1 %>% 
    group_by(JAHR, SAISON) %>%
    filter(dense_rank(desc(SUM)) <= 10)   # desc() gives the largest SUM rank 1
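The two rank functions differ only when SUM has ties, and then a rank filter can return more than 10 rows per group: min_rank() keeps all rows tied at the cutoff, while dense_rank() keeps the 10 highest distinct values, however many rows share them. A small sketch with hypothetical values and a cutoff of 2:

```r
library(dplyr)

# Hypothetical SUM values with a tie at the top
x <- c(7.4, 7.4, 5.0, 2.1)

r_min   <- min_rank(desc(x))    # tied values share the lowest rank, then a gap
r_dense <- dense_rank(desc(x))  # tied values share a rank, no gap afterwards

print(r_min)    # 1 1 3 4
print(r_dense)  # 1 1 2 3

# A "rank <= 2" filter therefore keeps a different number of rows:
print(sum(r_min <= 2))    # 2
print(sum(r_dense <= 2))  # 3
```

If exactly 10 rows per group are required, row_number(desc(SUM)) <= 10 breaks ties by position instead.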

Or use base R:

df1[with(df1, ave(SUM, SAISON, JAHR, FUN=function(x)
                    rank(-x, ties.method='first'))<=10),]
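The key piece here is ave(): it splits SUM by the SAISON/JAHR combination, applies FUN to each piece, and returns a vector aligned with the original rows, so it can be used directly as a row filter. A minimal sketch on hypothetical toy data, keeping the top 2 rows per group instead of 10:

```r
# Hypothetical toy data frame with two JAHR/SAISON groups
toy <- data.frame(
  JAHR   = c(1995, 1995, 1995, 1996, 1996),
  SAISON = c("S", "S", "S", "W", "W"),
  SUM    = c(50.2, 56.9, 53.4, 30.0, 10.0)
)

# Descending rank of SUM within each SAISON/JAHR group, one value per row
rk <- with(toy, ave(SUM, SAISON, JAHR,
                    FUN = function(x) rank(-x, ties.method = "first")))
print(rk)  # 3 1 2 1 2

# Keep the rows ranked 1 and 2 in their group (<= 10 in the question)
top2 <- toy[rk <= 2, ]
print(top2)
```

ties.method = "first" gives tied values distinct ranks, so each group contributes at most the requested number of rows.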
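Or with data.table (note the minus sign, so that the largest SUM gets rank 1):

library(data.table)
# setDT() converts df1 to a data.table in place
setDT(df1)[, .SD[frank(-SUM, ties.method='first') <= 10], by = .(JAHR, SAISON)]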
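A common alternative data.table idiom (not from the original answer) sorts once by SUM descending and then takes the first rows of each group with head(.SD, n). A sketch on hypothetical toy data with n = 2:

```r
library(data.table)

# Hypothetical toy data
dt <- data.table(
  JAHR   = c(1995, 1995, 1995, 1996, 1996),
  SAISON = c("S", "S", "S", "W", "W"),
  SUM    = c(50.2, 56.9, 53.4, 30.0, 10.0)
)

# i = order(-SUM) sorts all rows by SUM descending; by then groups the
# sorted rows and head(.SD, 2) keeps the first two rows of each group
top2 <- dt[order(-SUM), head(.SD, 2), by = .(JAHR, SAISON)]
print(top2)
```

Unlike setDT(), this does not modify the input by reference; it returns a new data.table.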
Or with sqldf:

library(sqldf)
sqldf('select * from df1 i
        where rowid in
          (select rowid from df1 
              where JAHR = i.JAHR and SAISON=i.SAISON
              order by SUM desc
              limit 10)
 order by i.JAHR, i.SAISON, i.SUM desc')
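The correlated subquery above can also be written with a window function. This is a sketch under the assumption that sqldf runs on RSQLite with a bundled SQLite of version 3.25 or newer (the first SQLite release with window functions); the toy data and the cutoff of 2 are illustrative:

```r
library(sqldf)

# Hypothetical toy data
toy <- data.frame(
  JAHR   = c(1995, 1995, 1995, 1996, 1996),
  SAISON = c("S", "S", "S", "W", "W"),
  SUM    = c(50.2, 56.9, 53.4, 30.0, 10.0)
)

# row_number() numbers the rows of each JAHR/SAISON partition by SUM desc;
# the outer query keeps the first 2 of each partition
top2 <- sqldf("
  select JAHR, SAISON, SUM
  from (
    select *, row_number() over (partition by JAHR, SAISON
                                 order by SUM desc) as rn
    from toy
  )
  where rn <= 2
  order by JAHR, SAISON, SUM desc")
print(top2)
```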