How to rank within groups in R?



OK, take a look at this data frame:

  customer_name order_dates order_values
1          John  2010-11-01           15
2           Bob  2008-03-25           12
3          Alex  2009-11-15            5
4          John  2012-08-06           15
5          John  2015-05-07           20

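Suppose I want to add an order variable that ranks each customer's orders from the highest order value down, using the latest order date as the tie-breaker. The final data should look like this: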

  customer_name order_dates order_values ranked_order_values_by_max_value_date
1          John  2010-11-01           15                               3
2           Bob  2008-03-25           12                               1
3          Alex  2009-11-15            5                               1
4          John  2012-08-06           15                               2
5          John  2015-05-07           20                               1

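Each person's single (highest-value) order gets 1, all subsequent orders are ranked by value, and ties are broken by giving priority to the latest order date. In this example, John's 2012-08-06 order gets rank #2 because it was placed after the 2010-11-01 order. The 2015-05-07 order gets 1 because it is the largest. So even if that order had been placed 20 years ago, it should still be rank #1, because it is John's highest order value.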
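To try the answers below, here is a minimal base R sketch that rebuilds the question's data frame and applies the stated rule to John's orders alone. Storing order_dates as Date (rather than character) is an assumption; ISO-formatted "YYYY-MM-DD" strings would sort the same way.

```r
# Example data from the question; storing order_dates as Date is an assumption
df <- data.frame(
  customer_name = c("John", "Bob", "Alex", "John", "John"),
  order_dates   = as.Date(c("2010-11-01", "2008-03-25", "2009-11-15",
                            "2012-08-06", "2015-05-07")),
  order_values  = c(15, 12, 5, 15, 20)
)

# John's orders in rank order: highest value first, the later date wins a tie
john <- df[df$customer_name == "John", ]
john_sorted <- john[order(john$order_values, john$order_dates, decreasing = TRUE), ]
john_sorted
```

The printed row names (5, 4, 1) are the original rows in rank order, matching ranks 1, 2 and 3 in the desired output.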

Does anyone know how I can do this in R? That is, how do I rank within groups defined by specified variables in a data frame?

Thanks for your help!

Using dplyr:

library(dplyr)
df %>%
    group_by(customer_name) %>%
    mutate(my_ranks = order(order(order_values, order_dates, decreasing=TRUE)))

Source: local data frame [5 x 4]
Groups: customer_name

  customer_name order_dates order_values my_ranks
1          John  2010-11-01           15        3
2           Bob  2008-03-25           12        1
3          Alex  2009-11-15            5        1
4          John  2012-08-06           15        2
5          John  2015-05-07           20        1

This can be done with ave and rank: ave passes the appropriate groups to rank. Because of the requested ordering, the result of rank is reversed:

with(df, ave(as.numeric(as.Date(order_dates)), customer_name, FUN=function(d) rank(-d)))
## [1] 3 1 1 2 1

Note that this ranks by order date alone; order_values is not used.
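A quick base R check of the difference between reversing a ranking and reversing a vector: rev() flips positions, so it only matches a descending rank when the group's values already appear in increasing order. The numbers below are John's order years used as plain numerics, a simplification for illustration.

```r
v <- c(2012, 2010, 2015)   # values deliberately out of order
rank(-v)                   # 2 3 1: the largest value gets rank 1
rev(rank(v))               # 3 1 2: rank(v) is 2 1 3, merely reversed in position
```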

In base R you can use the slightly clunky:

transform(df, rank = ave(seq_len(nrow(df)), customer_name,
  FUN = function(i) order(order(order_values[i], order_dates[i], decreasing = TRUE))))

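  customer_name order_dates order_values rank
1          John  2010-11-01           15    3
2           Bob  2008-03-25           12    1
3          Alex  2009-11-15            5    1
4          John  2012-08-06           15    2
5          John  2015-05-07           20    1

where order is supplied with the primary key and the tie-breaker for each group.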
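A self-contained sketch of the same ave() approach without transform(), assuming the question's data with order_dates stored as Date. ave() is given row indices so that FUN can look up both the value and the tie-breaking date for each customer; the column name rank is just a placeholder.

```r
df <- data.frame(
  customer_name = c("John", "Bob", "Alex", "John", "John"),
  order_dates   = as.Date(c("2010-11-01", "2008-03-25", "2009-11-15",
                            "2012-08-06", "2015-05-07")),
  order_values  = c(15, 12, 5, 15, 20)
)

# ave() hands FUN the row indices belonging to each customer.
# The inner order() sorts those rows by value, then date, both descending;
# the outer order() inverts that permutation into a rank for each row.
df$rank <- ave(seq_len(nrow(df)), df$customer_name,
               FUN = function(i) order(order(df$order_values[i], df$order_dates[i],
                                             decreasing = TRUE)))
df
```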

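The top-rated answer (by cdeterman) is actually incorrect. The order function returns the positions of the values that rank first, second, third, and so on, not the rank of each value in its current position.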
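To see what order() actually returns, here is a small base R illustration using John's values from the example that follows: the result is a permutation of positions, suitable for indexing, not a rank for each element.

```r
v <- c(2, 5, 9, 1, 4, 3)           # John's order_values from the example
o <- order(v, decreasing = TRUE)   # positions of the largest, 2nd largest, ...
o                                  # 3 2 5 6 1 4
v[o]                               # 9 5 4 3 2 1: indexing by o sorts v
```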

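Let's take a simple example: we want to rank, starting from the largest value, grouped by customer name. I've included a manual rank so we can check the values.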

    > df
       customer_name order_values manual_rank
    1           John            2           5
    2           John            5           2
    3           John            9           1
    4           John            1           6
    5           John            4           3
    6           John            3           4
    7           Lucy            4           4
    8           Lucy            9           1
    9           Lucy            6           3
    10          Lucy            2           6
    11          Lucy            8           2
    12          Lucy            3           5
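A base R sketch that rebuilds this example and checks manual_rank, assuming the columns shown above: negating the values and ranking inside ave() gives a descending rank per customer.

```r
df <- data.frame(
  customer_name = rep(c("John", "Lucy"), each = 6),
  order_values  = c(2, 5, 9, 1, 4, 3,  4, 9, 6, 2, 8, 3),
  manual_rank   = c(5, 2, 1, 6, 3, 4,  4, 1, 3, 6, 2, 5)
)

# Negate so the largest value gets rank 1, then rank within each customer
df$base_rank <- ave(-df$order_values, df$customer_name, FUN = rank)
all(df$base_rank == df$manual_rank)
```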

If I run the code suggested by cdeterman, I get the following incorrect ranks:

    > df %>%
    +   group_by(customer_name) %>%
    +   mutate(my_ranks = order(order_values, decreasing=TRUE))
    Source: local data frame [12 x 4]
    Groups: customer_name [2]

       customer_name order_values manual_rank my_ranks
              <fctr>        <dbl>       <dbl>    <int>
    1           John            2           5        3
    2           John            5           2        2
    3           John            9           1        5
    4           John            1           6        6
    5           John            4           3        1
    6           John            3           4        4
    7           Lucy            4           4        2
    8           Lucy            9           1        5
    9           Lucy            6           3        3
    10          Lucy            2           6        1
    11          Lucy            8           2        6
    12          Lucy            3           5        4


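order is used to reorder a data frame into descending or ascending order. What we actually want is to call order twice: the second call gives the actual ranks we want.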

    > df %>%
    +   group_by(customer_name) %>%
    +   mutate(good_ranks = order(order(order_values, decreasing=TRUE)))
    Source: local data frame [12 x 4]
    Groups: customer_name [2]

       customer_name order_values manual_rank good_ranks
              <fctr>        <dbl>       <dbl>      <int>
    1           John            2           5          5
    2           John            5           2          2
    3           John            9           1          1
    4           John            1           6          6
    5           John            4           3          3
    6           John            3           4          4
    7           Lucy            4           4          4
    8           Lucy            9           1          1
    9           Lucy            6           3          3
    10          Lucy            2           6          6
    11          Lucy            8           2          2
    12          Lucy            3           5          5
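Why applying order() twice works: the first call returns the sort permutation, and ordering that permutation gives its inverse, i.e. the position each element lands in after sorting. A base R sketch with John's values; the comparison with rank() holds here because the values are distinct.

```r
v <- c(2, 5, 9, 1, 4, 3)           # John's order_values again
o <- order(v, decreasing = TRUE)   # sort permutation: 3 2 5 6 1 4
r <- order(o)                      # inverse permutation: where each element lands
r                                  # 5 2 1 6 3 4
all(r == rank(-v))                 # TRUE for distinct values
```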


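@akrun What about the tie-breaker for the values? Here is the code to make the data frame, in case it helps: customer_name

@SenorO The OP's example should be a bit more complex to test against. Also, dense_rank in dplyr is one way to break ties.

@akrun: the tie-breaker for the values is the order date. So John has two $15 orders, but the one that comes first is ranked higher.

Maybe

setDT(df1)[, rnk := order(desc(order_values), desc(order_dates)), customer_name]

using data.table.

This is incorrect. This one was provided by @T.Himmel.

This worked really well for me. I had to run detach("package:plyr", unload=TRUE) first so that it would group correctly. Thanks for the solution!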
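Since the comments turn on how ties are handled, here is a base R sketch of rank()'s tie options next to a dense ranking. The vector is John's 15, 15, 20 plus a hypothetical 12; none of these methods looks at order_dates, so breaking a tie in favour of the later date still needs a second sort key such as order(values, dates, decreasing = TRUE).

```r
v <- c(15, 15, 20, 12)             # John's 15, 15, 20 plus a hypothetical 12
rank(-v, ties.method = "min")      # 2 2 1 4: tied values share the lower rank
rank(-v, ties.method = "first")    # 2 3 1 4: ties broken by position only
match(-v, sort(unique(-v)))        # 2 2 1 3: dense ranking, no gaps (like dplyr::dense_rank(desc(v)))
```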

library(dplyr)
df %>% 
  group_by(customer_name) %>% 
  arrange(customer_name, desc(order_values), desc(order_dates)) %>% 
  mutate(rank2 = rank(desc(order_values), ties.method = "first"))
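The answer above sorts first and then ranks. A base R sketch of that sort-then-number idea, assuming the question's data: sort by customer, then by value and date descending, number the rows within each customer, and map the numbers back to the original row order. The column name rank2 follows the answer; everything else is illustrative.

```r
df <- data.frame(
  customer_name = c("John", "Bob", "Alex", "John", "John"),
  order_dates   = as.Date(c("2010-11-01", "2008-03-25", "2009-11-15",
                            "2012-08-06", "2015-05-07")),
  order_values  = c(15, 12, 5, 15, 20)
)

# Sort rows: customer, then value descending, then date descending
ord <- order(df$customer_name, -df$order_values, -as.numeric(df$order_dates))
sorted <- df[ord, ]

# Within each customer the rows are now in rank order, so number them 1, 2, 3, ...
sorted$rank2 <- ave(seq_len(nrow(sorted)), sorted$customer_name, FUN = seq_along)

# order(ord) is the inverse permutation: it puts the ranks back in the original row order
df$rank2 <- sorted$rank2[order(ord)]
df
```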