R: How to create a column with values based on specific quantiles computed by a conditional argument?


Everything is in the title. To illustrate, I built the following example.

I have the following data frame:

date <- c("01.02.2011","01.02.2011","01.02.2011","01.02.2011","01.02.2011","01.02.2011",
          "01.02.2011","01.02.2011","01.02.2011","01.02.2011",
          "02.02.2011","02.02.2011","02.02.2011","02.02.2011","02.02.2011","02.02.2011",
          "02.02.2011","02.02.2011","02.02.2011","02.02.2011")
date <- as.Date(date, format="%d.%m.%Y")
ID <- c("A","B","C","D","E","F","G","H","I","J",
        "A","B","C","D","E","F","G","H","I","J")
values <- as.numeric(c("1","8","2","3","5","13","2","4","1","16",
            "4","2","12","16","8","1","7","11","2","10"))

df <- data.frame(ID, date, values)
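I would like to create a new column "QF" that takes the following values, with the percentiles computed by date:

  • 1 if the value is below the 40th percentile
  • 2 if the value is between the 40th and 70th percentiles
  • 3 if the value is above the 70th percentile

I would like to obtain: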

ID       date      values   QF
1   A 2011-02-01      1     1
2   B 2011-02-01      8     3
3   C 2011-02-01      2     1
4   D 2011-02-01      3     2
5   E 2011-02-01      5     2
6   F 2011-02-01     13     3
7   G 2011-02-01      2     1
8   H 2011-02-01      4     2
9   I 2011-02-01      1     1
10  J 2011-02-01     16     3
11  A 2011-02-02      4     1
12  B 2011-02-02      2     1
13  C 2011-02-02     12     3
14  D 2011-02-02     16     3
15  E 2011-02-02      8     2
16  F 2011-02-02      1     1
17  G 2011-02-02      7     2
18  H 2011-02-02     11     3
19  I 2011-02-02      2     1
20  J 2011-02-02     10     2

If my question needs editing, please don't hesitate to tell me.

One dplyr option could be:

library(dplyr)

df %>%
 group_by(date) %>%
 # breaks: 0, then the per-date 40th, 70th and 100th percentiles
 mutate(QF = cut(values, c(0, quantile(values, probs = c(0.4, 0.7, 1))),
                 labels = 1:3))

   ID    date       values QF   
   <fct> <date>      <dbl> <fct>
 1 A     2011-02-01      1 1    
 2 B     2011-02-01      8 3    
 3 C     2011-02-01      2 1    
 4 D     2011-02-01      3 2    
 5 E     2011-02-01      5 2    
 6 F     2011-02-01     13 3    
 7 G     2011-02-01      2 1    
 8 H     2011-02-01      4 2    
 9 I     2011-02-01      1 1    
10 J     2011-02-01     16 3    
11 A     2011-02-02      4 1    
12 B     2011-02-02      2 1    
13 C     2011-02-02     12 3    
14 D     2011-02-02     16 3    
15 E     2011-02-02      8 2    
16 F     2011-02-02      1 1    
17 G     2011-02-02      7 2    
18 H     2011-02-02     11 3    
19 I     2011-02-02      2 1    
20 J     2011-02-02     10 2   
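
To see where these labels come from, it can help to look at the break points themselves. The following is a minimal base R sketch, not part of the original answer: it rebuilds the example data, prints the 40th and 70th percentiles per date, and assigns the groups with ave(). The name breaks_by_date and the use of -Inf/Inf as outer breaks are assumptions made for illustration.

```r
# Rebuild the example data from the question.
date <- as.Date(rep(c("01.02.2011", "02.02.2011"), each = 10), format = "%d.%m.%Y")
ID <- rep(LETTERS[1:10], times = 2)
values <- c(1, 8, 2, 3, 5, 13, 2, 4, 1, 16,
            4, 2, 12, 16, 8, 1, 7, 11, 2, 10)
df <- data.frame(ID, date, values)

# 40th and 70th percentiles of `values` within each date
# (default quantile type 7, i.e. linear interpolation).
breaks_by_date <- tapply(df$values, df$date, quantile, probs = c(0.4, 0.7))
print(breaks_by_date)

# Base R alternative to group_by() + mutate(): ave() applies FUN per date.
# -Inf and Inf as outer breaks (an assumption) keep every value inside a bin.
df$QF <- ave(df$values, df$date, FUN = function(v) {
  cut(v, c(-Inf, quantile(v, probs = c(0.4, 0.7)), Inf), labels = FALSE)
})
print(df)
```

For this data the breaks are roughly 2.6 and 5.9 on the first date and 5.8 and 10.3 on the second, which matches the QF column shown above.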

We can use findInterval:

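library(dplyr)
df %>%
   group_by(date) %>%
   # rightmost.closed = TRUE keeps each date's maximum in interval 3 rather than 4
   mutate(QF = findInterval(values, c(0, quantile(values, probs = c(0.4, 0.7, 1))),
                            rightmost.closed = TRUE))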
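
One detail worth checking when swapping cut() for findInterval() is how values that sit exactly on a break are handled: cut() uses right-closed intervals by default, while findInterval() uses left-closed ones, so the group maximum can land in a fourth interval unless rightmost.closed = TRUE is set. A small sketch on the first date's values, assuming the same breaks as in the answers (the names v and brks are only illustrative):

```r
# Values for 2011-02-01 from the question.
v <- c(1, 8, 2, 3, 5, 13, 2, 4, 1, 16)
brks <- c(0, quantile(v, probs = c(0.4, 0.7, 1)))  # 0, 2.6, 5.9, 16

# Left-closed intervals: 16 equals the last break, so it is counted as interval 4.
plain <- findInterval(v, brks)

# Treat the last interval as closed on the right: 16 stays in interval 3.
closed <- findInterval(v, brks, rightmost.closed = TRUE)

# cut() is right-closed by default, so it already puts 16 in the third bin.
from_cut <- as.integer(cut(v, brks, labels = 1:3))

print(rbind(plain, closed, from_cut))
```

On this data the three approaches agree everywhere except for the maximum, because no other value falls exactly on a break.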

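Your example contains an error: values should be numeric. Take a look at the quantile function: quantile(values, probs = c(.4, .7, 1))
Thank you very much for your quick and efficient answers :)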