How to find quantiles of a grouped variable in dplyr
I have the following data frame in R:
No. Key Category
1 ABC123 0R1D
2 ABC567 0R1D
3 DEF444 1R1D
4 FRT433 1R1D
5 FRT433 1R1D
6 TYU412 2R2D
7 BEC123 0R1D
8 BCY567 0R1D
9 DEO444 1R1D
10 FRJ433 1R1D
11 FRK433 1R1D
12 TYL412 2R2D
I want to find the unique keys across all categories, and the same broken down into 4 quantiles.
I am doing the following in R:
library(dplyr)
truck_quartile <- df %>%
  group_by(Category) %>%
  summarise(No_of_trailers = n_distinct(Key)) %>%
  do(data.frame(t(quantile(.$No_of_trailers, probs = c(0.25, 0.50, 0.75, 1))))) %>%
  as.data.frame()
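But it only gives me one row, because I summarised before computing the quantiles.
As mentioned in the comments, you will only get a single vector of quantiles if what you want is the "number of unique vehicles across categories". Below is the case where you want the quantiles of how many times each unique vehicle occurs within each category.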
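To see why the question's pipeline returns a single row, here is a minimal sketch that runs its two steps separately. It rebuilds the sample data with base `data.frame()` instead of `fread()` (an assumption, only to avoid the data.table dependency). After `summarise()` there is one distinct-key count per category and the result is no longer grouped, so `quantile()` is taken across those three counts.

```r
library(dplyr)

# Sample data from the question, built with base R instead of fread()
df <- data.frame(
  Key = c("ABC123", "ABC567", "DEF444", "FRT433", "FRT433", "TYU412",
          "BEC123", "BCY567", "DEO444", "FRJ433", "FRK433", "TYL412"),
  Category = c("0R1D", "0R1D", "1R1D", "1R1D", "1R1D", "2R2D",
               "0R1D", "0R1D", "1R1D", "1R1D", "1R1D", "2R2D")
)

# Step 1: one row per category holding the number of distinct keys
per_category <- df %>%
  group_by(Category) %>%
  summarise(No_of_trailers = n_distinct(Key))

# Step 2: the summary is ungrouped, so quantile() sees all three counts at once
q <- quantile(per_category$No_of_trailers, probs = c(0.25, 0.50, 0.75, 1))
print(q)
```

The distinct-key counts are 4, 5 and 2 for 0R1D, 1R1D and 2R2D, which is why the result (3.0, 4.0, 4.5, 5.0) matches the `quantile(c(4,5,2), ...)` call in the comments.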
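library(dplyr)
truck_quartile <- df %>%
  # count how often each Key occurs within each Category
  group_by(Category, Key) %>%
  summarize(No_of_trailers = n()) %>%
  # compute the quartiles of those counts separately for each Category
  group_by(Category) %>%
  do(data.frame(t(quantile(.$No_of_trailers, probs = c(0.25, 0.50, 0.75, 1))))) %>%
  as.data.frame() %>%
  # replace the auto-generated column names (X25., X50., ...) with readable ones
  setNames(c("Category", "25%", "50%", "75%", "100%"))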
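`do()` is marked as superseded in dplyr 1.0.0 and later. A sketch of the same per-category quartiles without `do()`, assuming dplyr >= 1.0.0 (for the `name` argument of `count()` and the `.groups` argument of `summarise()`). The column names `q25` to `q100` are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Sample data from the question, built with base R instead of fread()
df <- data.frame(
  Key = c("ABC123", "ABC567", "DEF444", "FRT433", "FRT433", "TYU412",
          "BEC123", "BCY567", "DEO444", "FRJ433", "FRK433", "TYL412"),
  Category = c("0R1D", "0R1D", "1R1D", "1R1D", "1R1D", "2R2D",
               "0R1D", "0R1D", "1R1D", "1R1D", "1R1D", "2R2D")
)

# Count how often each Key occurs within each Category
counts <- df %>% count(Category, Key, name = "No_of_trailers")

# One column per quantile; names = FALSE drops the "25%"-style labels
truck_quartile <- counts %>%
  group_by(Category) %>%
  summarise(
    q25  = quantile(No_of_trailers, 0.25, names = FALSE),
    q50  = quantile(No_of_trailers, 0.50, names = FALSE),
    q75  = quantile(No_of_trailers, 0.75, names = FALSE),
    q100 = quantile(No_of_trailers, 1.00, names = FALSE),
    .groups = "drop"
  )

print(truck_quartile)
```

On the question's df this gives the same numbers as the first result table: every quantile is 1 except the 100% quantile for 1R1D, which is 2 because FRT433 occurs twice there.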
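Result with your original df:
  Category 25% 50% 75% 100%
1     0R1D   1   1   1    1
2     1R1D   1   1   1    2
3     2R2D   1   1   1    1
Your original df is a bit unfortunate, because only one vehicle (FRT433, in category 1R1D) occurs more than once within a category. So I created df_long by sampling the Key and Category columns of df with replacement. Result for df_long:
  Category 25% 50% 75% 100%
1     0R1D   1   3   4    5
2     1R1D   3   4   6   11
3     2R2D   1   2   2    4
Data: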
library(data.table)
df = fread("No. Key Category
1 ABC123 0R1D
2 ABC567 0R1D
3 DEF444 1R1D
4 FRT433 1R1D
5 FRT433 1R1D
6 TYU412 2R2D
7 BEC123 0R1D
8 BCY567 0R1D
9 DEO444 1R1D
10 FRJ433 1R1D
11 FRK433 1R1D
12 TYL412 2R2D")
set.seed(123)
# Key and Category are sampled independently of each other, with replacement
df_long = data.frame(Key = sample(df$Key, 100, replace = TRUE),
                     Category = sample(df$Category, 100, replace = TRUE))
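The df_long above draws Key and Category independently, so a key can end up paired with a category it never had in df. If you would rather resample whole rows of df with replacement, keeping each Key with its original Category, a minimal base-R sketch could look like this (the name `df_rows` is hypothetical):

```r
# Sample data from the question, built with base R instead of fread()
df <- data.frame(
  Key = c("ABC123", "ABC567", "DEF444", "FRT433", "FRT433", "TYU412",
          "BEC123", "BCY567", "DEO444", "FRJ433", "FRK433", "TYL412"),
  Category = c("0R1D", "0R1D", "1R1D", "1R1D", "1R1D", "2R2D",
               "0R1D", "0R1D", "1R1D", "1R1D", "1R1D", "2R2D")
)

set.seed(123)
# Draw 100 row indices with replacement; each drawn row keeps its Key/Category pair
idx <- sample(nrow(df), 100, replace = TRUE)
df_rows <- df[idx, c("Key", "Category")]
rownames(df_rows) <- NULL
```

Which variant is appropriate depends on whether the demo data should preserve the original key-to-category assignment.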
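Note: naming variables with only digits and special symbols is probably not a good idea, but it is fine if you only want a nice-looking table and will not use the columns in further calculations.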
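For context on the note: `data.frame()` uses `check.names = TRUE` by default, which passes names through `make.names()`. That is presumably why the `do()` step yields columns such as `X25.` rather than `25%`, and why the answer renames them with `setNames()`. Columns named `25%` still work, but must be quoted with backticks. A small base-R illustration on a made-up input vector:

```r
q <- quantile(c(4, 5, 2), probs = c(0.25, 0.50, 0.75, 1))

# Default check.names = TRUE sanitises names: "25%" becomes "X25."
wide <- data.frame(t(q))
print(names(wide))

# check.names = FALSE keeps the original labels; access them with backticks
wide_raw <- data.frame(t(q), check.names = FALSE)
print(wide_raw$`25%`)
```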
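Comments:
What is the expected output?
The expected output is the number of unique keys in each of the 4 quantiles, by category.
Maybe you need group_by after summarise.
Which values should the 4 quantiles be computed on? Can you show the expected output? If I run quantile(c(4,5,2), probs = c(0.25, 0.50, 0.75, 1)) I get:
 25%  50%  75% 100%
 3.0  4.0  4.5  5.0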