Computing row percentages with dplyr


I have a data frame like this:

ID Month Year DOW  Value
1  Jan   2019 Fri  20
1  Jan   2019 Sat  39
1  Feb   2019 Fri  30
1  Feb   2019 Sat  24
2  Jan   2019 Fri  20
2  Jan   2019 Sat  12
2  Feb   2019 Fri   1
2  Feb   2019 Sat   3

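My goal is to determine what percentage each row represents relative to its month and year.

Calculating this by hand, the answer should be: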

ID Month Year DOW  Value   Percent
1  Jan   2019 Fri  20      .339
1  Jan   2019 Sat  39      .661
1  Feb   2019 Fri  30      .556
1  Feb   2019 Sat  24      .444
2  Jan   2019 Fri  20      .625
2  Jan   2019 Sat  12      .375
2  Feb   2019 Fri   1      .25
2  Feb   2019 Sat   3      .75

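Note that the percentages within each month-year combination always sum to 1.

Finally, I want to take the average of the percentages just calculated across ID 1 and ID 2: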

Month Year DOW  Avg
Jan   2019 Fri  0.482
Jan   2019 Sat  0.518
Feb   2019 Fri  0.403
Feb   2019 Sat  0.597
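
As a check on the arithmetic, one cell of the table above (Jan 2019, Fri) can be reproduced by hand in plain R. This is only a sketch of the calculation for that single cell; the variable names are made up for illustration.

```r
# Share of Friday within Jan 2019 for each ID (values from the data frame above)
id1_fri <- 20 / (20 + 39)   # ID 1: Fri 20, Sat 39
id2_fri <- 20 / (20 + 12)   # ID 2: Fri 20, Sat 12

# Average of the two shares across IDs
avg_fri <- mean(c(id1_fri, id2_fri))
round(avg_fri, 3)   # 0.482
```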

The goal is to do this with dplyr.

Does this work?

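library(dplyr)

# Share of each row within its ID / Month / Year group
your_data <- your_data %>%
  group_by(ID, Month, Year) %>%
  mutate(Percent = Value / sum(Value))

# Average those shares across IDs 1 and 2 for each Month / Year / DOW
your_data %>%
  filter(ID %in% c(1, 2)) %>%
  group_by(Month, Year, DOW) %>%
  summarize(Avg = mean(Percent))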

> library(dplyr)
> df %>% group_by(ID, Month, Year) %>% mutate(Percent = Value/sum(Value)) %>%
+   group_by(Month, Year, DOW) %>% summarise(Avg = mean(Percent)) %>% as.data.frame()
`summarise()` regrouping output by 'Month', 'Year' (override with `.groups` argument)
  Month Year DOW       Avg
1   Feb 2019 Fri 0.4027778
2   Feb 2019 Sat 0.5972222
3   Jan 2019 Fri 0.4819915
4   Jan 2019 Sat 0.5180085
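
The message printed by `summarise()` is informational: it reports that the result is still grouped by 'Month' and 'Year'. A sketch of the same pipeline that returns an ungrouped result instead, assuming dplyr 1.0.0 or later (where the `.groups` argument is available). The data frame is rebuilt inline with `data.frame()` so the snippet runs on its own.

```r
library(dplyr)

# Same values as the question's data frame
df <- data.frame(
  ID    = rep(c(1, 2), each = 4),
  Month = rep(c("Jan", "Jan", "Feb", "Feb"), times = 2),
  Year  = 2019,
  DOW   = rep(c("Fri", "Sat"), times = 4),
  Value = c(20, 39, 30, 24, 20, 12, 1, 3)
)

avg <- df %>%
  group_by(ID, Month, Year) %>%
  mutate(Percent = Value / sum(Value)) %>%           # share within ID/Month/Year
  group_by(Month, Year, DOW) %>%
  summarise(Avg = mean(Percent), .groups = "drop")   # drop all grouping

avg
```

With `.groups = "drop"` the result is a plain (ungrouped) tibble, so later verbs do not silently operate per group.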

Data used:

df <- structure(list(ID = c(1, 1, 1, 1, 2, 2, 2, 2), Month = c("Jan", 
"Jan", "Feb", "Feb", "Jan", "Jan", "Feb", "Feb"), Year = c(2019, 
2019, 2019, 2019, 2019, 2019, 2019, 2019), DOW = c("Fri", "Sat", 
"Fri", "Sat", "Fri", "Sat", "Fri", "Sat"), Value = c(20, 39, 
30, 24, 20, 12, 1, 3)), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -8L), spec = structure(list(
    cols = list(ID = structure(list(), class = c("collector_double", 
    "collector")), Month = structure(list(), class = c("collector_character", 
    "collector")), Year = structure(list(), class = c("collector_double", 
    "collector")), DOW = structure(list(), class = c("collector_character", 
    "collector")), Value = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))
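
The `structure()` call above carries readr column-spec attributes (`spec_tbl_df`, `col_spec`) that the calculation itself does not use. A sketch of an equivalent plain data frame, assuming only the five columns matter:

```r
# Plain base-R version of the same data (no readr/tibble attributes)
df <- data.frame(
  ID    = c(1, 1, 1, 1, 2, 2, 2, 2),
  Month = c("Jan", "Jan", "Feb", "Feb", "Jan", "Jan", "Feb", "Feb"),
  Year  = 2019,   # recycled to all 8 rows
  DOW   = c("Fri", "Sat", "Fri", "Sat", "Fri", "Sat", "Fri", "Sat"),
  Value = c(20, 39, 30, 24, 20, 12, 1, 3),
  stringsAsFactors = FALSE
)

str(df)
```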

Using base R:

# Year is included so the grouping stays correct if the data spans several years
df$Percent <- with(df, ave(Value, ID, Month, Year, FUN = prop.table))

  ID Month Year DOW Value   Percent
1  1   Jan 2019 Fri    20 0.3389831
2  1   Jan 2019 Sat    39 0.6610169
3  1   Feb 2019 Fri    30 0.5555556
4  1   Feb 2019 Sat    24 0.4444444
5  2   Jan 2019 Fri    20 0.6250000
6  2   Jan 2019 Sat    12 0.3750000
7  2   Feb 2019 Fri     1 0.2500000
8  2   Feb 2019 Sat     3 0.7500000
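
The second step (averaging the shares across IDs) can also be done in base R with `aggregate()`. A minimal sketch, with the data frame rebuilt so the snippet is self-contained; including Year in the grouping is an assumption made here so the code would also handle multi-year data.

```r
df <- data.frame(
  ID    = rep(c(1, 2), each = 4),
  Month = rep(c("Jan", "Jan", "Feb", "Feb"), times = 2),
  Year  = 2019,
  DOW   = rep(c("Fri", "Sat"), times = 4),
  Value = c(20, 39, 30, 24, 20, 12, 1, 3)
)

# Share of each row within its ID / Month / Year group
df$Percent <- with(df, ave(Value, ID, Month, Year, FUN = prop.table))

# Mean share per Month / Year / DOW, averaged across IDs
avg <- aggregate(Percent ~ Month + Year + DOW, data = df, FUN = mean)
names(avg)[names(avg) == "Percent"] <- "Avg"
avg
```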

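Why doesn't your first grouping take ID into account?

@JohnThomas Because you said "my goal is to determine the percent of each row relative to the month" and did not mention ID. I had added DOW by mistake, but on a closer look it seems ID is what needs to be added, so I have added it.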