Group by category, then find the differences between categories [r]


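I am calculating the mean employment rate of different groups from 1995 to 2015, and then the differences between the groups' mean employment rates.

The result should be ordered by year.

I have mostly been trying to do this with the summarise function in dplyr, but without success.

Here is the code I have set up: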

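    library(dplyr)

    # classify each observation into one of six groups
    diff_in_diff <- ... %>%                 # source data frame was cut off in the post
      filter(age >= 19 & age < ...) %>%     # upper age bound was cut off in the post
      mutate(women_and_black_men =
        ifelse(female == 1 & marstat != 1 & nfchild == 0, "single without children",
        ifelse(female == 1 & marstat != 1 & nfchild > 0,  "single with children",
        ifelse(female == 1 & marstat == 1 & nfchild == 0, "married without children",
        ifelse(female == 1 & marstat == 1 & nfchild > 0,  "married with children",
        ifelse(female == 0 & wbhao == 2, "black men", "otherwise men"))))))

    # mean employment rate per year and group
    diff2 <- diff_in_diff %>%
      filter(!is.na(emp)) %>%
      group_by(year, women_and_black_men) %>%
      summarise(mean = mean(emp))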
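This is what I get.

However, what I want to find is:
single with children minus black men
single with children minus single without children
single with children minus married with children
single with children minus married without children
single with children minus otherwise men

So the output I expect is: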

    year | Single_with_children_vs      | diff_in_diff
    1995 | vs Married with children     |  0.031230201
    1995 | vs Married without children  | -0.130002012
    1995 | vs Single without children   | -0.190230201
    1995 | vs Black men                 |  0.002030210
    1996 |
    ...

Something like this.

Maybe not the most elegant solution, but here is a quick one:

    library(dplyr)
    library(tidyr)  # for drop_na()

    # I created a basic dataset similar to yours:
    # one row per year and group, with random employment rates
    diff_in_diff <- data.frame(
      year = rep(1995:1996, each = 6),
      women_and_black_men = rep(c("married with children", "married without children",
                                  "single with children", "single without children",
                                  "black men", "otherwise men"), 2),
      empl = abs(rnorm(12, 0, 0.5))
    ) %>% arrange(year)

    # create a dataframe that is just single with children
    diff_in_diff_single <- diff_in_diff %>%
      filter(women_and_black_men == "single with children") %>%
      dplyr::rename("single.emp" = empl)

    # join with our original dataframe and take the difference
    # (single with children minus each group, as asked in the question)
    diff_in_diff %>%
      full_join(diff_in_diff_single, by = c("year")) %>%
      drop_na() %>%
      group_by(year, women_and_black_men.x) %>%
      mutate(diff = single.emp - empl)
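
The same result can probably be reached without a join, by subtracting within each year. The following is only a minimal sketch, assuming the summarised data has exactly one row per year and group; `group_means`, `mean_emp`, `baseline` and the simulated values are made up for illustration and are not from the question's data.

```r
library(dplyr)

# Hypothetical yearly group means, standing in for the summarised data
set.seed(42)
group_means <- expand.grid(
  year = 1995:1996,
  women_and_black_men = c("single with children", "single without children",
                          "married with children", "married without children",
                          "black men", "otherwise men"),
  stringsAsFactors = FALSE
) %>%
  mutate(mean_emp = runif(n(), 0.4, 0.9))

baseline <- "single with children"

diffs <- group_means %>%
  group_by(year) %>%
  # within each year: baseline group's mean minus every group's mean
  mutate(diff_in_diff = mean_emp[women_and_black_men == baseline] - mean_emp) %>%
  ungroup() %>%
  # drop the baseline row itself (its difference is always 0)
  filter(women_and_black_men != baseline) %>%
  mutate(single_with_children_vs = paste("vs", women_and_black_men)) %>%
  select(year, single_with_children_vs, diff_in_diff) %>%
  arrange(year)

print(diffs)
```

This relies on the baseline group appearing exactly once in every year; if a year has no "single with children" row, the subtraction for that year would fail or return nothing, so the join-based version above may be the safer choice for incomplete data.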

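Thank you very much, Jacob. This shows the right path clearly enough :)

@GirimBan Awesome! Yes, it looks like you have enough code to build on. Feel free to mark this question as solved.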