Group by category, then find the differences between categories [r]

r statistics

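I'm computing the mean employment rate of several groups for each year from 1995 to 2015, and then the differences in mean employment rate between the groups.

The result should be ordered by year.

Mostly I tried to use summarise() in dplyr, but it didn't work.

Here is the code I set up: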

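    library(dplyr)

    diff_in_diff <- ... %>%                  # source data frame (name not given)
      filter(age >= 19 & age <= ...) %>%     # upper age bound not given
      mutate(women_and_black_men =
        ifelse(female == 1 & marstat != 1 & nfchild == 0, "single without children",
        ifelse(female == 1 & marstat != 1 & nfchild > 0,  "single with children",
        ifelse(female == 1 & marstat == 1 & nfchild == 0, "married without children",
        ifelse(female == 1 & marstat == 1 & nfchild > 0,  "married with children",
        ifelse(female == 0 & wbhao == 2, "black men", "otherwise men"))))))

    # mean employment rate per year and group
    diff2 <- diff_in_diff %>%
      filter(!is.na(empl)) %>%
      group_by(year, women_and_black_men) %>%
      summarise(mean = mean(empl))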

This is what I got.
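For reference, here is a minimal base-R sketch of the grouping-and-averaging step in the code above, run on a handful of invented rows. The data frame `micro`, its values, and the helper `classify()` are hypothetical. The column names (`female`, `marstat`, `nfchild`, `wbhao`, `year`, `empl`) follow the question, and the codes `marstat == 1` (married) and `wbhao == 2` (black) are assumed to mean what the question's `ifelse()` rules imply. The formula method of `aggregate()` drops rows with a missing `empl` by default, which does the same job as `filter(!is.na(empl))`.

```r
# Hypothetical microdata: column names follow the question; the values are invented.
micro <- data.frame(
  year    = c(1995, 1995, 1995, 1995, 1996, 1996, 1996, 1996),
  female  = c(1, 1, 0, 0, 1, 1, 0, 0),
  marstat = c(5, 1, 1, 5, 5, 1, 1, 5),   # assumed: 1 = married
  nfchild = c(2, 0, 1, 0, 1, 2, 0, 0),
  wbhao   = c(1, 1, 2, 1, 2, 1, 2, 1),   # assumed: 2 = black, as in the question
  empl    = c(1, 1, 0, 1, 0, 1, 1, NA)   # assumed: 1 = employed, 0 = not, NA = missing
)

# Hypothetical helper: the same nested ifelse() rules as the question's mutate()
classify <- function(female, marstat, nfchild, wbhao) {
  ifelse(female == 1 & marstat != 1 & nfchild == 0, "single without children",
  ifelse(female == 1 & marstat != 1 & nfchild > 0,  "single with children",
  ifelse(female == 1 & marstat == 1 & nfchild == 0, "married without children",
  ifelse(female == 1 & marstat == 1 & nfchild > 0,  "married with children",
  ifelse(female == 0 & wbhao == 2, "black men", "otherwise men")))))
}
micro$women_and_black_men <- with(micro, classify(female, marstat, nfchild, wbhao))

# Mean employment rate per year and group; the formula method of aggregate()
# uses na.action = na.omit by default, so the row with NA empl is dropped first
means <- aggregate(empl ~ year + women_and_black_men, data = micro, FUN = mean)
means <- means[order(means$year, means$women_and_black_men), ]
means
```

A group-year whose `empl` values are all missing drops out of `means` entirely (other men in 1996 here), so it has no row to compare against later.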

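However, I want to find:

- single with children minus black men,
- single with children minus single without children,
- single with children minus married with children,
- single with children minus married without children, and
- single with children minus other men.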

So what I expect is:

    year | Single_with_children_vs      | diff_in_diff
    1995 | vs Married with children     |  0.031230201
    1995 | vs Married without children  | -0.130002012
    1995 | vs Single without children   | -0.190230201
    1995 | vs Black Men                 |  0.002030210
    1996 | ...
    ...

Something like that.

Maybe not the most elegant solution, but here is a quick one:

    library(dplyr)

    set.seed(1)  # reproducible toy data

    # I created a basic dataset similar to yours:
    # one mean employment rate per group per year
    diff_in_diff <- data.frame(
      year = rep(1995:1996, each = 6),
      women_and_black_men = rep(c("married with children", "married without children",
                                  "otherwise men", "single with children",
                                  "single without children", "black men"), times = 2),
      empl = runif(12)
    ) %>% arrange(year)

    # create a data frame that is just single with children
    diff_in_diff_single <- diff_in_diff %>%
      filter(women_and_black_men == "single with children") %>%
      select(year, single.emp = empl)

    # join the other groups to it by year and take "single with children minus group"
    diff_in_diff %>%
      filter(women_and_black_men != "single with children") %>%
      left_join(diff_in_diff_single, by = "year") %>%
      mutate(diff = single.emp - empl) %>%
      arrange(year, women_and_black_men)
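The join above can also be framed as a reshape: one row per year, one column per group, then subtract every column from the "single with children" column. Below is a sketch of that alternative in base R with `tapply()`, on invented group means (the `means` data frame and its numbers are hypothetical). It relies on R recycling a per-year vector down each column of the year x group matrix.

```r
# Hypothetical group means (already aggregated); the numbers are invented.
means <- data.frame(
  year = rep(1995:1996, each = 6),
  women_and_black_men = rep(c("single with children", "single without children",
                              "married with children", "married without children",
                              "black men", "otherwise men"), times = 2),
  empl = c(0.60, 0.75, 0.70, 0.72, 0.58, 0.80,
           0.62, 0.74, 0.71, 0.73, 0.59, 0.79)
)

# Year x group matrix of means (one row per year, one column per group)
m <- tapply(means$empl,
            list(year = means$year, group = means$women_and_black_men),
            mean)

# Reference column minus every column: the reference vector (one value per year)
# is recycled down each column, so each cell is "single with children" minus that group
diff_wide <- m[, "single with children"] - m
diff_wide <- diff_wide[, colnames(diff_wide) != "single with children", drop = FALSE]

# Back to long format: one row per year and comparison, ordered by year
diff_long <- as.data.frame(as.table(diff_wide), responseName = "diff_in_diff")
diff_long <- diff_long[order(diff_long$year), ]
diff_long
```

`diff_wide` is convenient for eyeballing all comparisons for a year at once, while `diff_long` has the year / comparison / difference layout sketched in the question.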

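Thank you very much, Jacob. This shows the right path clearly enough :)

@GirimBan Great! Yes, it looks like you have enough code to build on from here. Feel free to mark this question as solved.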
