Replace values with the median of a category, separately for each group, in R_R_Dplyr_Data.table_Plyr

In my dataset

 mydat=structure(list(code = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("25480МСК", "25481МСК"), class = "factor"), 
    item = c(13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 
    13164L, 13164L, 13164L, 13164L, 13164L, 13164L), sales = c(1L, 
    2L, 15L, 1L, 4L, 3L, 3L, 3L, 15L, 4L, 4L, 4L), action = c(0L, 
    0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)), .Names = c("code", 
"item", "sales", "action"), class = "data.frame", row.names = c(NA, 
-12L))

I have two grouping variables, code and item. There are two groups here:

25481МСК    13163
25480МСК    13164

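I also have an action column. It can take only two values, zero (0) or one (1). I need to calculate the median of sales over the rows where action is 0, and then replace every sales value where action is 1 with that median. This must be done separately for each group.

That is, the desired output is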

code    item    sales   action  output
25481МСК    13163   1   0        1
25481МСК    13163   2   0        2
25481МСК    13163   15  1        2
25481МСК    13163   1   0        1
25481МСК    13163   4   0        4
25481МСК    13163   3   0        3
25480МСК    13164   3   0        3
25480МСК    13164   3   0        3
25480МСК    13164   15  1        4
25480МСК    13164   4   0        4
25480МСК    13164   4   0        4
25480МСК    13164   4   0        4

For the group 25481МСК 13163, the median of sales where action is 0 is 2, and the single sales value where action is 1 is 15, so that 15 is replaced with 2.
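As a quick check of the arithmetic, the two group medians can be reproduced by hand in base R. This is only a sketch: the vector names below are made up for illustration, and the values are copied from the sample data.

```r
# Group 25481МСК / 13163: sales and action copied from the sample data
sales_13163  <- c(1, 2, 15, 1, 4, 3)
action_13163 <- c(0, 0, 1, 0, 0, 0)

# Group 25480МСК / 13164
sales_13164  <- c(3, 3, 15, 4, 4, 4)
action_13164 <- c(0, 0, 1, 0, 0, 0)

# Median over the action == 0 rows only
med_13163 <- median(sales_13163[action_13163 == 0])  # sorted: 1 1 2 3 4
med_13164 <- median(sales_13164[action_13164 == 0])  # sorted: 3 3 4 4 4

print(med_13163)  # 2
print(med_13164)  # 4
```

These are the values that replace the 15 in each group in the desired output.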

Note that the sales values where action = 0 should also be carried over unchanged into the output column. How can I do this?

library(dplyr)

mydat %>%
  group_by(code, item) %>%
  # keep sales where action == 0; otherwise use the group's median of the action == 0 sales
  mutate(output = ifelse(action == 0, sales, median(sales[action == 0], na.rm = TRUE)))


# A tibble: 12 x 5
  # Groups:   code, item [2]
  code      item sales action output
  <fct>    <int> <int>  <int>  <int>
  1 25481МСК 13163     1      0      1
  2 25481МСК 13163     2      0      2
  3 25481МСК 13163    15      1      2
  4 25481МСК 13163     1      0      1
  5 25481МСК 13163     4      0      4
  6 25481МСК 13163     3      0      3
  7 25480МСК 13164     3      0      3
  8 25480МСК 13164     3      0      3
  9 25480МСК 13164    15      1      4
  10 25480МСК 13164     4      0      4
  11 25480МСК 13164     4      0      4
  12 25480МСК 13164     4      0      4

Can you state the question more clearly? I can't understand how the output median for group 2 is 4.

@Hunaidkhan, I had provided the wrong desired output. Please check the edited output: all sales values with action = 0 are kept as they are, but the sales values with action = 1 must be replaced by the median.

Yes, it gives the desired output.

library(data.table)
setDT(mydat)
# per group: rows with action == 1 get the median of the action == 0 sales, the rest keep sales
mydat[, 
      output := ifelse(action, median(sales[!action]), sales), 
      by = .(code, item)]

        code  item sales action output
 1: 25481MCK 13163     1      0      1
 2: 25481MCK 13163     2      0      2
 3: 25481MCK 13163    15      1      2
 4: 25481MCK 13163     1      0      1
 5: 25481MCK 13163     4      0      4
 6: 25481MCK 13163     3      0      3
 7: 25480MCK 13164     3      0      3
 8: 25480MCK 13164     3      0      3
 9: 25480MCK 13164    15      1      4
10: 25480MCK 13164     4      0      4
11: 25480MCK 13164     4      0      4
12: 25480MCK 13164     4      0      4

replace is another option:

mydat[, v := replace(sales, action == 1, median(sales[action == 0])), by = .(code, item)]

Thanks, @Frank. I'm not familiar with replace, but it sounds more efficient in this case.

For the sake of completeness, here is another approach which uses an update join:

library(data.table)
# compute medians for each group
med <- setDT(mydat)[action == 0L, median(sales), by = .(code, item)][
  # append column to pick only rows with action == 1L in join
  , action := 1L]
mydat[
  # copy sales to output column, thereby coercing to double to match value of median()
  , output := as.numeric(sales)][
    # join and update selectively
    med, on = .(code, item, action), output := V1]
mydat[]

        code  item sales action output
 1: 25481MCK 13163     1      0      1
 2: 25481MCK 13163     2      0      2
 3: 25481MCK 13163    15      1      2
 4: 25481MCK 13163     1      0      1
 5: 25481MCK 13163     4      0      4
 6: 25481MCK 13163     3      0      3
 7: 25480MCK 13164     3      0      3
 8: 25480MCK 13164     3      0      3
 9: 25480MCK 13164    15      1      4
10: 25480MCK 13164     4      0      4
11: 25480MCK 13164     4      0      4
12: 25480MCK 13164     4      0      4
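
The as.numeric() coercion in the update join matters because median() does not always return the type of its input. The short sketch below shows this on two made-up integer vectors, which are not part of the question's data. In the sample data every group has five rows with action == 0, so the medians happen to stay integer there.

```r
# Hypothetical integer vectors, used only to show the return type of median()
odd_med  <- median(c(1L, 2L, 3L))      # odd length: the middle element is returned
even_med <- median(c(1L, 2L, 3L, 4L))  # even length: mean of the two middle elements

print(typeof(odd_med))   # "integer"
print(typeof(even_med))  # "double"
print(even_med)          # 2.5
```

Converting sales to double first keeps the output column's type the same whether a group has an odd or an even number of action == 0 rows.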