Replacing values with the median of a category, separately for each group, in R
Tags: r, dplyr, data.table, plyr
In my dataset:
mydat=structure(list(code = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("25480МСК", "25481МСК"), class = "factor"),
item = c(13163L, 13163L, 13163L, 13163L, 13163L, 13163L,
13164L, 13164L, 13164L, 13164L, 13164L, 13164L), sales = c(1L,
2L, 15L, 1L, 4L, 3L, 3L, 3L, 15L, 4L, 4L, 4L), action = c(0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)), .Names = c("code",
"item", "sales", "action"), class = "data.frame", row.names = c(NA,
-12L))
I have 2 grouping variables, code + item. There are two groups here:
25481МСК 13163
25480МСК 13164
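I also have an action column. It can take only two values: zero (0) or one (1).
I need to calculate the median of sales where action = 0, and then replace every sales value where action = 1 with that median.
This must be done separately for each group.
That is, the desired output is: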
code item sales action output
25481МСК 13163 1 0 1
25481МСК 13163 2 0 2
25481МСК 13163 15 1 2
25481МСК 13163 1 0 1
25481МСК 13163 4 0 4
25481МСК 13163 3 0 3
25480МСК 13164 3 0 3
25480МСК 13164 3 0 3
25480МСК 13164 15 1 4
25480МСК 13164 4 0 4
25480МСК 13164 4 0 4
25480МСК 13164 4 0 4
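For the group 25481МСК 13163, the median of sales where action = 0 is 2 and the sales value where action = 1 is 15, so we replace 15 with 2.
Note that the sales values where action = 0 should also appear unchanged in the output column.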
How can I do this?
library(dplyr)
mydat %>% group_by(code,item) %>%
mutate(output=ifelse(action==0,sales,median(sales[action==0],na.rm = TRUE)))
# A tibble: 12 x 5
# Groups: code, item [2]
code item sales action output
<fct> <int> <int> <int> <int>
1 25481МСК 13163 1 0 1
2 25481МСК 13163 2 0 2
3 25481МСК 13163 15 1 2
4 25481МСК 13163 1 0 1
5 25481МСК 13163 4 0 4
6 25481МСК 13163 3 0 3
7 25480МСК 13164 3 0 3
8 25480МСК 13164 3 0 3
9 25480МСК 13164 15 1 4
10 25480МСК 13164 4 0 4
11 25480МСК 13164 4 0 4
12 25480МСК 13164 4 0 4
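For completeness, here is another approach using an update join (the last code block below).
Can you state the question more clearly? I can't understand why the output median for group 2 is 4.
@Hunaidkhan, I provided the wrong desired output. Please check the edit: in the output we keep all sales values where action = 0, but sales where action = 1 must be replaced by the median. Yes, this gives the desired output.
replace is another option: mydat[, v := replace(sales, action == 1, median(sales[action == 0])), by = .(code, item)]
Thanks @Frank. I'm not familiar with replace, but it sounds more efficient in this case.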
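The `replace` suggestion from the comment above can be written out as a self-contained sketch. The data is rebuilt with `data.table()` instead of the original `dput` output, and the column is named `output` rather than `v`. The `as.numeric()` call is an addition, not part of the comment: it is assumed to be useful so that every group yields the same column type.

```r
library(data.table)

# Same values as the question's data, rebuilt compactly
mydat <- data.table(
  code   = rep(c("25481МСК", "25480МСК"), each = 6),
  item   = rep(c(13163L, 13164L), each = 6),
  sales  = c(1L, 2L, 15L, 1L, 4L, 3L, 3L, 3L, 15L, 4L, 4L, 4L),
  action = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)
)

# Within each (code, item) group, keep sales where action == 0 and
# overwrite sales where action == 1 with the median of the action == 0 rows.
# as.numeric() (an assumption, see above) keeps the result double in all groups.
mydat[, output := replace(as.numeric(sales), action == 1L,
                          median(sales[action == 0L])),
      by = .(code, item)]

print(mydat)
```

Unlike `ifelse`, `replace` starts from the full `sales` vector and only touches the positions selected by the condition, which may be why it was described as the more efficient choice.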
library(data.table)
setDT(mydat)
mydat[,
output := ifelse(action, median(sales[!action]), sales),
by = .(code, item)]
code item sales action output
1: 25481MCK 13163 1 0 1
2: 25481MCK 13163 2 0 2
3: 25481MCK 13163 15 1 2
4: 25481MCK 13163 1 0 1
5: 25481MCK 13163 4 0 4
6: 25481MCK 13163 3 0 3
7: 25480MCK 13164 3 0 3
8: 25480MCK 13164 3 0 3
9: 25480MCK 13164 15 1 4
10: 25480MCK 13164 4 0 4
11: 25480MCK 13164 4 0 4
12: 25480MCK 13164 4 0 4
library(data.table)
# compute medians for each group
med <- setDT(mydat)[action == 0L, median(sales), by = .(code, item)][
  # append column to pick only rows with action == 1L in join
  , action := 1L]
mydat[
  # copy sales to output column, thereby coercing to double to match value of median()
  , output := as.numeric(sales)][
    # join and update selectively
    med, on = .(code, item, action), output := V1]
mydat[]
code item sales action output
1: 25481MCK 13163 1 0 1
2: 25481MCK 13163 2 0 2
3: 25481MCK 13163 15 1 2
4: 25481MCK 13163 1 0 1
5: 25481MCK 13163 4 0 4
6: 25481MCK 13163 3 0 3
7: 25480MCK 13164 3 0 3
8: 25480MCK 13164 3 0 3
9: 25480MCK 13164 15 1 4
10: 25480MCK 13164 4 0 4
11: 25480MCK 13164 4 0 4
12: 25480MCK 13164 4 0 4
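The comment in the update-join code about coercing `sales` to double refers to the return type of `median()`. The short sketch below shows how that type depends on the input for integer vectors. The `even` vector is a made-up example, not data from the question.

```r
# action == 0 sales of the group 25481МСК / 13163 (five values, odd count)
odd <- c(1L, 2L, 1L, 4L, 3L)
# hypothetical group with an even number of action == 0 rows
even <- c(1L, 2L, 4L, 3L)

type_odd  <- typeof(median(odd))   # middle element is returned as-is
type_even <- typeof(median(even))  # two middle elements are averaged

print(c(odd = type_odd, even = type_even))

# Coercing up front yields double regardless of the group size
type_coerced <- typeof(median(as.numeric(odd)))
print(type_coerced)
```

With the question's data every group has five rows where action = 0, so the median stays an integer. Converting `sales` with `as.numeric()` first is a defensive choice for data in which group sizes vary.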