组合场景以在R中用组替换中间层
我有数据集组合场景以在R中用组替换中间层,r,dplyr,data.table,lapply,R,Dplyr,Data.table,Lapply,我有数据集 mydat <- structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("52382MCK", "52499MCK"), class = "factor")
mydat <-
structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("52382MCK",
"52499MCK"), class = "factor"), item = c(11709L, 11709L, 11709L,
11709L, 11708L, 11708L, 11708L, 11710L, 11710L, 11710L, 11710L,
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L,
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L,
11710L, 11202L, 11203L, 11203L, 11204L, 11204L, 11205L, 11205L
), sales = c(30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L,
15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L,
20L, 15L, 2L, 10L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), action = c(0L,
1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
1L, 1L)), row.names = c(NA, -35L), class = "data.frame")
# coerce to data.table
setDT(mydat)
所以有行动专栏。它只能有两个值零(0)或一(1)
场景是第一类操作之前的零类操作数和一类操作之后的零类操作数
For example
52382МСК 11709
当我们在第一类动作列之前有一个零类动作列,在第一类动作列之后有两个零。注意:当我们在第一类动作列之前有两个零类动作列,在第一类动作列之后有一个零时,可能会出现这种情况
mydat1
code item sales action
52382МСК 11709 30 0
52382МСК 11709 10 1
52382МСК 11709 20 0
52382МСК 11709 15 0
为了检测这个场景,我使用这个脚本/
这个脚本非常好用,谢谢@Uwe
library(data.table)
library(magrittr)
max_zeros <- 3
zeros <- sapply(0:max_zeros, stringr::str_dup, string = "0")
names(zeros) <- as.character(nchar(zeros))
sc <- CJ(zeros.before = zeros, zeros.after = zeros)[
, scenario.name := paste(nchar(zeros.before), nchar(zeros.after), sep = "-")][
, action.pattern := sprintf("%s1+(?=%s)", zeros.before, zeros.after)][]
# special case: all zero
sc0 <- data.table(
zeros.before = NA,
zeros.after = NA,
scenario.name = "no1",
action.pattern = "^0+$")
sc <- rbind(sc0, sc)
对于每个场景,计算零类别的中值
我需要计算中间值,按行动列按1个前零分类,即在行动列的一个类别之前,按行动列按2个前零分类,在行动列的一个类别之后。
仅对第一类操作列执行中间值替换
按销售额列。
如果中位数大于销售额,则不要替换它
要做到这一点,我使用脚本
sales_action <- function(DF, zeros_before, zeros_after) {
library(data.table)
library(magrittr)
action_pattern <-
do.call(sprintf,
c(fmt = "%s1+(?=%s)",
stringr::str_dup("0", c(zeros_before, zeros_after)) %>% as.list()
))
message("Action pattern used: ", action_pattern)
setDT(DF)[, rn := .I]
tmp <- DF[, paste(action, collapse = "") %>%
stringr::str_locate_all(action_pattern) %>%
as.data.table() %>%
lapply(function(x) rn[x]),
by = .(code, item)][
, end := end + zeros_after]
DF[tmp, on = .(code, item, rn >= start, rn <= end),
med := as.double(median(sales[action == 0])), by = .EACHI][
, output := as.double(sales)][action == 1, output := pmin(sales, med)][
, c("rn", "med") := NULL][]
}
所以我得到了结果
这个问题基于以下几点
每次我都必须手动输入场景以替换为中间值
sales_action(mydat, 1L, 2L)
sales_action(mydat, 3L, 1L)
sales_action(mydat, 2L, 2L)
等等
如何为所有可能的场景自动执行替换中值
这样我就不会每次都写了
销售行动(mydat、.L、.L)
所以,输出的例子
code i tem sales action output pattern
52382MCK 11709 30 0 30 01+00
52382MCK 11709 10 1 10 01+00
52382MCK 11709 20 0 20 01+00
52382MCK 11709 15 0 15 01+00
52382MCK 1170 8 0 8 01+00
52382MCK 1170 10 1 8 01+00
52382MCK 1170 2 0 2 01+00
52382MCK 1170 15 0 15 01+00
如果我理解正确,OP希望通过将行动期间的销售额
数字与销售行动前后期间的销售额中值进行比较,来分析销售行动的成功
有一些挑战:
代码
,项目
组可能有多个销售操作- 标识每个
,代码
组内的销售活动项目
- 在每个销售操作之前最多拾取三行零操作行,在每个销售操作之后最多拾取三行零操作行
- 计算这些行的销售中值,以及
- 如果销售行动中的销售数字超过周围零行动行的中位数,则更新输出
行动==1L
)和前后的零行动期间
library(data.table)
# coerce to data.table and create categories
setDT(mydat)[, cat := rleid(action), by = .(code, item)][]
# extract action categories, identify preceeding & succeeding zero action categories
mycat <- mydat[, .(action = first(action)), by = .(code, item, cat = cat)][
, `:=`(before = cat - 1L, after = cat + 1L)][action == 1L]
mycat
code item cat action before after
1: 52382MCK 11709 2 1 1 3
2: 52382MCK 11708 2 1 1 3
3: 52382MCK 11710 2 1 1 3
4: 52382MCK 11710 4 1 3 5
5: 52382MCK 11710 6 1 5 7
6: 52499MCK 11203 2 1 1 3
7: 52499MCK 11205 1 1 0 2
编辑:或者,对pmin()
的调用可以替换为非相等联接,该联接只更新销售额超过中位数的行:
# prepare result, alternative approach
mydat[, output := as.double(sales)][
# non-equi update join
action_cat_median, on = .(code, item, cat, output > med), output := med]
mydat
以下行已更新:
mydat[output != sales]
我正在努力从你的帖子中提取关键信息,因为这里有很多文本/代码。如果你能精简你的文章,让它更简洁,并且只保留你代码中关键的相关部分,那会有帮助。@Mauritservers,是的,有很多信息,因为这篇文章是我以前的文章的结果,但我们帮了我。从今以后,我将尽量简洁,代码非常好而且干净。一些补充。如何在单独的数据集中分离方案3L-3L?因此,它将包含两个数据集1。没有3l3l和2的所有场景。仅适用于场景3l的数据集-3l@D.Joe非常感谢。请问,您能否更详细地指定应该将哪些行移动到每个子集,特别是对于边缘情况?组的所有行
52382MCK、11710
都属于3L-3L场景(000100000000001000
)。但是关于000100001000
、0000100000000
或00010000010
的内容是什么?例如,对不起,我的internet连接有任何问题。无法上网。好了,你的问题呢。谢谢你的最深的分析技巧,所以我想,让我们来看看这些奇怪的案例,就像你发现的那样,放入新的单独的数据集中,我将在SQL中考虑如何处理这些销售。是否可以将奇怪的情况放在一个单独的数据集中?@D.Joe,我已经更新了我的答案,以展示如何将mydat
拆分为每个场景的单独数据集。如果这不符合您的要求,我建议发布一个新问题。那么字符串“sales_action(mydat,1L,2L)”和字符串“split(mydat[class,on=(code,item)],by=“scenario.name”)”都可以吗。我只想做到“销售行动(mydat,1L,2L)”只适用于3-3之外的拆分数据集。就好像R是一个人的销售行动(mydat,split(mydat[class,on=(code,item)],by=“scenario.name”),其中class不是3-3)怎么做?
code i tem sales action output pattern
52382MCK 11709 30 0 30 01+00
52382MCK 11709 10 1 10 01+00
52382MCK 11709 20 0 20 01+00
52382MCK 11709 15 0 15 01+00
52382MCK 1170 8 0 8 01+00
52382MCK 1170 10 1 8 01+00
52382MCK 1170 2 0 2 01+00
52382MCK 1170 15 0 15 01+00
library(data.table)
# coerce to data.table and create categories
setDT(mydat)[, cat := rleid(action), by = .(code, item)][]
# extract action categories, identify preceeding & succeeding zero action categories
mycat <- mydat[, .(action = first(action)), by = .(code, item, cat = cat)][
, `:=`(before = cat - 1L, after = cat + 1L)][action == 1L]
mycat
code item cat action before after
1: 52382MCK 11709 2 1 1 3
2: 52382MCK 11708 2 1 1 3
3: 52382MCK 11710 2 1 1 3
4: 52382MCK 11710 4 1 3 5
5: 52382MCK 11710 6 1 5 7
6: 52499MCK 11203 2 1 1 3
7: 52499MCK 11205 1 1 0 2
# compute median of surrouding zero action categories
action_cat_median <-
rbind(
# get sales from up to 3 zero action rows before action category
mydat[mycat, on = .(code, item, cat = before),
.(sales = tail(sales, 3), i.cat), by =.EACHI],
# get sales from up to 3 zero action rows after action category
mydat[mycat, on = .(code, item, cat = after),
.(sales = head(sales, 3), i.cat), by =.EACHI]
)[
# remove empty groups
!is.na(sales)][
# compute median for each action category
, .(med = as.double(median(sales))), by = .(code, item, cat = i.cat)]
action_cat_median
code item cat med
1: 52382MCK 11709 2 20.0
2: 52382MCK 11708 2 2.5
3: 52382MCK 11710 2 10.0
4: 52382MCK 11710 4 10.0
5: 52382MCK 11710 6 10.0
6: 52499MCK 11203 2 2.0
# prepare result
mydat[, output := as.double(sales)][
# update join
action_cat_median, on = .(code, item, cat), output := pmin(sales, med)]
# prepare result, alternative approach
mydat[, output := as.double(sales)][
# non-equi update join
action_cat_median, on = .(code, item, cat, output > med), output := med]
mydat
code item sales action cat output
1: 52382MCK 11709 30 0 1 30.0
2: 52382MCK 11709 10 1 2 10.0
3: 52382MCK 11709 20 0 3 20.0
4: 52382MCK 11709 15 0 3 15.0
5: 52382MCK 11708 2 0 1 2.0
6: 52382MCK 11708 10 1 2 2.5
7: 52382MCK 11708 3 0 3 3.0
8: 52382MCK 11710 30 0 1 30.0
9: 52382MCK 11710 10 0 1 10.0
10: 52382MCK 11710 20 0 1 20.0
11: 52382MCK 11710 15 1 2 10.0
12: 52382MCK 11710 2 0 3 2.0
13: 52382MCK 11710 10 0 3 10.0
14: 52382MCK 11710 3 0 3 3.0
15: 52382MCK 11710 30 0 3 30.0
16: 52382MCK 11710 10 0 3 10.0
17: 52382MCK 11710 20 0 3 20.0
18: 52382MCK 11710 15 1 4 10.0
19: 52382MCK 11710 2 0 5 2.0
20: 52382MCK 11710 10 0 5 10.0
21: 52382MCK 11710 3 0 5 3.0
22: 52382MCK 11710 30 0 5 30.0
23: 52382MCK 11710 10 0 5 10.0
24: 52382MCK 11710 20 0 5 20.0
25: 52382MCK 11710 15 1 6 10.0
26: 52382MCK 11710 2 0 7 2.0
27: 52382MCK 11710 10 0 7 10.0
28: 52382MCK 11710 3 0 7 3.0
29: 52499MCK 11202 2 0 1 2.0
30: 52499MCK 11203 2 0 1 2.0
31: 52499MCK 11203 2 1 2 2.0
32: 52499MCK 11204 2 0 1 2.0
33: 52499MCK 11204 2 0 1 2.0
34: 52499MCK 11205 2 1 1 2.0
35: 52499MCK 11205 2 1 1 2.0
code item sales action cat output
mydat[output != sales]
code item sales action cat output
1: 52382MCK 11708 10 1 2 2.5
2: 52382MCK 11710 15 1 2 10.0
3: 52382MCK 11710 15 1 4 10.0
4: 52382MCK 11710 15 1 6 10.0