组合场景以在R中用组替换中间层_R_Dplyr_Data.table_Lapply

组合场景以在R中用组替换中间层

组合场景以在R中用组替换中间层,r,dplyr,data.table,lapply,R,Dplyr,Data.table,Lapply,我有数据集 mydat <- structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("52382MCK", "52499MCK"), class = "factor")

我有数据集

mydat <- 
structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("52382MCK", 
"52499MCK"), class = "factor"), item = c(11709L, 11709L, 11709L, 
11709L, 11708L, 11708L, 11708L, 11710L, 11710L, 11710L, 11710L, 
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 
11710L, 11202L, 11203L, 11203L, 11204L, 11204L, 11205L, 11205L
), sales = c(30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L, 
15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 
20L, 15L, 2L, 10L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), action = c(0L, 
1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
1L, 1L)), row.names = c(NA, -35L), class = "data.frame")
# coerce to data.table
setDT(mydat)

所以有行动专栏。它只能有两个值零（0）或一（1）

场景是第一类操作之前的零类操作数和一类操作之后的零类操作数

For example
52382МСК    11709

当我们在第一类动作列之前有一个零类动作列，在第一类动作列之后有两个零。注意：当我们在第一类动作列之前有两个零类动作列，在第一类动作列之后有一个零时，可能会出现这种情况

mydat1

code    item    sales   action
52382МСК    11709   30  0
52382МСК    11709   10  1
52382МСК    11709   20  0
52382МСК    11709   15  0

为了检测这个场景，我使用这个脚本/ 这个脚本非常好用，谢谢@Uwe

library(data.table)
library(magrittr)

max_zeros <- 3
zeros <- sapply(0:max_zeros, stringr::str_dup, string = "0")
names(zeros) <- as.character(nchar(zeros))
sc <- CJ(zeros.before = zeros, zeros.after = zeros)[
  , scenario.name := paste(nchar(zeros.before), nchar(zeros.after), sep = "-")][
    , action.pattern := sprintf("%s1+(?=%s)", zeros.before, zeros.after)][]
# special case: all zero
sc0 <- data.table(
  zeros.before = NA,
  zeros.after = NA, 
  scenario.name = "no1", 
  action.pattern = "^0+$")
sc <- rbind(sc0, sc)

对于每个场景，计算零类别的中值

我需要计算中间值，按行动列按1个前零分类，即在行动列的一个类别之前，按行动列按2个前零分类，在行动列的一个类别之后。仅对第一类操作列执行中间值替换按销售额列。如果中位数大于销售额，则不要替换它

要做到这一点，我使用脚本

sales_action <- function(DF, zeros_before, zeros_after) {
  library(data.table)
  library(magrittr)
  action_pattern <- 
    do.call(sprintf, 
            c(fmt = "%s1+(?=%s)", 
              stringr::str_dup("0", c(zeros_before, zeros_after)) %>% as.list()
            ))
  message("Action pattern used: ", action_pattern)
  setDT(DF)[, rn := .I]
  tmp <- DF[, paste(action, collapse = "") %>% 
              stringr::str_locate_all(action_pattern) %>% 
              as.data.table() %>% 
              lapply(function(x) rn[x]),
            by = .(code, item)][
              , end := end + zeros_after]
  DF[tmp, on = .(code, item, rn >= start, rn <= end), 
     med := as.double(median(sales[action == 0])), by = .EACHI][
       , output := as.double(sales)][action == 1, output := pmin(sales, med)][
         , c("rn", "med") := NULL][]
}

所以我得到了结果

这个问题基于以下几点每次我都必须手动输入场景以替换为中间值

sales_action(mydat, 1L, 2L)
sales_action(mydat, 3L, 1L)
sales_action(mydat, 2L, 2L)

等等

如何为所有可能的场景自动执行替换中值这样我就不会每次都写了销售行动（mydat、.L、.L）

所以，输出的例子

code    i    tem    sales   action  output  pattern
52382MCK    11709   30        0       30    01+00
52382MCK    11709   10        1       10    01+00
52382MCK    11709   20        0       20    01+00
52382MCK    11709   15        0       15    01+00
52382MCK    1170    8         0        8    01+00
52382MCK    1170    10        1        8    01+00
52382MCK    1170    2         0        2    01+00
52382MCK    1170    15        0        15   01+00

如果我理解正确，OP希望通过将行动期间的

销售额

数字与销售行动前后期间的销售额中值进行比较，来分析销售行动的成功

有一些挑战：

每个

代码

，

项目

组可能有多个销售操作

可用数据可能涵盖销售行动前后三天内的少于要求的3天

IMHO，场景的引入是处理问题2的一个迂回过程

下面的方法

标识每个
```
代码
```
，
```
项目
```
组内的销售活动
在每个销售操作之前最多拾取三行零操作行，在每个销售操作之后最多拾取三行零操作行
计算这些行的销售中值，以及
如果销售行动中的销售数字超过周围零行动行的中位数，则更新输出

OP创造了“类别”一词，以区分销售行动期间（连续的

行动==1L

）和前后的零行动期间

library(data.table)
# coerce to data.table and create categories
setDT(mydat)[, cat := rleid(action), by = .(code, item)][]

# extract action categories, identify preceeding & succeeding zero action categories
mycat <- mydat[, .(action = first(action)), by = .(code, item, cat = cat)][
  , `:=`(before = cat - 1L, after = cat + 1L)][action == 1L]

mycat

       code  item cat action before after
1: 52382MCK 11709   2      1      1     3
2: 52382MCK 11708   2      1      1     3
3: 52382MCK 11710   2      1      1     3
4: 52382MCK 11710   4      1      3     5
5: 52382MCK 11710   6      1      5     7
6: 52499MCK 11203   2      1      1     3
7: 52499MCK 11205   1      1      0     2

编辑：或者，对

pmin（）

的调用可以替换为非相等联接，该联接只更新销售额超过中位数的行：

# prepare result, alternative approach
mydat[, output := as.double(sales)][
  # non-equi update join
  action_cat_median, on = .(code, item, cat, output > med), output := med]


mydat

以下行已更新：

mydat[output != sales]

我正在努力从你的帖子中提取关键信息，因为这里有很多文本/代码。如果你能精简你的文章，让它更简洁，并且只保留你代码中关键的相关部分，那会有帮助。@Mauritservers，是的，有很多信息，因为这篇文章是我以前的文章的结果，但我们帮了我。从今以后，我将尽量简洁，代码非常好而且干净。一些补充。如何在单独的数据集中分离方案3L-3L？因此，它将包含两个数据集1。没有3l3l和2的所有场景。仅适用于场景3l的数据集-3l@D.Joe非常感谢。请问，您能否更详细地指定应该将哪些行移动到每个子集，特别是对于边缘情况？组的所有行

52382MCK、11710

都属于3L-3L场景（

000100000000001000

）。但是关于

000100001000

、

0000100000000

或

00010000010

的内容是什么？例如，对不起，我的internet连接有任何问题。无法上网。好了，你的问题呢。谢谢你的最深的分析技巧，所以我想，让我们来看看这些奇怪的案例，就像你发现的那样，放入新的单独的数据集中，我将在SQL中考虑如何处理这些销售。是否可以将奇怪的情况放在一个单独的数据集中？@D.Joe，我已经更新了我的答案，以展示如何将

mydat

拆分为每个场景的单独数据集。如果这不符合您的要求，我建议发布一个新问题。那么字符串“sales_action（mydat，1L，2L）”和字符串“split（mydat[class，on=（code，item）]，by=“scenario.name”）”都可以吗。我只想做到“销售行动（mydat，1L，2L）”只适用于3-3之外的拆分数据集。就好像R是一个人的销售行动（mydat，split（mydat[class，on=（code，item）]，by=“scenario.name”），其中class不是3-3）怎么做？

code    i    tem    sales   action  output  pattern
52382MCK    11709   30        0       30    01+00
52382MCK    11709   10        1       10    01+00
52382MCK    11709   20        0       20    01+00
52382MCK    11709   15        0       15    01+00
52382MCK    1170    8         0        8    01+00
52382MCK    1170    10        1        8    01+00
52382MCK    1170    2         0        2    01+00
52382MCK    1170    15        0        15   01+00

library(data.table)
# coerce to data.table and create categories
setDT(mydat)[, cat := rleid(action), by = .(code, item)][]

# extract action categories, identify preceeding & succeeding zero action categories
mycat <- mydat[, .(action = first(action)), by = .(code, item, cat = cat)][
  , `:=`(before = cat - 1L, after = cat + 1L)][action == 1L]

mycat

       code  item cat action before after
1: 52382MCK 11709   2      1      1     3
2: 52382MCK 11708   2      1      1     3
3: 52382MCK 11710   2      1      1     3
4: 52382MCK 11710   4      1      3     5
5: 52382MCK 11710   6      1      5     7
6: 52499MCK 11203   2      1      1     3
7: 52499MCK 11205   1      1      0     2

# compute median of surrouding zero action categories
action_cat_median <- 
  rbind(
    # get sales from up to 3 zero action rows before action category
    mydat[mycat, on = .(code, item, cat = before), 
          .(sales = tail(sales, 3), i.cat), by =.EACHI],
    # get sales from up to 3 zero action rows after action category
    mydat[mycat, on = .(code, item, cat = after), 
          .(sales = head(sales, 3), i.cat), by =.EACHI]
  )[
    # remove empty groups
    !is.na(sales)][
      # compute median for each action category
      , .(med = as.double(median(sales))), by = .(code, item, cat = i.cat)]

action_cat_median

       code  item cat  med
1: 52382MCK 11709   2 20.0
2: 52382MCK 11708   2  2.5
3: 52382MCK 11710   2 10.0
4: 52382MCK 11710   4 10.0
5: 52382MCK 11710   6 10.0
6: 52499MCK 11203   2  2.0

# prepare result
mydat[, output := as.double(sales)][
  # update join
  action_cat_median, on = .(code, item, cat), output := pmin(sales, med)]

# prepare result, alternative approach
mydat[, output := as.double(sales)][
  # non-equi update join
  action_cat_median, on = .(code, item, cat, output > med), output := med]


mydat

        code  item sales action cat output
 1: 52382MCK 11709    30      0   1   30.0
 2: 52382MCK 11709    10      1   2   10.0
 3: 52382MCK 11709    20      0   3   20.0
 4: 52382MCK 11709    15      0   3   15.0
 5: 52382MCK 11708     2      0   1    2.0
 6: 52382MCK 11708    10      1   2    2.5
 7: 52382MCK 11708     3      0   3    3.0
 8: 52382MCK 11710    30      0   1   30.0
 9: 52382MCK 11710    10      0   1   10.0
10: 52382MCK 11710    20      0   1   20.0
11: 52382MCK 11710    15      1   2   10.0
12: 52382MCK 11710     2      0   3    2.0
13: 52382MCK 11710    10      0   3   10.0
14: 52382MCK 11710     3      0   3    3.0
15: 52382MCK 11710    30      0   3   30.0
16: 52382MCK 11710    10      0   3   10.0
17: 52382MCK 11710    20      0   3   20.0
18: 52382MCK 11710    15      1   4   10.0
19: 52382MCK 11710     2      0   5    2.0
20: 52382MCK 11710    10      0   5   10.0
21: 52382MCK 11710     3      0   5    3.0
22: 52382MCK 11710    30      0   5   30.0
23: 52382MCK 11710    10      0   5   10.0
24: 52382MCK 11710    20      0   5   20.0
25: 52382MCK 11710    15      1   6   10.0
26: 52382MCK 11710     2      0   7    2.0
27: 52382MCK 11710    10      0   7   10.0
28: 52382MCK 11710     3      0   7    3.0
29: 52499MCK 11202     2      0   1    2.0
30: 52499MCK 11203     2      0   1    2.0
31: 52499MCK 11203     2      1   2    2.0
32: 52499MCK 11204     2      0   1    2.0
33: 52499MCK 11204     2      0   1    2.0
34: 52499MCK 11205     2      1   1    2.0
35: 52499MCK 11205     2      1   1    2.0
        code  item sales action cat output

mydat[output != sales]

       code  item sales action cat output
1: 52382MCK 11708    10      1   2    2.5
2: 52382MCK 11710    15      1   2   10.0
3: 52382MCK 11710    15      1   4   10.0
4: 52382MCK 11710    15      1   6   10.0