组合场景以在R中用组替换中间层

组合场景以在R中用组替换中间层,r,dplyr,data.table,lapply,R,Dplyr,Data.table,Lapply,我有数据集 mydat <- structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("52382MCK", "52499MCK"), class = "factor")

我有数据集

mydat <- 
structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("52382MCK", 
"52499MCK"), class = "factor"), item = c(11709L, 11709L, 11709L, 
11709L, 11708L, 11708L, 11708L, 11710L, 11710L, 11710L, 11710L, 
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 
11710L, 11202L, 11203L, 11203L, 11204L, 11204L, 11205L, 11205L
), sales = c(30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L, 
15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 
20L, 15L, 2L, 10L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), action = c(0L, 
1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
1L, 1L)), row.names = c(NA, -35L), class = "data.frame")
# coerce to data.table
setDT(mydat)
所以有行动专栏。它只能有两个值零(0)或一(1)

场景是第一类操作之前的零类操作数和一类操作之后的零类操作数

For example
52382МСК    11709
当我们在第一类动作列之前有一个零类动作列,在第一类动作列之后有两个零。注意:当我们在第一类动作列之前有两个零类动作列,在第一类动作列之后有一个零时,可能会出现这种情况

mydat1

code    item    sales   action
52382МСК    11709   30  0
52382МСК    11709   10  1
52382МСК    11709   20  0
52382МСК    11709   15  0
为了检测这个场景,我使用这个脚本/ 这个脚本非常好用,谢谢@Uwe

library(data.table)
library(magrittr)

max_zeros <- 3
zeros <- sapply(0:max_zeros, stringr::str_dup, string = "0")
names(zeros) <- as.character(nchar(zeros))
sc <- CJ(zeros.before = zeros, zeros.after = zeros)[
  , scenario.name := paste(nchar(zeros.before), nchar(zeros.after), sep = "-")][
    , action.pattern := sprintf("%s1+(?=%s)", zeros.before, zeros.after)][]
# special case: all zero
sc0 <- data.table(
  zeros.before = NA,
  zeros.after = NA, 
  scenario.name = "no1", 
  action.pattern = "^0+$")
sc <- rbind(sc0, sc)
对于每个场景,计算零类别的中值

我需要计算中间值,按行动列按1个前零分类,即在行动列的一个类别之前,按行动列按2个前零分类,在行动列的一个类别之后。 仅对第一类操作列执行中间值替换 按销售额列。 如果中位数大于销售额,则不要替换它

要做到这一点,我使用脚本

sales_action <- function(DF, zeros_before, zeros_after) {
  library(data.table)
  library(magrittr)
  action_pattern <- 
    do.call(sprintf, 
            c(fmt = "%s1+(?=%s)", 
              stringr::str_dup("0", c(zeros_before, zeros_after)) %>% as.list()
            ))
  message("Action pattern used: ", action_pattern)
  setDT(DF)[, rn := .I]
  tmp <- DF[, paste(action, collapse = "") %>% 
              stringr::str_locate_all(action_pattern) %>% 
              as.data.table() %>% 
              lapply(function(x) rn[x]),
            by = .(code, item)][
              , end := end + zeros_after]
  DF[tmp, on = .(code, item, rn >= start, rn <= end), 
     med := as.double(median(sales[action == 0])), by = .EACHI][
       , output := as.double(sales)][action == 1, output := pmin(sales, med)][
         , c("rn", "med") := NULL][]
}
所以我得到了结果

这个问题基于以下几点 每次我都必须手动输入场景以替换为中间值

sales_action(mydat, 1L, 2L)
sales_action(mydat, 3L, 1L)
sales_action(mydat, 2L, 2L)
等等

如何为所有可能的场景自动执行替换中值 这样我就不会每次都写了 销售行动(mydat、.L、.L)

所以,输出的例子

code    i    tem    sales   action  output  pattern
52382MCK    11709   30        0       30    01+00
52382MCK    11709   10        1       10    01+00
52382MCK    11709   20        0       20    01+00
52382MCK    11709   15        0       15    01+00
52382MCK    1170    8         0        8    01+00
52382MCK    1170    10        1        8    01+00
52382MCK    1170    2         0        2    01+00
52382MCK    1170    15        0        15   01+00
如果我理解正确,OP希望通过将行动期间的
销售额
数字与销售行动前后期间的销售额中值进行比较,来分析销售行动的成功

有一些挑战:

  • 每个
    代码
    项目
    组可能有多个销售操作
  • 可用数据可能涵盖销售行动前后三天内的少于要求的3天
  • IMHO,场景的引入是处理问题2的一个迂回过程

    下面的方法

    • 标识每个
      代码
      项目
      组内的销售活动
    • 在每个销售操作之前最多拾取三行零操作行,在每个销售操作之后最多拾取三行零操作行
    • 计算这些行的销售中值,以及
    • 如果销售行动中的销售数字超过周围零行动行的中位数,则更新输出
    OP创造了“类别”一词,以区分销售行动期间(连续的
    行动==1L
    )和前后的零行动期间

    library(data.table)
    # coerce to data.table and create categories
    setDT(mydat)[, cat := rleid(action), by = .(code, item)][]
    
    # extract action categories, identify preceeding & succeeding zero action categories
    mycat <- mydat[, .(action = first(action)), by = .(code, item, cat = cat)][
      , `:=`(before = cat - 1L, after = cat + 1L)][action == 1L]
    
    mycat
    
           code  item cat action before after
    1: 52382MCK 11709   2      1      1     3
    2: 52382MCK 11708   2      1      1     3
    3: 52382MCK 11710   2      1      1     3
    4: 52382MCK 11710   4      1      3     5
    5: 52382MCK 11710   6      1      5     7
    6: 52499MCK 11203   2      1      1     3
    7: 52499MCK 11205   1      1      0     2
    
    编辑:或者,对
    pmin()
    的调用可以替换为非相等联接,该联接只更新销售额超过中位数的行:

    # prepare result, alternative approach
    mydat[, output := as.double(sales)][
      # non-equi update join
      action_cat_median, on = .(code, item, cat, output > med), output := med]
    
    
    mydat
    
    以下行已更新:

    mydat[output != sales]
    

    我正在努力从你的帖子中提取关键信息,因为这里有很多文本/代码。如果你能精简你的文章,让它更简洁,并且只保留你代码中关键的相关部分,那会有帮助。@Mauritservers,是的,有很多信息,因为这篇文章是我以前的文章的结果,但我们帮了我。从今以后,我将尽量简洁,代码非常好而且干净。一些补充。如何在单独的数据集中分离方案3L-3L?因此,它将包含两个数据集1。没有3l3l和2的所有场景。仅适用于场景3l的数据集-3l@D.Joe非常感谢。请问,您能否更详细地指定应该将哪些行移动到每个子集,特别是对于边缘情况?组的所有行
    52382MCK、11710
    都属于3L-3L场景(
    000100000000001000
    )。但是关于
    000100001000
    0000100000000
    00010000010
    的内容是什么?例如,对不起,我的internet连接有任何问题。无法上网。好了,你的问题呢。谢谢你的最深的分析技巧,所以我想,让我们来看看这些奇怪的案例,就像你发现的那样,放入新的单独的数据集中,我将在SQL中考虑如何处理这些销售。是否可以将奇怪的情况放在一个单独的数据集中?@D.Joe,我已经更新了我的答案,以展示如何将
    mydat
    拆分为每个场景的单独数据集。如果这不符合您的要求,我建议发布一个新问题。那么字符串“sales_action(mydat,1L,2L)”和字符串“split(mydat[class,on=(code,item)],by=“scenario.name”)”都可以吗。我只想做到“销售行动(mydat,1L,2L)”只适用于3-3之外的拆分数据集。就好像R是一个人的销售行动(mydat,split(mydat[class,on=(code,item)],by=“scenario.name”),其中class不是3-3)怎么做?
    code    i    tem    sales   action  output  pattern
    52382MCK    11709   30        0       30    01+00
    52382MCK    11709   10        1       10    01+00
    52382MCK    11709   20        0       20    01+00
    52382MCK    11709   15        0       15    01+00
    52382MCK    1170    8         0        8    01+00
    52382MCK    1170    10        1        8    01+00
    52382MCK    1170    2         0        2    01+00
    52382MCK    1170    15        0        15   01+00
    
    library(data.table)
    # coerce to data.table and create categories
    setDT(mydat)[, cat := rleid(action), by = .(code, item)][]
    
    # extract action categories, identify preceeding & succeeding zero action categories
    mycat <- mydat[, .(action = first(action)), by = .(code, item, cat = cat)][
      , `:=`(before = cat - 1L, after = cat + 1L)][action == 1L]
    
    mycat
    
           code  item cat action before after
    1: 52382MCK 11709   2      1      1     3
    2: 52382MCK 11708   2      1      1     3
    3: 52382MCK 11710   2      1      1     3
    4: 52382MCK 11710   4      1      3     5
    5: 52382MCK 11710   6      1      5     7
    6: 52499MCK 11203   2      1      1     3
    7: 52499MCK 11205   1      1      0     2
    
    # compute median of surrouding zero action categories
    action_cat_median <- 
      rbind(
        # get sales from up to 3 zero action rows before action category
        mydat[mycat, on = .(code, item, cat = before), 
              .(sales = tail(sales, 3), i.cat), by =.EACHI],
        # get sales from up to 3 zero action rows after action category
        mydat[mycat, on = .(code, item, cat = after), 
              .(sales = head(sales, 3), i.cat), by =.EACHI]
      )[
        # remove empty groups
        !is.na(sales)][
          # compute median for each action category
          , .(med = as.double(median(sales))), by = .(code, item, cat = i.cat)]
    
    action_cat_median
    
           code  item cat  med
    1: 52382MCK 11709   2 20.0
    2: 52382MCK 11708   2  2.5
    3: 52382MCK 11710   2 10.0
    4: 52382MCK 11710   4 10.0
    5: 52382MCK 11710   6 10.0
    6: 52499MCK 11203   2  2.0
    
    # prepare result
    mydat[, output := as.double(sales)][
      # update join
      action_cat_median, on = .(code, item, cat), output := pmin(sales, med)]
    
    # prepare result, alternative approach
    mydat[, output := as.double(sales)][
      # non-equi update join
      action_cat_median, on = .(code, item, cat, output > med), output := med]
    
    
    mydat
    
            code  item sales action cat output
     1: 52382MCK 11709    30      0   1   30.0
     2: 52382MCK 11709    10      1   2   10.0
     3: 52382MCK 11709    20      0   3   20.0
     4: 52382MCK 11709    15      0   3   15.0
     5: 52382MCK 11708     2      0   1    2.0
     6: 52382MCK 11708    10      1   2    2.5
     7: 52382MCK 11708     3      0   3    3.0
     8: 52382MCK 11710    30      0   1   30.0
     9: 52382MCK 11710    10      0   1   10.0
    10: 52382MCK 11710    20      0   1   20.0
    11: 52382MCK 11710    15      1   2   10.0
    12: 52382MCK 11710     2      0   3    2.0
    13: 52382MCK 11710    10      0   3   10.0
    14: 52382MCK 11710     3      0   3    3.0
    15: 52382MCK 11710    30      0   3   30.0
    16: 52382MCK 11710    10      0   3   10.0
    17: 52382MCK 11710    20      0   3   20.0
    18: 52382MCK 11710    15      1   4   10.0
    19: 52382MCK 11710     2      0   5    2.0
    20: 52382MCK 11710    10      0   5   10.0
    21: 52382MCK 11710     3      0   5    3.0
    22: 52382MCK 11710    30      0   5   30.0
    23: 52382MCK 11710    10      0   5   10.0
    24: 52382MCK 11710    20      0   5   20.0
    25: 52382MCK 11710    15      1   6   10.0
    26: 52382MCK 11710     2      0   7    2.0
    27: 52382MCK 11710    10      0   7   10.0
    28: 52382MCK 11710     3      0   7    3.0
    29: 52499MCK 11202     2      0   1    2.0
    30: 52499MCK 11203     2      0   1    2.0
    31: 52499MCK 11203     2      1   2    2.0
    32: 52499MCK 11204     2      0   1    2.0
    33: 52499MCK 11204     2      0   1    2.0
    34: 52499MCK 11205     2      1   1    2.0
    35: 52499MCK 11205     2      1   1    2.0
            code  item sales action cat output
    
    mydat[output != sales]
    
           code  item sales action cat output
    1: 52382MCK 11708    10      1   2    2.5
    2: 52382MCK 11710    15      1   2   10.0
    3: 52382MCK 11710    15      1   4   10.0
    4: 52382MCK 11710    15      1   6   10.0