Warning: file_get_contents(/data/phpspider/zhask/data//catemap/4/r/75.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
R 匹配数据帧,不包括最后一个非NA值和忽略顺序_R_Join_Data.table_Dplyr_Melt - Fatal编程技术网

R 匹配数据帧,不包括最后一个非NA值和忽略顺序

R 匹配数据帧,不包括最后一个非NA值和忽略顺序,r,join,data.table,dplyr,melt,R,Join,Data.table,Dplyr,Melt,我有两个数据帧: Partner<-c("Alpha","Beta","Zeta") COL1<-c("A","C","M") COL2<-c("B","D","K") COL3<-c("C","F",NA) COL4<-c("D",NA,NA) df1<-data.frame(Partner,COL1,COL2,COL3,COL4) lift<-c(9,10,11,12,12,23,12,24) RULE1<-c("B","B","D","A"

我有两个数据帧:

Partner<-c("Alpha","Beta","Zeta")
COL1<-c("A","C","M")
COL2<-c("B","D","K")
COL3<-c("C","F",NA)
COL4<-c("D",NA,NA)
df1<-data.frame(Partner,COL1,COL2,COL3,COL4)

lift<-c(9,10,11,12,12,23,12,24)
RULE1<-c("B","B","D","A","C","K","M","K")
RULE2<-c("A","A","C","B","A","M","T","M")
RULE3<-c("G","D","M","C" ,"M", "E",NA,NA)
RULE4<-c(NA,NA,"K","D" ,NA, NA,NA,NA)

df2<-data.frame(lift,RULE1,RULE2,RULE3,RULE4)

df1
Partner    COL1    COL2    COL3    COL4
Alpha         A       B       C       D
Beta          C       D       F      NA
Zeta          M       K      NA      NA

df2
lift    RULE1    RULE2     RULE3    RULE4
   9        B        A         G       NA
  10        B        A         D       NA
  11        D        C         M        K
  12        A        B         C        D
  12        C        A         M       NA
  23        K        M         E       NA
  12        M        T        NA       NA
  24        K        M        NA       NA
为清晰起见,请写出结果:

df3

第1行输出“否”,因为在Alpha Partner中找不到G,并且G之前的所有值都显示在Alpha Partner(B,A)中

第2行输出“是”,因为D出现在Alpha Partner中,D之前的所有值都出现在Alpha Partner(B,A)中

第3行输出“是”,因为D出现在Alpha Partner中,D之前的所有值都出现在Alpha Partner中(A、B、C)

第4行输出“否”,因为在Alpha Partner中找不到M,并且M之前的所有值都显示在Alpha Partner(C,A)中

第5行输出“否”,因为在Zeta Partner中找不到E,并且E之前的所有值都显示在Zeta Partner中(K,M)

第6行输出“否”,因为在Zeta Partner中找不到T,并且T之前的所有值都显示在Zeta Partner(M)中

第7行输出“是”,因为M出现在Zeta Partner中,并且M之前的所有值都出现在Zeta Partner(K)中

我认为这必须是一个加入或某种匹配,但不知道如何做到这一点

如果有人能帮我解决这个问题,这将是非常有帮助的

谢谢

这是一次尝试:

df1<-cbind(df1_id=1:nrow(df1),df1)
df2 <- cbind(df2_id=1:nrow(df2),df2)
d11  <- df1 %>% gather(Col, Value,starts_with("C"))           #Long
d11 <- d11 %>% na.omit() %>%group_by(df1_id) %>% slice(-n()) #remove last non NA
d22  <- df2 %>%  gather(Rule, Value,starts_with("R"))
res <- inner_join(d11,d22)
rm(d22)
rm(d11)
final<-cbind(df1[res$df1_id,],df2[res$df2_id,])
final$Exist <- apply(final, 1, FUN = function(x) 
c("No", "Yes")[(anyDuplicated(x[!is.na(x) & x != "" ])!=0) +1])
df1%slice(-n())#删除最后一个非NA
d22%聚集(规则、值,以“R”开头)

res这相当棘手,因为n个客户的购买必须与一组m规则进行比较。除此之外,还有两点增加了复杂性:

  • df2
    中最后一个非NA
    RULE
    列在语义上与其他列不同。不幸的是,给定的数据结构没有反映这一点。因此,
    df2
    缺少一个明确建议的列

  • 最后,必须确定合作伙伴是否已经购买了推荐的商品

  • 出于性能原因,以下方法依赖于
    melt()
    dcast()
    data.table
    包的联接操作。但是,为了避免创建n*m行的笛卡尔叉积,使用了循环

    EDIT已将
    dcast()
    移出
    lappy()
    函数

    为n:m联接准备数据
    一些带有
    dput
    的示例数据会很有帮助。@dash2刚开始提供了数据结构,谢谢!我不明白为什么去掉了11行的“lift”,因为D、C、M不都在df1行中。逻辑是,如果在df1中找到最后一个非na值之前df2中的所有内容,那么它就是一个匹配项。在本例中,最后一个非na值是K。这就是我在上一篇文章中思考的原因,我需要删除最后一个非na值,进行匹配,然后稍后使用与其关联的索引id将其cbind回来@akrunI在这里也问了一个更简单的问题:
    df1<-cbind(df1_id=1:nrow(df1),df1)
    df2 <- cbind(df2_id=1:nrow(df2),df2)
    d11  <- df1 %>% gather(Col, Value,starts_with("C"))           #Long
    d11 <- d11 %>% na.omit() %>%group_by(df1_id) %>% slice(-n()) #remove last non NA
    d22  <- df2 %>%  gather(Rule, Value,starts_with("R"))
    res <- inner_join(d11,d22)
    rm(d22)
    rm(d11)
    final<-cbind(df1[res$df1_id,],df2[res$df2_id,])
    final$Exist <- apply(final, 1, FUN = function(x) 
    c("No", "Yes")[(anyDuplicated(x[!is.na(x) & x != "" ])!=0) +1])
    
    library(data.table)
    # convert to data.table and add row numbers
    # here, a copy is used insteasd of setDT() in order to rename the data.tables
    purchases <- as.data.table(df1)[, rnp := seq_len(.N)]
    rules <- as.data.table(df2)[, rnr := seq_len(.N)]
    
    # prepare purchases for joins
    lp <- melt(purchases, id.vars = c("rnp", "Partner"), na.rm = TRUE)
    wp <- dcast(lp, rnp ~ value, drop = FALSE)
    wp
    #   rnp  A  B  C  D  F  K  M
    #1:   1  A  B  C  D NA NA NA
    #2:   2 NA NA  C  D  F NA NA
    #3:   3 NA NA NA NA NA  K  M
    
    
    # prepare rules
    lr <- melt(rules, id.vars = c("rnr", "lift"), na.rm = TRUE)
    # identify last column of each rule which becomes the recommendation
    rn_of_last_col <- lr[, last(.I), by = rnr][, V1]
    # reshape from long to wide without recommendation
    wr <- dcast(lr[-rn_of_last_col], rnr ~ value)
    # add column with recommendations (kind of cbind, no join)
    wr[, recommended := lr[rn_of_last_col, value]]
    wr
    #   rnr  A  B  C  D  K  M recommended
    #1:   1  A  B NA NA NA NA           G
    #2:   2  A  B NA NA NA NA           D
    #3:   3 NA NA  C  D NA  M           K
    #4:   4  A  B  C NA NA NA           D
    #5:   5  A NA  C NA NA NA           M
    #6:   6 NA NA NA NA  K  M           E
    #7:   7 NA NA NA NA NA  M           T
    #8:   8 NA NA NA NA  K NA           M
    
    combi <- rbindlist(
      # implied loop over rules to find matching purchases for each rule
      lapply(seq_len(nrow(rules)), function(i) {
        # get col names except last col which is the recommendation
        cols <- lr[rnr == i, value[-.N]]
        # join single rule with all partners on relevant cols for this rule
        wp[wr[i, .SD, .SDcols = c(cols, "rnr", "recommended")], on = cols, nomatch = 0]
      })
    )
    # check if recommendation was purchased already
    combi[, already_purchased := Reduce(`|`, lapply(.SD, function(x) x == recommended)), 
          .SDcols = -c("rnp", "rnr", "recommended")]
    # clean up already purchased
    combi[is.na(already_purchased), already_purchased := FALSE
          ][, already_purchased := ifelse(already_purchased, "Yes", "No")]
    combi
    #   rnp  A  B  C  D  F  K  M rnr recommended already_purchased
    #1:   1  A  B  C  D NA NA NA   1           G                No
    #2:   1  A  B  C  D NA NA NA   2           D               Yes
    #3:   1  A  B  C  D NA NA NA   4           D               Yes
    #4:   1  A  B  C  D NA NA NA   5           M                No
    #5:   3 NA NA NA NA NA  K  M   6           E                No
    #6:   3 NA NA NA NA NA  K  M   7           T                No
    #7:   3 NA NA NA NA NA  K  M   8           M               Yes
    
    tmp_rules <- rules[combi[, .(rnp, rnr, recommended, already_purchased)], on = "rnr"]
    tmp_purch <- purchases[combi[, .(rnp, rnr)], on = "rnp"]
    result <- tmp_purch[tmp_rules, on = c("rnp", "rnr")]
    result[, (c("rnp", "rnr")) := NULL]
    result
    #   Partner COL1 COL2 COL3 COL4 lift RULE1 RULE2 RULE3 RULE4 recommend already_purchased
    #1:   Alpha    A    B    C    D    9     B     A     G    NA         G                No
    #2:   Alpha    A    B    C    D   10     B     A     D    NA         D               Yes
    #3:   Alpha    A    B    C    D   12     A     B     C     D         D               Yes
    #4:   Alpha    A    B    C    D   12     C     A     M    NA         M                No
    #5:    Zeta    M    K   NA   NA   23     K     M     E    NA         E                No
    #6:    Zeta    M    K   NA   NA   12     M     T    NA    NA         T                No
    #7:    Zeta    M    K   NA   NA   24     K     M    NA    NA         M               Yes