R: Look up values from a data frame using conditions


I have two data frames, as shown below:

DF_1>
    Sr.No.  Stage           Time            Result          Result_2
    1       updated_date    1516868822411   1516868822361   1516868822350
    2       id              1516868822411   ABC             -
    3       engine_date     1516868822411   1516868822000   -
    4       blocked         1516868822411   80000           0
    5       updated_date    1516868822398   1516868822350   1516866877815
    6       list            1516868822398   BCD             -
    7       sub_stat_1      1516868779095   AC-12           AC-14
    8       status_1        1516868642468   AC-25           AC-38

DF_2>

Sr. No.     ID        Type_1 Type_2
1           AC-12      X      Y
2           AC-14      XX     YY
3           AC-25      A      B
4           AC-38      CC     CD
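Now I want to get vlookup-style values from DF_2 using the conditions below:

  • If Stage is sub_stat_1, look up Result and Result_2 in DF_2 and return Type_2.
  • If Stage is status_1, look up Result and Result_2 in DF_2 and return Type_1.
  • If Stage is status_1 or sub_stat_1 but Result or Result_2 is empty, give "-" in the output data frame.
  • For all other rows, copy Result and Result_2 from DF_1 unchanged into the output columns Final_1 and Final_2 respectively.
  • Wherever Time, Result or Result_2 holds an epoch time, convert it (if possible) to a normal date-time in the output columns Time_2, Final_1 and Final_2 respectively.

Required output data frame: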

    Sr. No.    Stage   Time            Result          Result_2      Time_2                     Final_1                     Final_2
    1      updated_date 1516868822411  1516868822361   1516868822350 25/01/2018 08:27:02        25/01/2018 08:27:02         25/01/2018 08:27:02
    2      id           1516868822411  ABC             -             25/01/2018 08:27:02        ABC                         -
    3      engine_date  1516868822411  1516868822000   -             25/01/2018 08:27:02        25/01/2018 08:27:02         -
    4      blocked      1516868822411  80000           0             25/01/2018 08:27:02        80000                       0
    5      updated_date 1516868822398  1516868822350   1516866877815 25/01/2018 08:27:02        25/01/2018 08:27:02         25/01/2018 07:54:38
    6      list         1516868822398  BCD             -             25/01/2018 08:27:02        BCD                         -
    7      sub_stat_1   1516868779095  AC-12           AC-14         25/01/2018 08:26:19        Y (Output of AC-12)         YY (Output of AC-14)
    8      status_1     1516868642468  AC-25           AC-38         25/01/2018 08:24:02        A (Output of AC-25)         CC (Output of AC-38)
    

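I'll assume that your data frame columns are as printed and that the string columns are of class character, not factor. Convert them to character if they are not already. (See the sample data at the bottom.)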
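In case some columns did come in as factors, here is a minimal sketch of the conversion. The data frame `df` is a made-up stand-in for DF_1 or DF_2, not your real data:

```r
# Toy data frame with one factor column (a stand-in for DF_1 / DF_2).
df <- data.frame(
  Stage  = factor(c("id", "status_1")),
  Result = c("ABC", "AC-25"),
  Time   = c(1516868822411, 1516868642468),
  stringsAsFactors = FALSE
)

# Convert every factor column to character; other columns are left untouched.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

# Inspect the resulting column classes.
vapply(df, class, character(1))
```

Only the factor columns are touched, so the numeric Time column keeps its type.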

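"If Stage is sub_stat_1, vlookup Result and Result_2 for Type_2 (from DF_2)"

"If Stage is status_1, vlookup Result and Result_2 for Type_1 (from DF_2)"

"If Stage is status_1 or sub_stat_1 but Result or Result_2 is nothing, give "-" in the output data frame"

I use missing values, NA, for lookups that find nothing. I'd encourage you to do the same, but if you really want to, you can do DF_1[is.na(DF_1)] = "-"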
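A minimal sketch of how these rules could be written with `match()`, which returns the position of each value in a lookup vector, or NA when there is none. The two small data frames are cut-down stand-ins for DF_1 and DF_2, and the ID "AC-99" is invented to show what happens when a lookup fails:

```r
# Cut-down stand-ins for DF_1 and DF_2 (only the columns the lookup needs).
DF_1 <- data.frame(
  Stage    = c("id", "sub_stat_1", "status_1", "status_1"),
  Result   = c("ABC", "AC-12", "AC-25", "AC-99"),  # "AC-99" is made up: no match in DF_2
  Result_2 = c("-", "AC-14", "AC-38", "-"),
  stringsAsFactors = FALSE
)
DF_2 <- data.frame(
  ID     = c("AC-12", "AC-14", "AC-25", "AC-38"),
  Type_1 = c("X", "XX", "A", "CC"),
  Type_2 = c("Y", "YY", "B", "CD"),
  stringsAsFactors = FALSE
)

# Default: carry Result / Result_2 over unchanged.
DF_1$Final_1 <- DF_1$Result
DF_1$Final_2 <- DF_1$Result_2

# sub_stat_1 rows take Type_2; status_1 rows take Type_1.
ss1 <- DF_1$Stage == "sub_stat_1"
st1 <- DF_1$Stage == "status_1"
DF_1$Final_1[ss1] <- DF_2$Type_2[match(DF_1$Result[ss1],   DF_2$ID)]
DF_1$Final_2[ss1] <- DF_2$Type_2[match(DF_1$Result_2[ss1], DF_2$ID)]
DF_1$Final_1[st1] <- DF_2$Type_1[match(DF_1$Result[st1],   DF_2$ID)]
DF_1$Final_2[st1] <- DF_2$Type_1[match(DF_1$Result_2[st1], DF_2$ID)]

# Optional: show failed lookups as "-" instead of NA.
DF_1$Final_1[is.na(DF_1$Final_1)] <- "-"
DF_1$Final_2[is.na(DF_1$Final_2)] <- "-"

DF_1
```

Rows whose Stage is neither sub_stat_1 nor status_1 are never indexed, so they keep the values copied from Result and Result_2.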

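"Keep the other values the same as DF_1 Result and Result_2 in the output columns Final_1 and Final_2 respectively"

"Wherever there is an epoch time in the Time, Result and Result_2 columns, convert it (if possible) to normal time in the output columns Time_2, Final_1 and Final_2 respectively"

I'll leave this one to you. You can use as.POSIXct() on your epochs if you supply an origin, but your integers look too big to me. You will probably want to format() them before putting them into the final columns, so that you control how they look once converted to character. If you need more help with that, ask a separate question.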
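One possible reading of the oversized integers is that they are milliseconds rather than seconds since 1970-01-01 UTC. Under that assumption, dividing by 1000 gives values `as.POSIXct()` can handle. The sketch below is only an illustration: `epoch_ms_to_text` is a hypothetical helper, and treating every 13-digit value as a millisecond epoch is an assumption about your data.

```r
# Hypothetical helper: turn 13-digit millisecond epochs into formatted text
# and leave every other value (IDs, "-", small numbers) unchanged.
epoch_ms_to_text <- function(x) {
  # Avoid scientific notation when the input is numeric (e.g. the Time column).
  x <- if (is.numeric(x)) format(x, scientific = FALSE, trim = TRUE) else as.character(x)
  out <- x
  is_epoch <- grepl("^[0-9]{13}$", x)        # assumption: 13 digits means ms epoch
  secs <- as.numeric(x[is_epoch]) %/% 1000   # drop the milliseconds
  out[is_epoch] <- format(
    as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"),
    "%d/%m/%Y %H:%M:%S"
  )
  out
}

epoch_ms_to_text(c("1516868822361", "ABC", "80000", "-"))
# "25/01/2018 08:27:02" "ABC" "80000" "-"
```

With a helper like this, the three output columns could be filled with something like `DF_1$Time_2 <- epoch_ms_to_text(DF_1$Time)` and `DF_1$Final_1 <- epoch_ms_to_text(DF_1$Final_1)`. The time zone is fixed to UTC here; change `tz` if your timestamps should be shown in local time.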

    DF_1
    #   Sr.No.        Stage         Time        Result      Result_2       Final_1       Final_2
    # 1      1 updated_date 1.516869e+12 1516868822361 1516868822350 1516868822361 1516868822350
    # 2      2           id 1.516869e+12           ABC             -           ABC             -
    # 3      3  engine_date 1.516869e+12 1516868822000             - 1516868822000             -
    # 4      4      blocked 1.516869e+12         80000             0         80000             0
    # 5      5 updated_date 1.516869e+12 1516868822350 1516866877815 1516868822350 1516866877815
    # 6      6         list 1.516869e+12           BCD             -           BCD             -
    # 7      7   sub_stat_1 1.516869e+12         AC-12         AC-14             Y            YY
    # 8      8     status_1 1.516869e+12         AC-25         AC-38             A            CC
    

Using this data:

    # Sample data. Time is read as numeric; Result and Result_2 as character.
    DF_1 = read.table(text = "Sr.No.  Stage           Time            Result          Result_2
        1   updated_date    1516868822411   1516868822361   1516868822350
        2   id              1516868822411   ABC             -
        3   engine_date     1516868822411   1516868822000   -
        4   blocked         1516868822411   80000           0
        5   updated_date    1516868822398   1516868822350   1516866877815
        6   list            1516868822398   BCD             -
        7   sub_stat_1      1516868779095   AC-12           AC-14
        8   status_1        1516868642468   AC-25           AC-38", check.names = FALSE, stringsAsFactors = FALSE, header = TRUE)

    # Lookup table: one row per ID.
    DF_2 = read.table(text = "Sr.No. ID     Type_1 Type_2
    1   AC-12      X      Y
    2   AC-14      XX     YY
    3   AC-25      A      B
    4   AC-38      CC     CD", check.names = FALSE, stringsAsFactors = FALSE, header = TRUE)
    


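Comments:

"than vlookup Result and Result_2 for Type_2": vlookup is an Excel function (right?). In R that is a merge. How do you merge Result and Result_2 if they are in the same data frame?

@Mislav Yes, it is an Excel function. By vlookup I mean that I want the same functionality in R to get the required output.

You can use ?match

@Gregor Could you explain how to write it with if-else conditions that satisfy all the conditions I listed above?

@Mislav I don't have to merge, because each ID has static values in Type_1 and Type_2. I just want to add three columns to DF_1: Time_2, Final_1 and Final_2. Time_2 gives DF_1's epoch time in a standard format. Final_1 and Final_2 hold the same values as Result and Result_2, except for epoch times and for status_1 and sub_stat_1 rows, whose values come from DF_2 according to the ID.

Why use Sr.No.?

Because your question made it look like part of the column names. It should be easy for you to adapt this to your actual data. Use dput() to share copy/pasteable data if you want answers to use exactly the same input.

Sorry, that was my mistake; I have now made the correction. It will give you an exact idea of the data frame.

It works fine, although I get this: Warning message: In [

I get this error when sub_stat_1 and status_1 are not present, and it gives some random values in Final_1 and Final_2 (i.e. Final_1=1516868822361 and Final_2=1516868822350)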
    