Discounting based on previously occurring values in R


I have the following (sample) dataset:

My dataset is large, so a data.table solution would be greatly appreciated.

Thanks in advance for your help.

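Use .I to create a row-index column ('rn') on the dataset. Subset the data to the 'sub_class' values for which 'sub_new_I' is 1, take the unique rows by 'firm', 'year', 'sub_class', then group by 'firm', 'sub_class' and create the 'discounted_I' column: take the difference between the next 'year' (lead) and the current 'year', divide it by 5, apply exp to the negative of that, and lag the result by one row so that each value lines up with the later appearance. This gives 'newdf'. Then join it back to the original data on 'rn'.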
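
Written out, the description above appears to amount to the following discount for the k-th appearance of a sub_class within a firm. The symbols d_k (the discount) and t_k (the year of the k-th appearance) are notation introduced here for illustration, not taken from the original post; the first appearance has no predecessor and is left as NA.

```latex
d_k = \exp\!\left(-\frac{t_k - t_{k-1}}{5}\right), \qquad
\text{e.g. } \exp\!\left(-\frac{2000 - 1994}{5}\right) = e^{-1.2} \approx 0.3012
```

The worked value corresponds to firm A, sub_class 147, which appears in 1994 and again in 2000.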

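library(data.table)
# Row index, used later to join the results back onto df
df[, rn := .I]
# Keep rows whose sub_class is ever flagged, one row per firm/year/sub_class,
# then compute the discount from the gap to the previous appearance
newdf <- unique(df[df[, sub_class %in% sub_class[sub_new_I == 1]]],
     by = c('firm', 'year', 'sub_class'))[,  discounted_I := 
         shift(exp(-(shift(year, type = 'lead') - 
           year)/5)), .(firm, sub_class)]
# Update join: copy discounted_I onto the matching rows of df
df[newdf, discounted_I := discounted_I, on = .(rn)]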
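
To see the steps in isolation, here is a minimal sketch on a made-up five-row table. The object names toy and toy_new and the data values are assumptions for illustration only; the logic mirrors the code above, with i.discounted_I used to make the source of the joined column explicit.

```r
library(data.table)

# Hypothetical toy data: firm A, sub_class 147, with a duplicated 1994 row
toy <- data.table(
  firm      = "A",
  year      = c(1994L, 1994L, 2000L, 2002L, 2003L),
  sub_class = "147",
  sub_new_I = c(0L, 0L, 1L, 1L, 1L)
)

# Row index for joining back
toy[, rn := .I]

# One row per firm/year/sub_class, then the lagged discount within each group
toy_new <- unique(
  toy[toy[, sub_class %in% sub_class[sub_new_I == 1]]],
  by = c("firm", "year", "sub_class")
)[, discounted_I := shift(exp(-(shift(year, type = "lead") - year) / 5)),
  by = .(firm, sub_class)]

# Update join: rows dropped by unique() (here rn 2) stay NA
toy[toy_new, discounted_I := i.discounted_I, on = .(rn)]

print(toy)
```

In this sketch the two 1994 rows end up NA (the first has no earlier appearance, the second is the duplicate removed by unique()), while the 2000, 2002 and 2003 rows get exp(-6/5), exp(-2/5) and exp(-1/5).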

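Another simple idea:

  • Group by firm and sub_class and compute the year difference from the previous appearance
  • Use fifelse to set discounted_I to exp(-diff.year/5) where sub_new_I is 1, and to 0 otherwise
  • Remove the temporary diff.year column

The data, printed with the discounted_I column:

    > df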
        firm year patent_number class sub_class sub_new_I discounted_I
     1:    A 1994       5505081    73       147         0       0.0000
     2:    A 1994       5505081    73     12.07         0       0.0000
     3:    A 1994       5606110    73     12.08         0       0.0000
     4:    A 1994       5606110    73       147         0       0.0000
     5:    A 1994       5837890    73    116.03         0       0.0000
     6:    A 1994       5837890    73       184         0       0.0000
     7:    A 1994       5837890    73       185         0       0.0000
     8:    A 1994       5837890    73       186         0       0.0000
     9:    A 1994       5837890    73       198         0       0.0000
    10:    A 1994       5837890    73       210         0       0.0000
    11:    A 2000       6725912   165       144         0       0.0000
    12:    A 2000       6725912   165       147         1       0.3012
    13:    A 2000       6725912   165       140         0       0.0000
    14:    A 2002       6748800    73       147         1       0.6703
    15:    A 2002       6748800    73   E29.272         0       0.0000
    16:    A 2002       6748800    73   E29.309         0       0.0000
    17:    A 2003       7153136   434        59         0       0.0000
    18:    A 2003       6997049    73       147         1       0.8187
    19:    A 2003       6997049    73   E29.272         1       0.8187
    20:    A 2003       6997049    73   E29.309         1       0.8187
    21:    B 1975       4026555   463         3         0       0.0000
    22:    B 1975       4026555   463       168         0       0.0000
    23:    B 1975       4026555   463       473         0       0.0000
    24:    B 1975       4026555   463       960         0       0.0000
    25:    B 1975       4026555   463       552         0       0.0000
    26:    B 1975       4026555   463        31         0       0.0000
    27:    B 1976       4155095   348       701         0       0.0000
    28:    B 1976       4155095   348       593         0       0.0000
    29:    B 1977       4137556   361      91.2         0       0.0000
    30:    B 1977       4137556   361        72         0       0.0000
    31:    B 1977       4137556   361        58         0       0.0000
    32:    B 1977       4137556   361        59         0       0.0000
    33:    B 1977       4137556   361       111         0       0.0000
    34:    B 1977       4137556   361       222         0       0.0000
    35:    B 1977       4137556   361     93.05         0       0.0000
    36:    B 1977       4137556   361       117         0       0.0000
    37:    B 1977       4137556   361       709         0       0.0000
    38:    B 1978       4253157   707     104.1         0       0.0000
    39:    B 1978       4253157   707     93.25         0       0.0000
    40:    B 1978       4253157   707       552         1       0.5488
    
    Output of the first approach (with the helper column rn):
    
       firm year patent_number class sub_class sub_new_I rn discounted_I
     1:    A 1994       5505081    73       147         0  1           NA
     2:    A 1994       5505081    73     12.07         0  2           NA
     3:    A 1994       5606110    73     12.08         0  3           NA
     4:    A 1994       5606110    73       147         0  4           NA
     5:    A 1994       5837890    73    116.03         0  5           NA
     6:    A 1994       5837890    73       184         0  6           NA
     7:    A 1994       5837890    73       185         0  7           NA
     8:    A 1994       5837890    73       186         0  8           NA
     9:    A 1994       5837890    73       198         0  9           NA
    10:    A 1994       5837890    73       210         0 10           NA
    11:    A 2000       6725912   165       144         0 11           NA
    12:    A 2000       6725912   165       147         1 12    0.3011942
    13:    A 2000       6725912   165       140         0 13           NA
    14:    A 2002       6748800    73       147         1 14    0.6703200
    15:    A 2002       6748800    73   E29.272         0 15           NA
    16:    A 2002       6748800    73   E29.309         0 16           NA
    17:    A 2003       7153136   434        59         0 17           NA
    18:    A 2003       6997049    73       147         1 18    0.8187308
    19:    A 2003       6997049    73   E29.272         1 19    0.8187308
    20:    A 2003       6997049    73   E29.309         1 20    0.8187308
    21:    B 1975       4026555   463         3         0 21           NA
    22:    B 1975       4026555   463       168         0 22           NA
    23:    B 1975       4026555   463       473         0 23           NA
    24:    B 1975       4026555   463       960         0 24           NA
    25:    B 1975       4026555   463       552         0 25           NA
    26:    B 1975       4026555   463        31         0 26           NA
    27:    B 1976       4155095   348       701         0 27           NA
    28:    B 1976       4155095   348       593         0 28           NA
    29:    B 1977       4137556   361      91.2         0 29           NA
    30:    B 1977       4137556   361        72         0 30           NA
    31:    B 1977       4137556   361        58         0 31           NA
    32:    B 1977       4137556   361        59         1 32           NA
    33:    B 1977       4137556   361       111         0 33           NA
    34:    B 1977       4137556   361       222         0 34           NA
    35:    B 1977       4137556   361     93.05         0 35           NA
    36:    B 1977       4137556   361       117         0 36           NA
    37:    B 1977       4137556   361       709         0 37           NA
    38:    B 1978       4253157   707     104.1         0 38           NA
    39:    B 1978       4253157   707     93.25         0 39           NA
    40:    B 1978       4253157   707       552         1 40    0.5488116
    
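    Code for the second approach:

    # shift() is data.table's lag; stats::lag() does not shift a plain vector
    df[, diff.year := year - shift(year), by = .(firm, sub_class)]
    df[, discounted_I := fifelse(sub_new_I == 1, exp(-diff.year/5), 0),
       by = .(firm, sub_class)]
    df[, diff.year := NULL]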
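
A small self-contained sketch of this second approach, using data.table's shift() for the lag. The table toy2 and its values are made up for illustration. It also shows one edge case worth knowing about: a row flagged with sub_new_I == 1 that has no earlier appearance gets NA rather than 0, because its year difference is NA.

```r
library(data.table)

# Hypothetical toy data; the last row is flagged but has no earlier appearance
toy2 <- data.table(
  firm      = c("A", "A", "A", "B", "B", "B"),
  year      = c(1994L, 2000L, 2002L, 1975L, 1978L, 1977L),
  sub_class = c("147", "147", "147", "552", "552", "59"),
  sub_new_I = c(0L, 1L, 1L, 0L, 1L, 1L)
)

# Years since the previous appearance within each firm/sub_class group
toy2[, diff.year := year - shift(year), by = .(firm, sub_class)]

# Discount only the flagged rows; everything else is 0
toy2[, discounted_I := fifelse(sub_new_I == 1, exp(-diff.year / 5), 0)]

# Drop the helper column
toy2[, diff.year := NULL]

print(toy2)
```

Unlike the first approach, this version compares each row with the previous row in its group without removing same-year duplicates first, so a repeated firm/year/sub_class row would see a year difference of 0.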