Difference between rows in R on a data frame grouped by a column

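I want to get the difference in counts between versions, grouped by app name. My dataset looks like this: app name, version id, count [difference]

Here is the dataset: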

data = structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1, 
1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L, 
200L, 200L, 250L, 250L, 15L, 36L)), .Names = c("app_name", "version_id", 
"count"), class = "data.frame", row.names = c(NA, -9L))
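
Given this data.frame, how can I get the lagged difference of count by app name and version id? The difference for each app's initial (first) version is zero, since there is no earlier version to compare against.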

Below is an example of the final result, with the "diff" column added last:

structure(list(app_name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), version_id = c(1, 
1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4), count = c(600L, 620L, 620L, 
200L, 200L, 250L, 250L, 15L, 36L), diff = c(0, 20, 0, 0, 0, 1.25, 
0, 0, 2.4)), .Names = c("app_name", "version_id", "count", "diff"
), class = "data.frame", row.names = c(NA, -9L))

Try using lag from dplyr:

library(dplyr)
data %>% group_by(app_name) %>%
         mutate(diffvers = version_id - dplyr::lag(version_id, default = version_id[1]),
                diffcount = count - dplyr::lag(count, default = count[1]))

Source: local data frame [9 x 5]
Groups: app_name [3]

  app_name version_id count diffvers diffcount
    (fctr)      (dbl) (int)    (dbl)     (int)
1        a        1.0   600      0.0         0
2        a        1.1   620      0.1        20
3        a        2.3   620      1.2         0
4        b        2.0   200      0.0         0
5        b        3.1   200      1.1         0
6        b        3.3   250      0.2        50
7        b        4.0   250      0.7         0
8        c        1.1    15      0.0         0
9        c        2.4    36      1.3        21
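
For comparison, the same grouped lag difference can be sketched in base R without any packages. This is only an illustrative alternative, not part of the answer above: lag_diff is a hypothetical helper name, and the sketch assumes the rows are already ordered by version_id within each app_name, as they are in the question's data.

```r
# Same example data as in the question
data <- data.frame(
  app_name   = factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c")),
  version_id = c(1, 1.1, 2.3, 2, 3.1, 3.3, 4, 1.1, 2.4),
  count      = c(600L, 620L, 620L, 200L, 200L, 250L, 250L, 15L, 36L)
)

# Hypothetical helper: difference from the previous element,
# with 0 for the first element of each group
lag_diff <- function(x) c(0, diff(x))

# ave() applies lag_diff within each app_name group and keeps the row order
data$diffvers  <- ave(data$version_id, data$app_name, FUN = lag_diff)
data$diffcount <- ave(data$count, data$app_name, FUN = lag_diff)

data
```

Here c(0, diff(x)) plays the same role as subtracting a lag whose default is the first value: the first row of every group gets a difference of zero.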

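We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(data)), group by 'app_name', loop (lapply(...)) over the columns specified in .SDcols, take the difference between each element and its lag (shift uses type='lag' by default), and assign (:=) the output to create the new columns.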

library(data.table)#v1.9.6
setDT(data)[, c('diffvers', 'diffcount') := lapply(.SD, 
              function(x) x-shift(x, fill=x[1L])), by = app_name, .SDcols=2:3]

data
#   app_name version_id count diffvers diffcount
#1:        a        1.0   600      0.0         0
#2:        a        1.1   620      0.1        20
#3:        a        2.3   620      1.2         0
#4:        b        2.0   200      0.0         0
#5:        b        3.1   200      1.1         0
#6:        b        3.3   250      0.2        50
#7:        b        4.0   250      0.7         0
#8:        c        1.1    15      0.0         0
#9:        c        2.4    36      1.3        21
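
To see what shift() contributes inside the lapply() call, here is a minimal sketch on a single vector. It assumes the data.table package is installed; the counts vector is just the count column for app "a" from the question, and the variable names are illustrative.

```r
library(data.table)

# Counts for app "a" in the question
counts <- c(600L, 620L, 620L)

# shift() moves values one position down (type = "lag" is the default);
# fill supplies the value for the vacated first position
lagged <- shift(counts, fill = counts[1L])
lagged            # 600 600 620

# Subtracting the lag gives the per-version difference, 0 for the first row
counts - lagged   # 0 20 0
```

Filling the first position with the first value itself (fill = x[1L]) is what makes the initial difference of each group zero rather than NA.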

What have you tried so far?
@Pascal I have been trying to use mutate(), but without success. I followed this thread: