Calculate and store the differences between rows of a data frame based on column values in R


I am new to R and still learning the basics. I have a data frame in R with columns such as controller_id, user_id, mth_id, and col_val1 through col_val100:

df <- data.frame('controller_id' = c('X','X','X','X','X','X','Y','Y','Y','Y','Y','Y','Z','Z'),
'user_id'=c('A','B','C','A','B','C','P','Q','R','P','Q','R',NA,NA),
'mth_id'=c('1393','1393','1393','1398','1398','1398','1393','1393','1393','1398','1398','1398','1393','1398'),
'col_val1' = c(5,4,6,3,1,10,12,15,18,13,19,1,5,2),
'col_val2'=c(8,12,9,2,12,5,7,9,11,4,0,7,10,5))

> df
   controller_id user_id mth_id col_val1 col_val2
1              X       A   1393        5        8
2              X       B   1393        4       12
3              X       C   1393        6        9
4              X       A   1398        3        2
5              X       B   1398        1       12
6              X       C   1398       10        5
7              Y       P   1393       12        7
8              Y       Q   1393       15        9
9              Y       R   1393       18       11
10             Y       P   1398       13        4
11             Y       Q   1398       19        0
12             Y       R   1398        1        7
13             Z    <NA>   1393        5       10
14             Z    <NA>   1398        2        5
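If a given controller_id has no associated user_id, the difference in column values should be calculated between the rows of that controller_id itself.

Ideally, I would like to store these outputs in a list or data frame for later use. The code will also have to run over roughly 900 columns in the data frame.


Any help would be greatly appreciated.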

Consider a base R solution that uses running group sums. To iterate over all columns, use sapply() and pass in the column names:

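rowdiff <- function(col) {
  sapply(seq_len(nrow(df)), function(i) {
    # Rows before the current one; seq_len(0) is empty, so i = 1 is safe
    prev <- seq_len(i - 1)

    # Return NA for the first value of each user_id
    ifelse(sum(df$user_id[1:i] == df$user_id[i]) == 1, NA,
           # Current value minus the earlier value of the same user_id.
           # The sum equals that earlier value only when each user_id has
           # exactly one earlier row, as in the sample data.
           df[[col]][i] -
             sum((df$user_id[prev] == df$user_id[i]) * df[[col]][prev]))
  })
}

# 3:ncol(df) selects the value columns of the sample shown below, whose
# columns are user_id, mth_id, col_val1, ...; for the data frame in the
# question, which has controller_id first, use 4:ncol(df) to skip mth_id
finaldf <- cbind(df, data.frame(sapply(names(df[c(3:ncol(df))]), rowdiff)))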

#   user_id mth_id col_val1 col_val2 col_val3 col_val1 col_val2 col_val3
# 1       A   1398        4        2       12       NA       NA       NA
# 2       B   1398        3        3       30       NA       NA       NA
# 3       C   1398        1        1       14       NA       NA       NA
# 4       A   1393        5        7        7        1        5       -5
# 5       B   1393        2        6       18       -1        3      -12
# 6       C   1393        7        0        9        6       -1       -5
# 7       D   1398        4        5       12       NA       NA       NA
# 8       D   1393        0        3       24       -4       -2       12
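The row-by-row sapply() above scans all earlier rows for every row and every column, which may become slow with around 900 value columns. As a possible alternative, here is a minimal sketch that computes the same per-user differences with ave() and diff(). The sample data, the helper name lag_diff, the "^col_val" column pattern, and the "_diff" suffix are all assumptions made for illustration. It also assumes the rows are already ordered by mth_id and that user_id contains no NA, because ave() leaves rows with an NA group untouched.

```r
# Small illustrative data frame (assumed, not the asker's full data)
df <- data.frame(
  controller_id = c("X", "X", "X", "X", "Y", "Y"),
  user_id       = c("A", "B", "A", "B", "P", "P"),
  mth_id        = c("1393", "1393", "1398", "1398", "1393", "1398"),
  col_val1      = c(5, 4, 3, 1, 12, 13),
  col_val2      = c(8, 12, 2, 12, 7, 4),
  stringsAsFactors = FALSE
)

# Pick the value columns by name instead of by position
value_cols <- grep("^col_val", names(df), value = TRUE)

# Hypothetical helper: difference between each row and the previous row
# of the same group, with NA for the first row of each group
lag_diff <- function(x, group) {
  ave(x, group, FUN = function(v) c(NA_real_, diff(v)))
}

# One vector of differences per value column, kept as a named list
diff_list <- lapply(df[value_cols], lag_diff, group = df$user_id)

# Bind the differences to the original data under new column names
diffs <- as.data.frame(diff_list)
names(diffs) <- paste0(value_cols, "_diff")
finaldf <- cbind(df, diffs)
```

diff_list keeps the results as a list for later use, while finaldf holds them next to the original columns.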

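Thanks for the reply, @Parfait. It was helpful. However, my problem statement has been modified slightly: there is one more level of hierarchy to take into account. Could you suggest changes that would handle it?
Hey @Parfait, what if the value does not change? So far the delta is either an increase or a decrease; if the value stays the same, I would like to print that as well. Any suggestions on how to do this?
See the update, which handles the controller_id hierarchy and the no-change scenario. Be sure to order the data frame by mth_id so that 1393 comes before 1398.
@akrun, do you have any suggestions?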
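The ordering step mentioned in the comment above matters because the functions in this answer compare each row only with rows that appear earlier in the data frame. A minimal sketch of that step follows, assuming mth_id is stored as character (as in the question) and using a small made-up sample:

```r
# Illustrative data with the later month listed first (assumed sample)
df <- data.frame(
  user_id  = c("A", "B", "A", "B"),
  mth_id   = c("1398", "1398", "1393", "1393"),
  col_val1 = c(3, 1, 5, 4),
  stringsAsFactors = FALSE
)

# Convert mth_id to numeric for sorting so the order is numeric rather
# than lexicographic, then reset the row names
df <- df[order(as.numeric(df$mth_id)), ]
rownames(df) <- NULL
```

After this, the rows for 1393 precede the rows for 1398, so each difference is computed as later month minus earlier month.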
statements <- function(col) {
  sapply(seq_len(nrow(df)), function(i) {
    # Rows before the current one (empty when i = 1)
    prev <- seq_len(i - 1)

    # Earlier value for the same controller_id and user_id; the sum equals
    # that value only when the group has exactly one earlier row
    prev_val <- sum((df$controller_id[prev] == df$controller_id[i]) *
                    (df$user_id[prev] == df$user_id[i]) *
                    df[[col]][prev])

    delta <- df[[col]][i] - prev_val

    changeword <- ifelse(delta < 0, "decreased",
                         ifelse(delta > 0, "increased", "not changed"))

    # NA for the first row of each user_id; rows with an NA user_id also
    # yield NA because the comparison itself evaluates to NA
    ifelse(sum(df$user_id[1:i] == df$user_id[i]) == 1, NA,
           paste0(col, " for controller_id '", df$controller_id[i], "', user_id '",
                  df$user_id[i], "' has ", changeword, " from ",
                  prev_val, " to ", df[[col]][i]))
  })
}

finaldf <- cbind(df, data.frame(sapply(names(df[c(4:ncol(df))]), statements)))
                                                                  col_val1
1                                                                     <NA>
2                                                                     <NA>
3                                                                     <NA>
4    col_val1 for controller_id 'X', user_id 'A' has decreased from 5 to 3
5    col_val1 for controller_id 'X', user_id 'B' has decreased from 4 to 1
6   col_val1 for controller_id 'X', user_id 'C' has increased from 6 to 10
7                                                                     <NA>
8                                                                     <NA>
9                                                                     <NA>
10 col_val1 for controller_id 'Y', user_id 'P' has increased from 12 to 13
11 col_val1 for controller_id 'Y', user_id 'Q' has increased from 15 to 19
12  col_val1 for controller_id 'Y', user_id 'R' has decreased from 18 to 1
13                                                                    <NA>
14                                                                    <NA>
                                                                    col_val2
1                                                                       <NA>
2                                                                       <NA>
3                                                                       <NA>
4      col_val2 for controller_id 'X', user_id 'A' has decreased from 8 to 2
5  col_val2 for controller_id 'X', user_id 'B' has not changed from 12 to 12
6      col_val2 for controller_id 'X', user_id 'C' has decreased from 9 to 5
7                                                                       <NA>
8                                                                       <NA>
9                                                                       <NA>
10     col_val2 for controller_id 'Y', user_id 'P' has decreased from 7 to 4
11     col_val2 for controller_id 'Y', user_id 'Q' has decreased from 9 to 0
12    col_val2 for controller_id 'Y', user_id 'R' has decreased from 11 to 7
13                                                                      <NA>
14                                                                      <NA>
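In the output above, rows 13 and 14 (controller_id 'Z', no user_id) come back as NA, whereas the question asks for the difference to be taken between the controller's own rows in that case. One possible way to cover it, sketched here under assumptions, is to build a grouping key that falls back to controller_id when user_id is NA. The sample data, the variable names key, prev_val, delta and change, and the "|" separator are illustrative, and the rows are assumed to be ordered by mth_id already.

```r
# Illustrative data including a controller without users (assumed sample)
df <- data.frame(
  controller_id = c("X", "X", "X", "X", "Z", "Z"),
  user_id       = c("A", "B", "A", "B", NA, NA),
  mth_id        = c("1393", "1393", "1398", "1398", "1393", "1398"),
  col_val1      = c(5, 4, 3, 1, 5, 2),
  stringsAsFactors = FALSE
)

# Group by controller_id and user_id, or by controller_id alone when
# there is no user_id
key <- ifelse(is.na(df$user_id),
              df$controller_id,
              paste(df$controller_id, df$user_id, sep = "|"))

# Previous value within each group (NA for the first row of a group)
prev_val <- ave(df$col_val1, key,
                FUN = function(v) c(NA_real_, head(v, -1)))

delta <- df$col_val1 - prev_val

# Same wording as in the answer above; NA where there is no earlier row
change <- ifelse(delta < 0, "decreased",
                 ifelse(delta > 0, "increased", "not changed"))
```

With this key, the second 'Z' row is compared with the first 'Z' row instead of being skipped.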