Computing and storing differences between rows of a data frame based on column values in R
I'm new to R and still learning the basics. I have a data frame in R with columns such as controller_id, user_id, mth_id, and col_val1 through col_val100:
df <- data.frame('controller_id' = c('X','X','X','X','X','X','Y','Y','Y','Y','Y','Y','Z','Z'),
'user_id'=c('A','B','C','A','B','C','P','Q','R','P','Q','R',NA,NA),
'mth_id'=c('1393','1393','1393','1398','1398','1398','1393','1393','1393','1398','1398','1398','1393','1398'),
'col_val1' = c(5,4,6,3,1,10,12,15,18,13,19,1,5,2),
'col_val2'=c(8,12,9,2,12,5,7,9,11,4,0,7,10,5))
> df
controller_id user_id mth_id col_val1 col_val2
1 X A 1393 5 8
2 X B 1393 4 12
3 X C 1393 6 9
4 X A 1398 3 2
5 X B 1398 1 12
6 X C 1398 10 5
7 Y P 1393 12 7
8 Y Q 1393 15 9
9 Y R 1393 18 11
10 Y P 1398 13 4
11 Y Q 1398 19 0
12 Y R 1398 1 7
13 Z <NA> 1393 5 10
14 Z <NA> 1398 2 5
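If a given controller_id has no associated user_id, the column-value differences should be computed between that controller_id's own rows.
Ideally, I would like to store these outputs in a list or data frame for later use.
Also, the code will be run against about 900 columns in the data frame.
Any help would be greatly appreciated.

Consider a base R solution using running group sums. To iterate over all the columns, use sapply() and pass in the column names: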
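rowdiff <- function(col) {
  sapply(seq_len(nrow(df)), function(i) {
    # Rows before row i (1:i-1 would parse as (1:i) - 1 and only works
    # because R silently drops index 0)
    prev <- seq_len(i - 1)
    # NA for the first value of each user_id
    ifelse(sum(df[1:i, "user_id"] == df$user_id[i]) == 1, NA,
           # Current value minus the sum of all earlier values for this user_id
           # (equal to the previous value when each user has two months)
           df[[col]][i] -
             sum((df[prev, "user_id"] == df$user_id[i]) * df[prev, ][[col]]))
  })
}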
finaldf <- cbind(df, data.frame(sapply(names(df[c(3:ncol(df))]), rowdiff)))
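# Output from an earlier version of the sample data (columns user_id, mth_id, col_val1 to col_val3):
# user_id mth_id col_val1 col_val2 col_val3 col_val1 col_val2 col_val3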
# 1 A 1398 4 2 12 NA NA NA
# 2 B 1398 3 3 30 NA NA NA
# 3 C 1398 1 1 14 NA NA NA
# 4 A 1393 5 7 7 1 5 -5
# 5 B 1393 2 6 18 -1 3 -12
# 6 C 1393 7 0 9 6 -1 -5
# 7 D 1398 4 5 12 NA NA NA
# 8 D 1393 0 3 24 -4 -2 12
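A vectorized alternative, not from the original answer, is sketched below under a few assumptions: rows are compared month to month within each controller_id/user_id pair, rows with a missing user_id are compared within their controller_id alone (the requirement stated in the question, for which rowdiff returns NA), and the mth_id codes sort correctly as text because they all have four digits. Unlike rowdiff, which subtracts the running sum of all earlier values for a user, this subtracts the immediately preceding value; the two agree when each user has exactly two months, as in the sample data. The names grp, val_cols and diff_df are illustrative.

```r
# Sample data from the question
df <- data.frame(
  controller_id = c('X','X','X','X','X','X','Y','Y','Y','Y','Y','Y','Z','Z'),
  user_id = c('A','B','C','A','B','C','P','Q','R','P','Q','R',NA,NA),
  mth_id = c('1393','1393','1393','1398','1398','1398',
             '1393','1393','1393','1398','1398','1398','1393','1398'),
  col_val1 = c(5,4,6,3,1,10,12,15,18,13,19,1,5,2),
  col_val2 = c(8,12,9,2,12,5,7,9,11,4,0,7,10,5),
  stringsAsFactors = FALSE
)

# Illustrative grouping key: controller + user, or the controller alone
# when user_id is missing
grp <- ifelse(is.na(df$user_id), df$controller_id,
              paste(df$controller_id, df$user_id, sep = "/"))

# Sort by group, then month, so each difference is month-to-month
ord <- order(grp, df$mth_id)
df  <- df[ord, ]
grp <- grp[ord]

# All value columns (col_val1 ... col_val900 in the real data)
val_cols <- grep("^col_val", names(df), value = TRUE)

# Current value minus previous value within each group; NA for a group's first row
diffs <- lapply(df[val_cols], function(x)
  ave(x, grp, FUN = function(v) c(NA, diff(v))))

diff_df <- data.frame(df[c("controller_id", "user_id", "mth_id")], diffs)
diff_df
```

diff_df keeps one row per original row; if a list is preferred, split(diff_df, diff_df$controller_id) gives one data frame per controller. Because lapply() visits each column once and ave() works per group rather than rescanning earlier rows, this should scale to the ~900 columns better than a per-row loop.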
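Thanks for the reply, @Parfait. This was very helpful. However, my problem statement has changed slightly: I have one more level of hierarchy to take into account. Could you suggest changes that would handle this?
Hey @Parfait, what if the value doesn't change? Our delta is either an increase or a decrease, but if the value stays the same I would like to print that as well. Any suggestions on how to do that?
See the update, which handles the hierarchical controller_id and the no-change scenario. Be sure to sort the data frame by mth_id so that 1393 comes before 1398.
@akrun, do you have any suggestions?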
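Sorting as suggested in the comment can be done with base R's order(). The sketch below converts mth_id to a number first, on the assumption that the IDs are numeric month codes stored as text, so the sort stays correct even if the codes ever differ in width. order() leaves ties in their original order, so rows from the same month keep their relative positions. The four-row data frame is a reshuffled excerpt of the question's data.

```r
# Excerpt of the question's data, deliberately out of month order
df <- data.frame(
  controller_id = c('X','X','Y','Y'),
  user_id = c('A','A','P','P'),
  mth_id = c('1398','1393','1398','1393'),
  col_val1 = c(3, 5, 13, 12),
  stringsAsFactors = FALSE
)

# Earliest month first; ties keep their original relative order
df <- df[order(as.numeric(df$mth_id)), ]
rownames(df) <- NULL
df
```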
statements <- function(col) {
  sapply(seq_len(nrow(df)), function(i) {
    prev <- seq_len(i - 1)   # rows before row i
    # Earlier value(s) for the same controller_id and user_id
    prev_val <- sum((df[prev, "controller_id"] == df$controller_id[i]) *
                    (df[prev, "user_id"] == df$user_id[i]) *
                    df[prev, ][[col]])
    delta <- df[[col]][i] - prev_val
    changeword <- ifelse(delta < 0, "decreased",
                         ifelse(delta > 0, "increased", "not changed"))
    # NA for the first occurrence of each user_id
    ifelse(sum(df[1:i, "user_id"] == df$user_id[i]) == 1, NA,
           paste0(col, " for controller_id '", df$controller_id[i],
                  "', user_id '", df$user_id[i], "' has ", changeword,
                  " from ", prev_val, " to ", df[[col]][i]))
  })
}
finaldf <- cbind(df, data.frame(sapply(names(df[c(4:ncol(df))]), statements)))
col_val1
1 <NA>
2 <NA>
3 <NA>
4 col_val1 for controller_id 'X', user_id 'A' has decreased from 5 to 3
5 col_val1 for controller_id 'X', user_id 'B' has decreased from 4 to 1
6 col_val1 for controller_id 'X', user_id 'C' has increased from 6 to 10
7 <NA>
8 <NA>
9 <NA>
10 col_val1 for controller_id 'Y', user_id 'P' has increased from 12 to 13
11 col_val1 for controller_id 'Y', user_id 'Q' has increased from 15 to 19
12 col_val1 for controller_id 'Y', user_id 'R' has decreased from 18 to 1
13 <NA>
14 <NA>
col_val2
1 <NA>
2 <NA>
3 <NA>
4 col_val2 for controller_id 'X', user_id 'A' has decreased from 8 to 2
5 col_val2 for controller_id 'X', user_id 'B' has not changed from 12 to 12
6 col_val2 for controller_id 'X', user_id 'C' has decreased from 9 to 5
7 <NA>
8 <NA>
9 <NA>
10 col_val2 for controller_id 'Y', user_id 'P' has decreased from 7 to 4
11 col_val2 for controller_id 'Y', user_id 'Q' has decreased from 9 to 0
12 col_val2 for controller_id 'Y', user_id 'R' has decreased from 11 to 7
13 <NA>
14 <NA>
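For the full ~900 columns, the sapply() loop above rescans all earlier rows for every row and column, so it may become slow. The vectorized sketch below (not from the thread) builds the same kind of sentences by taking the previous value within each controller_id/user_id group with ave(). Unlike statements(), it also compares the rows with a missing user_id within their controller_id, as the question asks. It assumes rows are already in month order within each group; the helper name describe_changes is hypothetical.

```r
# Sample data from the question (already in month order within each group)
df <- data.frame(
  controller_id = c('X','X','X','X','X','X','Y','Y','Y','Y','Y','Y','Z','Z'),
  user_id = c('A','B','C','A','B','C','P','Q','R','P','Q','R',NA,NA),
  mth_id = c('1393','1393','1393','1398','1398','1398',
             '1393','1393','1393','1398','1398','1398','1393','1398'),
  col_val1 = c(5,4,6,3,1,10,12,15,18,13,19,1,5,2),
  col_val2 = c(8,12,9,2,12,5,7,9,11,4,0,7,10,5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one sentence per row for a single value column
describe_changes <- function(df, col) {
  # Group key; rows with no user_id are grouped by controller_id alone
  grp <- ifelse(is.na(df$user_id), df$controller_id,
                paste(df$controller_id, df$user_id, sep = "/"))
  # Previous value in the same group (NA for the group's first row)
  prev <- ave(df[[col]], grp, FUN = function(v) c(NA, v[-length(v)]))
  delta <- df[[col]] - prev
  # sign() is -1, 0 or 1, so sign + 2 picks one of the three words
  word <- c("decreased", "not changed", "increased")[sign(delta) + 2]
  who <- ifelse(is.na(df$user_id),
                paste0("controller_id '", df$controller_id, "'"),
                paste0("controller_id '", df$controller_id,
                       "', user_id '", df$user_id, "'"))
  out <- paste0(col, " for ", who, " has ", word, " from ", prev, " to ", df[[col]])
  out[is.na(prev)] <- NA
  out
}

val_cols <- grep("^col_val", names(df), value = TRUE)
statements_df <- as.data.frame(sapply(val_cols, describe_changes, df = df),
                               stringsAsFactors = FALSE)
statements_df$col_val2[5]
```

The result is a plain data frame with one character column per value column, so it can be kept for later use or bound to df with cbind().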