R: subtract each column's minimum by group, and add the subtracted values to another column in the df
I have the data frame below:
date group col1 col2 col3 col4 col5
1234 1 -2 3 4 -5 100
1235 1 4 5 -2 -7 200
1234 1 -5 2 9 1 400
1235 1 8 2 -4 7 900
1235 2 -72 83 -54 98 800
1233 2 32 -21 -1 4 900
1342 2 -54 0 -10 -11 100
1234 2 98 -8 -9 -10 100
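Here is what I want to do:
For the columns from df[,3] to the second-to-last column, I want to:
1) For each column, by group, find the minimum of the positive numbers and the minimum of the negative numbers
2) Then replace each current value using the following logic:
a) If the value is positive, subtract the positive minimum found for its group
b) If the value is negative, subtract the negative minimum found for its group
c) If the value is 0, leave it unchanged
3) Then take the total of the values subtracted in each row and add it to that row's value in the last column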
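Steps 1) to 3) can be restated as a formula. The notation is introduced here for illustration only: x_{ij} is the value in row i and column j, g is the group of row i, and j runs over col1 to col4.

```latex
\begin{aligned}
m^{+}_{g,j} &= \min\{\, x_{kj} : k \in g,\ x_{kj} > 0 \,\}, \qquad
m^{-}_{g,j} = \min\{\, x_{kj} : k \in g,\ x_{kj} < 0 \,\},\\
s_{ij} &= \begin{cases}
  m^{+}_{g,j} & \text{if } x_{ij} > 0,\\
  m^{-}_{g,j} & \text{if } x_{ij} < 0,\\
  0 & \text{if } x_{ij} = 0,
\end{cases}\\
x'_{ij} &= x_{ij} - s_{ij}, \qquad
\mathrm{col5}'_{i} = \mathrm{col5}_{i} + \sum_{j=1}^{4} s_{ij}.
\end{aligned}
```

For the first row (group 1), s = (-5, 2, 4, -7), so col5 becomes 100 + (-6) = 94, matching the expected output.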
Minimum for col1 neg, group 1 = -5
Minimum for col1 pos, group 1 = 4
Minimum for col1 neg, group 2 = -72
Minimum for col1 pos, group 2 = 32
Minimum for col2 neg, group 1 = NA
Minimum for col2 pos, group 1 = 2
etc.
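The per-group minima listed above can be computed with a short base R sketch. The helper min_or_na() is hypothetical, introduced here so that a group with no positive (or no negative) values gives NA, as in the list above, instead of the Inf and warning that min() returns for an empty vector.

```r
# The sample data from the question, inlined so the sketch runs on its own
df1 <- data.frame(
  date  = c(1234L, 1235L, 1234L, 1235L, 1235L, 1233L, 1342L, 1234L),
  group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  col1  = c(-2L, 4L, -5L, 8L, -72L, 32L, -54L, 98L),
  col2  = c(3L, 5L, 2L, 2L, 83L, -21L, 0L, -8L),
  col3  = c(4L, -2L, 9L, -4L, -54L, -1L, -10L, -9L),
  col4  = c(-5L, -7L, 1L, 7L, 98L, 4L, -11L, -10L),
  col5  = c(100L, 200L, 400L, 900L, 800L, 900L, 100L, 100L)
)

cols <- names(df1)[3:(ncol(df1) - 1)]  # "col1" ... "col4"

# Hypothetical helper: minimum of the selected values, or NA if there are none
min_or_na <- function(v) if (length(v) > 0) min(v) else NA

# Matrices with one row per group and one column per col1..col4
pos_min <- sapply(cols, function(cl)
  tapply(df1[[cl]], df1$group, function(v) min_or_na(v[v > 0])))
neg_min <- sapply(cols, function(cl)
  tapply(df1[[cl]], df1$group, function(v) min_or_na(v[v < 0])))

pos_min
neg_min
```

Zeros are excluded from both minima (v > 0 and v < 0), which fits rule c): a 0 is never changed.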
I would like my final output to look like this:
date group col1 col2 col3 col4 col5
1234 1 -2-(-5) 3-2 4-4 -5-(-7) 100+(-5)+2+4+(-7)
1235 1 4-4 5-2 -2-(-4) -7-(-7) 200+4+2+(-4)+(-7)
1234 1 -5-(-5) 2-2 9-4 1-1 400+(-5)+2+4+1
1235 1 8-4 2-2 -4-(-4) 7-1 900+4+2+(-4)+1
1235 2 -72-(-72) 83-83 -54-(-54) 98-4 800+(-72)+83+(-54)+4
1233 2 32-32 -21-(-21) -1-(-54) 4-4 900+32+(-21)+(-54)+4
1342 2 -54-(-72) 0-0 -10-(-54) -11-(-11) 100+(-72)+0+(-54)+(-11)
1234 2 98-32 -8-(-21) -9-(-54) -10-(-11) 100+32+(-21)+(-54)+(-11)
Expected output:
date group col1 col2 col3 col4 col5
1234 1 3 1 0 2 94
1235 1 0 3 2 0 195
1234 1 0 0 5 0 402
1235 1 4 0 0 6 903
1235 2 0 0 0 94 761
1233 2 0 0 53 0 861
1342 2 18 0 44 0 -37
1234 2 66 13 45 1 46
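For comparison, here is a minimal base R sketch (not taken from either answer) that reproduces the expected output. It assumes, as the question describes, that the columns to adjust are the 3rd through the second-to-last, and that the last column receives the row totals.

```r
# The sample data from the question, inlined so the sketch runs on its own
df1 <- data.frame(
  date  = c(1234L, 1235L, 1234L, 1235L, 1235L, 1233L, 1342L, 1234L),
  group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  col1  = c(-2L, 4L, -5L, 8L, -72L, 32L, -54L, 98L),
  col2  = c(3L, 5L, 2L, 2L, 83L, -21L, 0L, -8L),
  col3  = c(4L, -2L, 9L, -4L, -54L, -1L, -10L, -9L),
  col4  = c(-5L, -7L, 1L, 7L, 98L, 4L, -11L, -10L),
  col5  = c(100L, 200L, 400L, 900L, 800L, 900L, 100L, 100L)
)

cols <- names(df1)[3:(ncol(df1) - 1)]  # "col1" ... "col4"
last <- names(df1)[ncol(df1)]          # "col5"

# For every value, look up the minimum of its (group, sign) bucket.
# drop = TRUE keeps only buckets that actually occur, so min() never sees an
# empty vector; zeros form their own bucket whose minimum is 0.
subtracted <- sapply(cols, function(cl)
  ave(df1[[cl]], interaction(df1$group, sign(df1[[cl]]), drop = TRUE), FUN = min))

out <- df1
for (cl in cols) out[[cl]] <- df1[[cl]] - subtracted[, cl]
# Add each row's total of subtracted minima to the last column
out[[last]] <- df1[[last]] + rowSums(subtracted)
out
```

Grouping by sign() rather than testing v > 0 and v < 0 separately handles all three cases of rule 2) in one pass.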
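Comments:
"Apologies, yes, I meant to say min only."
"What have you tried?"
"The question has already been answered."
"In my actual dataset there are NA values that may be causing this warning: In min(col4[col4<0]) : no non-missing arguments to min; returning -Inf. Based on @akrun's answer, I'm not sure whether this is the cause."
One option is to convert to 'long' format, do the calculation, and then change it back to 'wide':
library(tidyverse)
df1 %>%
  rownames_to_column('rn') %>%
  # one row per (original row, column) pair
  gather(key, val, col1:col4) %>%
  # minimum within each (group, column, sign) bucket; a 0 value gets 0
  group_by(group, key, sn = sign(val)) %>%
  mutate(mnVal = min(val)) %>%
  # per original row: add the subtracted minima to col5, then subtract them
  group_by(rn) %>%
  mutate(col5 = col5 + sum(mnVal), val = val - mnVal) %>%
  select(-sn, -mnVal) %>%
  spread(key, val) %>%
  ungroup %>%
  select(names(df1))
Alternatively, after grouping by 'group', use mutate_at to create new columns 'col1_new' to 'col4_new' holding, for each value in 'col1' to 'col4', the minimum of the positive or negative numbers (whichever matches the value's sign). Then add the row sum of those minima to 'col5' and update 'col5'. Finally, update 'col1' to 'col4' by subtracting the new columns from the corresponding columns of the initial dataset ('df1').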
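library(tidyverse)
library(rlang)

# names of the columns to adjust: "col1" "col2" "col3" "col4"
nm1 <- names(df1)[3:6]
# one string of expressions: "col1 - col1_new;col2 - col2_new;..."
expr <- paste(glue::glue('{nm1} - {nm1}_new'), collapse = ";")

df1 %>%
  group_by(group) %>%
  # within each group, colN_new = minimum of the value's sign bucket
  mutate_at(nm1, list(new = ~ ave(., sign(.), FUN = min))) %>%
  ungroup %>%
  # add the row-wise total of the subtracted minima to col5
  mutate(col5 = col5 + select(., col1_new:col4_new) %>%
           reduce(`+`)) %>%
  transmute(date, group, !!! parse_exprs(expr), col5) %>%
  rename_at(3:6, ~ nm1)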
# A tibble: 8 x 7
# date group col1 col2 col3 col4 col5
# <int> <int> <int> <int> <int> <int> <int>
#1 1234 1 3 1 0 2 94
#2 1235 1 0 3 2 0 195
#3 1234 1 0 0 5 0 402
#4 1235 1 4 0 0 6 903
#5 1235 2 0 0 0 94 761
#6 1233 2 0 0 53 0 861
#7 1342 2 18 0 44 0 -37
#8 1234 2 66 13 45 1 46
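The key step in the mutate_at answer above is ave(., sign(.), FUN = min): within one group, ave() splits a column by sign(.) (-1, 0 or 1) and replaces every element with the minimum of its own bucket, which is exactly the amount the question wants subtracted. A small base R illustration on group 2's col2 (values taken from the question):

```r
# Group 2's col2 from the question: one positive value, two negatives and a zero
x <- c(83L, -21L, 0L, -8L)

# Replace each element by the minimum of the elements sharing its sign
m <- ave(x, sign(x), FUN = min)
m      # 83 -21 0 -21: the amounts subtracted (and later added to col5)
x - m  # 0 0 0 13: the new col2 values for rows 5 to 8
```

Because the zero sits in its own bucket, its minimum is 0 and it is left unchanged, as rule c) requires.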
Data:
df1 <- structure(list(date = c(1234L, 1235L, 1234L, 1235L, 1235L, 1233L,
1342L, 1234L), group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), col1 = c(-2L,
4L, -5L, 8L, -72L, 32L, -54L, 98L), col2 = c(3L, 5L, 2L, 2L,
83L, -21L, 0L, -8L), col3 = c(4L, -2L, 9L, -4L, -54L, -1L, -10L,
-9L), col4 = c(-5L, -7L, 1L, 7L, 98L, 4L, -11L, -10L), col5 = c(100L,
200L, 400L, 900L, 800L, 900L, 100L, 100L)), .Names = c("date",
"group", "col1", "col2", "col3", "col4", "col5"),
class = "data.frame", row.names = c(NA,
-8L))
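Regarding the comment about NA values: the "no non-missing arguments" warning is what min() gives when it is called on a zero-length vector, for example min(col4[col4 < 0]) in a group where col4 has no negative values (with na.rm = TRUE, a vector of only NAs behaves the same way). A base R sketch that avoids both problems follows. It assumes, since the question does not say, that NA cells should stay NA and contribute nothing to col5. It relies on sign(NA) being NA: ave() leaves elements whose grouping value is NA untouched, so NAs never reach min(). The data are a made-up variant of the question's group 1 with two NAs inserted.

```r
# Made-up data (not from the question): group 1 with two NAs added
df_na <- data.frame(
  group = c(1L, 1L, 1L, 1L),
  col1  = c(-2L, NA, -5L, 8L),
  col2  = c(3L, 5L, NA, 2L),
  col5  = c(100L, 200L, 400L, 900L)
)
cols <- c("col1", "col2")

# Assumption: NA cells stay NA and add nothing to col5.
# NA values get an NA bucket, which ave() skips; drop = TRUE removes empty
# (group, sign) buckets, so min() is never called on an empty vector.
subtracted <- sapply(cols, function(cl)
  ave(df_na[[cl]], interaction(df_na$group, sign(df_na[[cl]]), drop = TRUE), FUN = min))

out <- df_na
for (cl in cols) out[[cl]] <- df_na[[cl]] - subtracted[, cl]
# na.rm = TRUE: an NA cell contributes 0 to the row total
out$col5 <- df_na$col5 + rowSums(subtracted, na.rm = TRUE)
out
```

If NA should instead propagate to col5, drop na.rm = TRUE from rowSums().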