R: change a value based on certain conditions in a data frame
I have a data frame similar to this one:
session <- c(rep(34,8), rep(28,8))
trial_index <- c(rep(2,4),rep(5,4),rep(6,4),rep(8,4))
label <- c(rep(c("a","b","c","d"),4))
time <- c(10,2,7,40,4,3,6,20,5,3,5,15,4,2,3,17)
data <- data.frame(session, trial_index, label, time)
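One approach is to rearrange the data so that each label becomes its own column for every session/trial_index combination. The calculation for d is then a simple column-wise subtraction. After that, you can convert the data back to its original form.
Here is a sample implementation of this: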
library(tidyr) # To rearrange the data
library(dplyr) # To do the subtraction
data <- tidyr::spread(data, key = label, value = time) %>% # Make each label its own column
  dplyr::mutate(d = d - c - b - a) %>% # Subtract a, b and c from d
  tidyr::gather(key = label, value = time, -session, -trial_index) # Convert back to long form
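The same reshape, subtract, reshape-back idea can also be sketched with `pivot_wider()` and `pivot_longer()`, which recent tidyr versions recommend over `spread()` and `gather()`. This is only a sketch under the assumption that every session/trial_index group has exactly one row per label; the name `result` is introduced here for illustration.

```r
library(tidyr)
library(dplyr)

# Example data from the question
data <- data.frame(
  session     = c(rep(34, 8), rep(28, 8)),
  trial_index = c(rep(2, 4), rep(5, 4), rep(6, 4), rep(8, 4)),
  label       = rep(c("a", "b", "c", "d"), 4),
  time        = c(10, 2, 7, 40, 4, 3, 6, 20, 5, 3, 5, 15, 4, 2, 3, 17)
)

result <- data %>%
  # One row per session/trial_index, one column per label
  pivot_wider(names_from = label, values_from = time) %>%
  # Column-wise subtraction: d becomes d - c - b - a
  mutate(d = d - c - b - a) %>%
  # Back to one row per label
  pivot_longer(cols = -c(session, trial_index),
               names_to = "label", values_to = "time")
```

Note that the result is a tibble rather than a plain data frame; `as.data.frame(result)` converts it back if needed.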
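Maybe something like this:
library(data.table)
setDT(data) # the code below uses data.table syntax, so convert the data frame first
newdf <- data[, list(new = time[label == 'd'] - time[label == 'c'] - time[label == 'b'] - time[label == 'a']), list(session, trial_index)]
data <- merge(data, newdf)
data[label == 'd', time := new]
data[, new := NULL]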
This may be a slightly convoluted approach, but here goes:
1) Shift the column down so that the a, b and c values sit next to d
data <- data %>% mutate(time2 = lag(time), time3 = lag(time2), time4 = lag(time3))
A solution using data.table:
library(data.table)
## Just subtract everything from "d" (as the order doesn't really matter) by group
d <- setDT(data)[, Reduce(`-`, rev(time)), by = .(session, trial_index)]$V1
## Insert the results only for "d"
data[label == "d", time := d]
data
# session trial_index label time
# 1: 34 2 a 10
# 2: 34 2 b 2
# 3: 34 2 c 7
# 4: 34 2 d 21
# 5: 34 5 a 4
# 6: 34 5 b 3
# 7: 34 5 c 6
# 8: 34 5 d 7
# 9: 28 6 a 5
# 10: 28 6 b 3
# 11: 28 6 c 5
# 12: 28 6 d 2
# 13: 28 8 a 4
# 14: 28 8 b 2
# 15: 28 8 c 3
# 16: 28 8 d 8
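To see what `Reduce(`-`, rev(time))` does inside one group, here is a small sketch in base R using the first group's values (session 34, trial_index 2). The variable names `time_group`, `steps` and `final` are made up for this illustration.

```r
# time values for labels a, b, c, d in the first group
time_group <- c(10, 2, 7, 40)

# rev() puts d first: 40, 7, 2, 10
# Reduce() then subtracts left to right: ((40 - 7) - 2) - 10
steps <- Reduce(`-`, rev(time_group), accumulate = TRUE)
steps
# [1] 40 33 31 21

# Without accumulate = TRUE only the final value is returned
final <- Reduce(`-`, rev(time_group))
final
# [1] 21
```

This is why only the position of "d" matters: it has to be the last element of the group so that it ends up first after `rev()`.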
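@DavidArenburg, thank you very much. I didn't know that. Updated :)
Does this assume that time is ordered by label a-[...]
[...], rev(time)) part. I'd be grateful if you could explain it to me. Also, what if I have an additional variable (time2) and I want to do the same thing? Do I need to do it separately for each one? Thanks
@rookie It only assumes that "d" comes last; by the rules of arithmetic, the order of the rest doesn't matter.
@unomas83. Reduce just subtracts each value from the previous result and eventually returns a single value. If you want to do this for several columns, just put it into lapply(.SD, ...) and specify .SDcols. You should read some data.table tutorials; I'm not in front of a computer right now and can't write it out, sorry.
Thank you very much for your answer. It works great! I was wondering: what if I have an additional variable (time2) and I want to do the same thing? Do I need to do it separately for each one?
Yes, I think that's a limitation. Maybe a function would help, but it wouldn't be elegant. I think some of the other solutions may work better in this case.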
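The `lapply(.SD, ...)` with `.SDcols` suggestion from the comments could look roughly like the sketch below. It is not the commenter's own code: the second column `time2` is filled with made-up values (`time * 2`) purely for illustration, and the names `dt`, `cols`, `new_vals` and `d_vals` are hypothetical. Like the original answer, it assumes "d" is the last row of each group.

```r
library(data.table)

dt <- data.table(
  session     = c(rep(34, 8), rep(28, 8)),
  trial_index = c(rep(2, 4), rep(5, 4), rep(6, 4), rep(8, 4)),
  label       = rep(c("a", "b", "c", "d"), 4),
  time        = c(10, 2, 7, 40, 4, 3, 6, 20, 5, 3, 5, 15, 4, 2, 3, 17)
)
dt[, time2 := time * 2] # hypothetical extra column, values invented for the example

cols <- c("time", "time2")

# One row per group, one reduced value per column listed in .SDcols
new_vals <- dt[, lapply(.SD, function(x) Reduce(`-`, rev(x))),
               by = .(session, trial_index), .SDcols = cols]

# Write the results back to the "d" rows only; this relies on the "d" rows
# appearing in the same order as the groups in new_vals
d_vals <- as.list(new_vals[, cols, with = FALSE])
dt[label == "d", (cols) := d_vals]
```

Adding further columns then only requires extending `cols`.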
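library(dplyr) # provides %>%, mutate() and lag()
data <- data %>% mutate(time2 = lag(time), time3 = lag(time2), time4 = lag(time3))
# Only the "d" rows get the subtraction; all other rows keep their time
data <- transform(data, time = ifelse(label == 'd', time - time2 - time3 - time4, time))
# Drop the helper columns (5 to 7); assumes data is a plain data.frame
data <- data[-c(5, 6, 7)]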
session trial_index label time
1 34 2 a 10
2 34 2 b 2
3 34 2 c 7
4 34 2 d 21
5 34 5 a 4
6 34 5 b 3
7 34 5 c 6
8 34 5 d 7
9 28 6 a 5
10 28 6 b 3
11 28 6 c 5
12 28 6 d 2
13 28 8 a 4
14 28 8 b 2
15 28 8 c 3
16 28 8 d 8
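The lag-based code relies on the rows being ordered a, b, c, d within each group. If that order cannot be guaranteed, a grouped variant along the following lines might be an option. It is a sketch that assumes exactly one "d" row per session/trial_index group; the name `result` is introduced here for illustration.

```r
library(dplyr)

data <- data.frame(
  session     = c(rep(34, 8), rep(28, 8)),
  trial_index = c(rep(2, 4), rep(5, 4), rep(6, 4), rep(8, 4)),
  label       = rep(c("a", "b", "c", "d"), 4),
  time        = c(10, 2, 7, 40, 4, 3, 6, 20, 5, 3, 5, 15, 4, 2, 3, 17)
)

result <- data %>%
  group_by(session, trial_index) %>%
  # Within each group: d minus the sum of all other labels, picked by label
  # rather than by row position
  mutate(time = ifelse(label == "d",
                       time[label == "d"] - sum(time[label != "d"]),
                       time)) %>%
  ungroup()
```

Because the values are selected by label, shuffling the rows inside a group does not change the result.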