Computing a self-referencing variable in R

r statistics


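I am trying to create a variable in a data frame that refers to the previous row (of the variable being created) to derive its value. I am fairly new to R and come from Excel, where this kind of self-reference and iterative updating is very simple.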

mydata <- data.frame(trial = c(1,1,1,1,1,1,1,1,2,2),
fixation=c("","","aoi1","aoi1","","aoi3","aoi3","","",""),
trial.marker=c("","","","","","","",1,"",""))
mydata

trial fixation trial.marker
1                      
1                      
1     aoi1             
1     aoi1             
1                      
1     aoi3             
1     aoi3             
1                     1
2                      
2

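Here is my attempt. Note that I am no R expert (I am treating this more as a learning exercise), so I hope others will chime in, or at least critique my code.
I added a few rows to your data for checking. It still loops, but this time only over the number of trials, so it should be faster.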
mydata <- data.frame(trial = c(1,1,1,1,1,1,1,1,2,2),
                     fixation=c("","","aoi1","aoi1","","aoi3","aoi3","","aoi3",""),
                     trial.marker=c("","","","","","","",1,"",""))
mydata
# str() shows whether the string columns were created as factors (the default before R 4.0.0)
str(mydata)

# To avoid factors, pass stringsAsFactors = FALSE; a blank first.fixation column is added as well
mydata <- data.frame(trial = c(1,1,1,1,1,1,1,1,2,2,3,3),
                     fixation=c("","","aoi1","aoi1","","aoi3","aoi3","","","","aoi3",""),
                     trial.marker=c("","","","","","","",1,"",2,"",""),
                     first.fixation="",
                     stringsAsFactors = FALSE)
mydata
str(mydata)


trials <- unique(mydata$trial)

# which() returns the indices of the rows that meet a condition
# (shown for demonstration only; the result is not used below)
which(mydata$fixation != "" & mydata$trial == 1)

# Loop over trials rather than over rows
for (trial in trials) {
  # Rows of this trial that contain a fixation
  hits <- which(mydata$fixation != "" & mydata$trial == trial)
  # Skip trials without any fixation; min() of an empty vector would give Inf
  if (length(hits) > 0) {
    # Last row with the given trial number
    rowmax <- max(which(mydata$trial == trial))
    # First row with the given trial number and a fixation
    rowmin <- min(hits)
    # Fill from the first fixation to the end of the trial
    mydata$first.fixation[rowmin:rowmax] <- mydata$fixation[rowmin]
  }
}
mydata

Ideally, avoid loops in R, because vectorized operations are almost always faster.
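As an illustration of that advice, here is a minimal loop-free sketch using base R's ave(), which applies a function to each group and puts the results back in the original row order. The helper name fill_first is made up for this example, and the sketch assumes the same rule as the loop: within each trial, everything from the first non-empty fixation onward gets that fixation's value.

```r
mydata <- data.frame(
  trial    = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3),
  fixation = c("", "", "aoi1", "aoi1", "", "aoi3", "aoi3", "", "", "", "aoi3", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: receives the fixation values of ONE trial and
# returns a vector of the same length.
fill_first <- function(x) {
  out <- rep("", length(x))
  hit <- which(x != "")
  if (length(hit) > 0) {
    # From the first non-empty fixation to the end of the trial
    out[hit[1]:length(x)] <- x[hit[1]]
  }
  out
}

# ave() splits fixation by trial, applies fill_first to each piece,
# and reassembles the result in the original row order.
mydata$first.fixation <- ave(mydata$fixation, mydata$trial, FUN = fill_first)
mydata
```

Unlike the rowmin:rowmax indexing in the loop, this version does not rely on the rows of a trial being contiguous.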

I would solve it with data.table, which usually gives very good performance, although I have not run a benchmark at volume. Here is the solution:
library(data.table)
dt <- data.table(mydata)
f <- function(fixation) {
  if (length(which(fixation != "")) == 0) {
    return(rep("", length(fixation)))
  }
  min_informed <- min(which(fixation != ""))
  return(c(rep("", min_informed-1), rep(fixation[min_informed], length(fixation)-min_informed+1)))
}
dt[, first.fixation := f(fixation), by = list(trial)]

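I am guessing you are not familiar with data.table, so here is some explanation of the code. In dt[, first.fixation := f(fixation), by = list(trial)], the first argument is the query (left empty here, meaning all rows); the second argument creates the new column first.fixation from the result of the function f; the third argument groups by trial, so f receives a vector containing all the fixation values of one trial. Once you have that vector, it is easy inside f to find the first non-empty element and fill from there.
If you decide to try it on your big data.frame, it would be nice if you posted the timings you get. My guess is that it will not take long (a few minutes at most, I would think).
Hope it helps.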
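
To make the grouping step described above concrete, here is a small sketch on made-up data (the column names n_rows and first_fix are chosen only for this illustration). It shows that the j expression is evaluated once per trial and sees only that trial's fixation values.

```r
library(data.table)

# Made-up example data, not the asker's real data set
dt <- data.table(
  trial    = c(1, 1, 1, 2, 2, 3),
  fixation = c("", "aoi1", "aoi3", "", "", "aoi3")
)

# j is evaluated once per group defined by `by`. Inside j, `fixation`
# is the vector of values for a single trial and .N is its length.
summary_dt <- dt[, .(n_rows    = .N,
                     first_fix = fixation[fixation != ""][1]),
                 by = trial]
summary_dt
```

A trial without any fixation yields NA in first_fix here, because indexing an empty character vector with [1] returns NA.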
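So I am fairly sure I solved this another way. Typing out my question made it clear to me that what I was really after was a per-trial summary, so I did the following:
# Return the element of y at the earliest position where any value of x occurs
first.match <- function(x, y) {
  match.list <- sort(match(x, y), decreasing = FALSE)
  y[match.list[1]]
}

ff.data <- aggregate(x = exp2data$aoifixation,
                     by = list(exp2data$subject, exp2data$trial),
                     FUN = function(x) first.match(c("AOI1", "AOI3"), x))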
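
Since exp2data itself is not shown in the thread, here is a hedged, self-contained sketch of the same idea on invented data. The values below are assumptions made up for illustration; only the column names subject, trial and aoifixation come from the code above.

```r
# Same helper as above: first element of y that matches any value in x
first.match <- function(x, y) {
  match.list <- sort(match(x, y), decreasing = FALSE)
  y[match.list[1]]
}

# Invented stand-in for exp2data
exp2data <- data.frame(
  subject     = c(1, 1, 1, 1, 2, 2, 2),
  trial       = c(1, 1, 2, 2, 1, 1, 1),
  aoifixation = c("", "AOI3", "AOI1", "AOI3", "", "", "AOI1"),
  stringsAsFactors = FALSE
)

# One row per subject/trial combination, holding the first AOI fixated
ff.data <- aggregate(x = exp2data$aoifixation,
                     by = list(subject = exp2data$subject,
                               trial   = exp2data$trial),
                     FUN = function(x) first.match(c("AOI1", "AOI3"), x))
ff.data
```

Naming the elements of the by list gives the grouping columns readable names instead of Group.1 and Group.2; the summarised value ends up in a column called x.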

Dave, the "fear" of for loops is a thing of the past. However, if your code "grows" a vector or other object inside a for loop, expect a significant loss of efficiency. In all programming languages, and especially in R, create the storage object before the loop (for example, a numeric vector: numeric(length = MyLength), or a numeric matrix: matrix(0, myRows, myCols)). If you are only modifying an existing object, this matters less. That said, your use of for loops could be greatly improved by other routes.

I only recently came across data.table, when I discovered shift(). It looks like there is a lot of potential beyond data.frame, so I will definitely learn how to use it when I get the chance.

Hi. In my experience (which, honestly, is not that extensive), most of the time when I face a complex transformation like yours I first try functions such as aggregate and plyr, and I end up using data.table; the final code is cleaner and faster, although the library is indeed harder to learn. In your case, if the solution you posted takes a few seconds, I do not think data.table will beat that, so there is no need to investigate further. Cheers.
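
The preallocation advice in the comments can be sketched as follows. This is only an illustration with an arbitrary size n and an arbitrary computation (squaring the loop index); it contrasts growing a vector inside the loop with filling a vector created beforehand.

```r
n <- 10000

# Growing the result inside the loop: each c() call copies the whole vector
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Preallocating the storage object before the loop, then filling it in place
pre <- numeric(length = n)
for (i in seq_len(n)) {
  pre[i] <- i^2
}

# Both approaches give the same values; only the cost differs
identical(grown, pre)
```

Wrapping each loop in system.time() is one way to compare the two on your own machine.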