
R: Running a custom loop inside ddply

Tags: r, for-loop, plyr, conditional-statements

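My dataset has about 54,000 rows. I want to set a value (First_Pass) to T or F depending on a value in another column and on whether the value of yet another column has been seen before. I have a for loop that does exactly what I need. However, the loop only works on a single subset of the data. I need the same loop to run separately for different subsets, defined by factor levels.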

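This seems like a perfect case for the plyr functions, since I want to split the data into subsets, apply a function (my for loop), and then rejoin the data. However, I can't get it to work. First, here is a sample of the df, which is called char.data: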

     session_id list Sent_Order Sentence_ID Cond1 Cond2 Q_ID   Was_y CI CI_Delta character tsle tsoc Direct
5139          2    b          9          25    rc    su   25 correct  1        0         T  995   56      R
5140          2    b          9          25    rc    su   25 correct  2        1         h   56   56      R
5141          2    b          9          25    rc    su   25 correct  3        1         e   56   56      R
5142          2    b          9          25    rc    su   25 correct  4        1             56   37      R

There is some clutter in there. The key columns are session_id, Sentence_ID, CI, and CI_Delta.

Then I initialize a column called First_Pass to "F":


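char.data$First_Pass <- "F"

It's a bit hard to test with only the four rows you provided. I created some random data to see whether it works, and it seems to work for me. Try it on your data as well.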

This uses the data.table library and does not try to run the loop inside ddply. I assume the means don't matter.

library(data.table)
dt <- data.table(df)  # df is your data frame (char.data in the question)
l <- c(200)

# Subsetting to keep only the important fields
dt <- dt[, list(session_id, Sentence_ID, CI, CI_Delta)]

# Initialising First_Pass
dt[, First_Pass := "F"]

# The next two lines are basically a rewording of your logic.

# Within each group of session_id, Sentence_ID, identify the duplicate CI
# entries among rows with CI_Delta >= 0. These would already have been
# inserted in l. The first occurrence of each CI is marked FALSE, as it
# wouldn't have been in l when that row was being checked.
dt[CI_Delta >= 0, duplicatedCI := duplicated(CI), by = c("session_id", "Sentence_ID")]

# So if the CI value hasn't occurred before within the session_id, Sentence_ID
# group, and it doesn't appear in l, mark it as "T"
dt[!(CI %in% l) & !(duplicatedCI), First_Pass := "T"]

# Just for curiosity's sake, calculating l too
l <- c(l, dt[duplicatedCI == FALSE, CI])
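
To see the data.table approach above on something small, here is a minimal sketch using made-up rows. The `toy` table and its values are hypothetical and are not from the original data; only the column names and the two key statements follow the answer. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data: two (session_id, Sentence_ID) groups
toy <- data.table(
  session_id  = c(1, 1, 1, 1, 2, 2, 2),
  Sentence_ID = c(5, 5, 5, 5, 5, 5, 5),
  CI          = c(1, 2, 1, 2, 1, 200, 2),
  CI_Delta    = c(0, 1, -1, 1, 0, 199, -198)
)

toy[, First_Pass := "F"]

# Flag repeated CI values within each group, looking only at rows with CI_Delta >= 0
toy[CI_Delta >= 0, duplicatedCI := duplicated(CI), by = c("session_id", "Sentence_ID")]

# Rows with CI_Delta < 0 keep duplicatedCI = NA; data.table treats NA in i as FALSE,
# so those rows stay "F"
toy[!(CI %in% 200) & !(duplicatedCI), First_Pass := "T"]

print(toy)
```

In this toy table the fourth row repeats a CI already seen with a non-negative CI_Delta, and the sixth row carries the seed value 200, so both stay "F" along with the rows whose CI_Delta is negative.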

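I may have solved this. I moved the return outside the for loop, and now it returns answers that look more reasonable. I'll check it against all of the data, then write it up and close this if it is indeed correct. (I promise I stared at this for hours before posting.)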
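# define function
library(plyr)  # for ddply

set_fp <- function(df) {
  l <- 200  # CI values seen so far in this subset, seeded with 200
  for (i in seq_len(nrow(df))) {
    # !(x %in% l) is the base-R equivalent of Hmisc's %nin%
    if (df$CI_Delta[i] >= 0 && !(df$CI[i] %in% l)) {
      df$First_Pass[i] <- "T"
      l <- c(l, df$CI[i])
    } else {
      df$First_Pass[i] <- "F"
    }
  }
  # return after the loop has visited every row, not inside it
  return(df)
}

char.data.fp <- ddply(char.data, c("session_id", "Sentence_ID"), set_fp)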
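
For readers without plyr, the same split, apply, combine pattern can be sketched in base R. This is only an illustration under stated assumptions: the `toy` data frame and the helper name `mark_first_pass` are hypothetical, and `split()` / `lapply()` / `do.call(rbind, ...)` are swapped in for `ddply`, not taken from the original post.

```r
# Hypothetical toy data standing in for char.data
toy <- data.frame(
  session_id  = c(1, 1, 1, 1, 2, 2, 2),
  Sentence_ID = c(5, 5, 5, 5, 5, 5, 5),
  CI          = c(1, 2, 1, 2, 1, 200, 2),
  CI_Delta    = c(0, 1, -1, 1, 0, 199, -198),
  First_Pass  = "F",
  stringsAsFactors = FALSE
)

# Same per-subset rule as the loop: a row is "T" when CI_Delta >= 0 and its CI
# has not been seen yet in this subset (the seen list is seeded with 200)
mark_first_pass <- function(df) {
  seen <- 200
  for (i in seq_len(nrow(df))) {
    if (df$CI_Delta[i] >= 0 && !(df$CI[i] %in% seen)) {
      df$First_Pass[i] <- "T"
      seen <- c(seen, df$CI[i])
    } else {
      df$First_Pass[i] <- "F"
    }
  }
  df
}

# split by group -> apply the function to each piece -> bind the pieces together
pieces <- split(toy, list(toy$session_id, toy$Sentence_ID), drop = TRUE)
result <- do.call(rbind, lapply(pieces, mark_first_pass))
rownames(result) <- NULL

print(result)
```

Because the seen list is created inside the function, it starts fresh for every subset, which is what running the loop separately per factor level requires.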