Performance: making a loop faster (R)


performance r


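This small code snippet is supposed to loop over a sorted data frame. It keeps track of how many consecutive rows have the same information in columns aIndex and cIndex and in columns bIndex and dIndex. If the values match, the count is stored and then incremented for the next row; if they differ, the count is stored and then reset to 1 for the next row.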

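count <- 1  # starting value assumed; the original post does not show the initialization
for (i in seq_len(nrow(myFrame))) {
  # && is the scalar operator meant for if() conditions
  if (myFrame[i, aIndex] == myFrame[i, cIndex] &&
      myFrame[i, bIndex] == myFrame[i, dIndex]) {
    myFrame[i, eIndex] <- count   # store the current count ...
    count <- count + 1            # ... then increment it for the next row
  } else {
    myFrame[i, eIndex] <- count   # store the current count ...
    count <- 1                    # ... then reset it for the next row
  }
}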
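Before vectorizing anything, one modest change that often helps is to do the comparison once on whole columns and write into a preallocated vector, since assigning into a data frame one cell at a time is expensive in R. The following is only a sketch under assumptions: the five-row sample data and the column names a, b, c, d, e are made up for illustration, and the count is assumed to start at 1.

```r
# Hypothetical sample data; a, b, c, d stand in for the aIndex..dIndex columns
myFrame <- data.frame(a = c(1, 1, 1, 4, 1),
                      b = c(2, 2, 2, 8, 4),
                      c = c(1, 1, 4, 1, 1),
                      d = c(2, 2, 8, 4, 4))

# Compare the columns once, outside the loop
same <- myFrame$a == myFrame$c & myFrame$b == myFrame$d

e <- integer(nrow(myFrame))  # preallocated result vector
count <- 1L                  # assumed starting value
for (i in seq_along(same)) {
  e[i] <- count                             # store the current count
  count <- if (same[i]) count + 1L else 1L  # update it for the next row
}
myFrame$e <- e  # a single assignment back into the data frame
```

The loop body now touches only plain vectors, so the per-iteration data frame overhead is gone while the logic stays the same as in the question.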

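I think this does what you want; the tricky part is that the count resets after a difference, which effectively shifts eIndex by one row.

There is (hopefully) a simpler way to do this, but this is what I came up with.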

tmprle <- rle(((myFrame$aIndex == myFrame$cIndex) & 
               (myFrame$bIndex == myFrame$dIndex)))
myFrame$eIndex <- c(1,
                    unlist(ifelse(tmprle$values, 
                                  Vectorize(seq.default)(from = 2,
                                                         length = tmprle$lengths), 
                                  lapply(tmprle$lengths, 
                                         function(x) {rep(1, each = x)})))
                    )[-(nrow(myFrame)+1)]
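To make the role of rle in the code above easier to follow, here is a small illustration of what it returns for a logical vector. The vector same is a made-up stand-in for the result of comparing the columns; it is not taken from the asker's data.

```r
# Hypothetical comparison result: TRUE where a == c and b == d
same <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

r <- rle(same)
r$lengths  # run lengths: 2 2 1
r$values   # value of each run: TRUE FALSE TRUE

# inverse.rle() expands the runs back into the original vector
restored <- inverse.rle(r)
```

Each run of TRUE is then turned into an increasing sequence and each run of FALSE into 1s, and the whole result is shifted down by one row.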


Maybe this will work. I've reworked the rle and sequence bits.

dat <- read.table(text="aIndex bIndex cIndex dIndex
1 2 1 2
1 2 1 2
1 2 4 8
4 8 1 4
1 4 1 4", header=TRUE, as.is=TRUE,sep = " ")
dat$eIndex <-NA
#identify rows where a=c and b=d, multiply by 1 to get a numeric vector
dat$id<-(dat$aIndex==dat$cIndex & dat$bIndex==dat$dIndex)*1
#identify sequence
runs <- rle(dat$id)
#create sequence, multiply by id to keep only identicals, +1 at the end
count <-sequence(runs$lengths)*dat$id+1
#shift sequence down one notch, start with 1
dat$eIndex <-c(1,count[-length(count)])
dat

  aIndex bIndex cIndex dIndex eIndex id
1      1      2      1      2      1  1
2      1      2      1      2      2  1
3      1      2      4      8      3  0
4      4      8      1      4      1  0
5      1      4      1      4      1  1
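One way to gain confidence in the vectorized version is to wrap it in a function and compare it against a plain loop on random input. The helper names run_count and loop_count are hypothetical, the loop assumes the count starts at 1, and the input is assumed to have at least one element.

```r
# Hypothetical helper: vectorized, shifted run counter (rle + sequence)
run_count <- function(same) {
  runs <- rle(same)
  # position within each run, zeroed for non-matching rows, then + 1
  count <- sequence(runs$lengths) * same + 1
  # shift down one row; the first row always gets 1
  c(1, count[-length(count)])
}

# Reference: the loop from the question, applied to a logical vector
loop_count <- function(same) {
  e <- numeric(length(same))
  count <- 1  # assumed starting value
  for (i in seq_along(same)) {
    e[i] <- count
    count <- if (same[i]) count + 1 else 1
  }
  e
}

set.seed(1)
same <- sample(c(TRUE, FALSE), 1000, replace = TRUE)
stopifnot(all(run_count(same) == loop_count(same)))
```

If the two ever disagreed, stopifnot would raise an error, so a silent run suggests the shift and reset behave the same way in both versions.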


I should add: the data frame has about 300,000 rows.
Chase is asking you to provide a piece of your data, myFrame, so that we can look at its structure and run the code you posted.
@DanielCates: I believe Chase means a sample of the myFrame object.
Is eIndex right? The third row seems incorrect, since neither condition holds (aIndex != cIndex and bIndex != dIndex).
The third row should be correct, because it should put count there before setting it back to 1. Also, how should I provide a sample of the object?
This works for my extended example (dat) too.
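On the question of how to share a sample of the object: a common approach is dput, which prints R code that recreates the object and can be pasted into a post. The sketch below uses a made-up stand-in for myFrame, since the real data is not shown.

```r
# Hypothetical stand-in for the real 300,000-row data frame
myFrame <- data.frame(aIndex = c(1, 1, 1, 4, 1),
                      bIndex = c(2, 2, 2, 8, 4),
                      cIndex = c(1, 1, 4, 1, 1),
                      dIndex = c(2, 2, 8, 4, 4))

# Print a copy-pasteable definition of the first few rows
dput(head(myFrame, 3))

# The same text captured as a character vector, e.g. to inspect it
txt <- capture.output(dput(head(myFrame, 3)))
```

Pasting the printed structure(...) call into the question lets others rebuild the same rows with the same column types.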