R: replacing data frame records using vectorization

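I would like to know whether there is any way to solve the following problem efficiently. I have a set of X-Y points. For each point I need to generate a certain number of records, and at the end I need to stack all the generated records together. Initially I used a FOR loop and used cbind to stack the data.frame on each iteration. I have now changed it slightly by defining the dimensions of the final stack of records up front, and I am trying to replace those 0s with the generated values. My code is posted below (the ****** marks the place where I got stuck)... If you could give me a hint, or even a better solution, that would be perfect.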

colonies <- read.table(text =             
'  X        Y      Timecount ID_col Age
582906.4 2883317      2004      1  15
583345.9 2883102      2004      2   4
583119.5 2883621      2004      3  13
583385.0 2882933      2004      4   5
583374.0 2882936      2004      5   2
583271.0 2883076      2004      7   5
582898.9 2883229      2004      8   1
582927.9 2883234      2004      9  20
582956.7 2883272      2004     10  13
582958.8 2883249      2004     11   3', header = TRUE)

year = 2004
survival_prob = 0.01
male_prob = 0.5

Present <- colonies$Timecount == year

app <- sum(colonies$Age[Present] >= 4 & colonies$Age[Present] < 10) * 1000 * survival_prob
app2 <- sum(colonies$Age[Present] >= 10 & colonies$Age[Present] < 15) * 10000 * survival_prob
app3 <- sum(colonies$Age[Present] >= 15 & colonies$Age[Present] <= 20) * 100000 * survival_prob

size <- app + app2 + app3

pop <- data.frame(matrix(0,nrow=size,ncol=2))
colnames(pop) <- c("X","Y")

if (dim(pop)[1] > 0){

 #FOR cycle going through each existing point
 for (i in 1:sum(Present)){     

   if (colonies[Present,]$Age[i] < 4) { next
   } else if (colonies[Present,]$Age[i] >= 4 & colonies[Present,]$Age[i] < 10) { alates <- 1000 
   } else if (colonies[Present,]$Age[i] >= 10 & colonies[Present,]$Age[i] < 15) { alates <- 10000 
    } else if (colonies[Present,]$Age[i] >= 15 & colonies[Present,]$Age[i] <= 20) { alates <- 100000 
    }

    indiv <- alates * survival_prob
    #Initialize two coordinate variables based on the established (or existing) colonies
    X_temp <- round(colonies[Present,]$X[i],2)
    Y_temp <- round(colonies[Present,]$Y[i],2)
    distance <- rexp(indiv,rate=1/200)
    theta <- runif(indiv, 0, 2*pi)
    C <- cos(theta)
    S <- sin(theta)
    #XY coords (meters) using polar coordinate transformations
    X <- X_temp + round(S * distance,2)
    Y <- Y_temp + round(C * distance,2)
    pop[,] <- c(X,Y) #******HERE I GOT STUCK...it should be pop[1:indiv,] 
                     #but then it does not work for the next i since it would over write...

    }
    pop$Sex <- rbinom(size,1,male_prob)
    pop$ID <- 1:dim(pop)[1]
}

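I believe this is what you're looking for: nicely expressive, vectorized R code. No loops, and not even *apply family or plyr commands. There are lots of things you could do to make it more flexible, but the core vectorization using rep and the single call for the random distances are the critical parts. I have no idea why there was an if clause on the dimensions of pop. You will need to handle that differently, since that is no longer the end result.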
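To show the rep idea on its own before the full code, here is a minimal sketch with made-up values (the vectors x and n are hypothetical, not taken from the question). When times is a vector of the same length as x, each element is repeated its own number of times, so one call to the random generator can cover every generated record.

```r
# Hypothetical toy values, for illustration only
x <- c(10, 20, 30)   # one value per "colony"
n <- c(2, 0, 3)      # how many records each one should generate

# With a vector for `times`, element x[i] is repeated n[i] times
x_rep <- rep(x, times = n)
print(x_rep)
# [1] 10 10 30 30 30

# A single call then draws one random offset per generated record
set.seed(1)
offsets <- rexp(length(x_rep), rate = 1/200)
result <- x_rep + offsets
```

An element with a count of 0 simply drops out, which is why no special case is needed for points that generate nothing.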

year = 2004
survival_prob = 0.01
male_prob = 0.5

# you don't do anything in your for loop or save any of the results if the age is 
# less than 4. I'm going to just remove that from colonies on the assumption that it's 
# larger than posted and comes from a file that you won't change.  Where I edit 
# colonies you might want to work with a copy.
colonies <- colonies[colonies$Age >= 4,]

# only Present selection of colonies is ever used in this code so you could also stop 
# repeatedly selecting... this one I'm imagining you might make a copy of, something 
# like coloniesP in your real code.  In general, you want as little going on in a 
# loop and as little repeating yourself as possible.  Note, this might be memory 
# intensive if colonies is actually very large.  Feel free to go back to selecting 
# since it would happen much less frequently in the new code anyway.
Present <- colonies$Timecount == year
colonies <- colonies[Present,]

# no difference up to size, then it all is
app <- sum(colonies$Age >= 4 & colonies$Age < 10) * 1000 * survival_prob
app2 <- sum(colonies$Age >= 10 & colonies$Age < 15) * 10000 * survival_prob
app3 <- sum(colonies$Age >= 15 & colonies$Age <= 20) * 100000 * survival_prob

size <- app + app2 + app3

#note that ifelse can be used to declare alates as vectors
alates <- ifelse(colonies$Age >= 4 & colonies$Age < 10, 1000, 100000)
alates <- ifelse(colonies$Age >= 10 & colonies$Age < 15, 10000, alates)

# as a consequence, more stuff can be vectorized
indiv <- alates * survival_prob

# we can do some cool stuff with rep to continue vectorizing
# (round when done if you must)
X_temp <- rep(colonies$X, indiv)
Y_temp <- rep(colonies$Y, indiv)

#Initialize two coordinate variables based on the established (or existing) colonies... now as vectors of the entire data frame size
distance <- rexp(size,rate=1/200)
theta <- runif(size, 0, 2*pi)
C <- cos(theta)
S <- sin(theta)
#XY coords (meters) using polar coordinate transformations
X <- X_temp + S * distance
Y <- Y_temp + C * distance
pop <- data.frame(X,Y)  
pop$Sex <- rbinom(size,1,male_prob)
pop$ID <- 1:dim(pop)[1]
# now round... once
pop$X <- round(pop$X,2)
pop$Y <- round(pop$Y,2)
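
If you would rather keep the loop and the preallocated pop, the spot where you got stuck can be handled by tracking a running row offset instead of always writing to 1:indiv. This is only a sketch on toy data; centers, indiv and start are hypothetical names and values, not from the original code.

```r
# Toy stand-ins for the question's data (hypothetical values)
centers <- data.frame(X = c(100, 200, 300), Y = c(10, 20, 30))
indiv   <- c(2, 3, 1)            # records to generate per point

# Preallocate the full result once
size <- sum(indiv)
pop  <- data.frame(X = numeric(size), Y = numeric(size))

set.seed(42)
start <- 1                        # first free row in pop
for (i in seq_len(nrow(centers))) {
  n <- indiv[i]
  if (n == 0) next                # nothing to generate for this point
  rows <- start:(start + n - 1)   # block of rows reserved for point i
  distance <- rexp(n, rate = 1/200)
  theta    <- runif(n, 0, 2 * pi)
  pop$X[rows] <- centers$X[i] + sin(theta) * distance
  pop$Y[rows] <- centers$Y[i] + cos(theta) * distance
  start <- start + n              # advance past the rows just filled
}
```

Each iteration writes to its own block of rows, so nothing gets overwritten. The vectorized version above should still be the faster option.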

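There seems to be a problem with the current code... Do you really not want to do anything with the colonies younger than 4? If so, throw them out right away. It looks to me like this could all be vectorized. Please comment it better, and perhaps describe more clearly what you are trying to accomplish.

Thanks John!! The lapply version is one I had already tested, but it takes longer than my current loop once I stack all the list elements together... The solution you wrote here works great, though... I really appreciate it... Is there any chance I could contact you by email? Francisco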
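
For reference, the lapply-then-stack approach mentioned in the last comment presumably looks something like the sketch below. This is an assumption, since that version was not posted; centers and indiv are hypothetical toy values.

```r
# Hypothetical toy inputs, not from the thread
centers <- data.frame(X = c(100, 200), Y = c(10, 20))
indiv   <- c(2, 3)               # records to generate per point

set.seed(1)
# Build one small data frame per point...
pieces <- lapply(seq_len(nrow(centers)), function(i) {
  distance <- rexp(indiv[i], rate = 1/200)
  theta    <- runif(indiv[i], 0, 2 * pi)
  data.frame(X = centers$X[i] + sin(theta) * distance,
             Y = centers$Y[i] + cos(theta) * distance)
})

# ...then stack them all in a single step
pop <- do.call(rbind, pieces)
```

Stacking many small data frames is the step that tends to be slow, which is consistent with the timing described in the comment. The rep-based version avoids it entirely.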