R ddply memory requirements and solutions

r memory

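I have a column in which each of several hundred thousand distinct strings occurs about 5-10 times. I want to count the occurrences of each string and put that count in the corresponding rows. So I run: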

library(plyr)
newdf <- ddply(oldDF, ~BigVariable, transform, counts = length(BigVariable))

dplyr is a better alternative to ddply here, because it can be more efficient:

library(dplyr)
oldDF %>%
     group_by(BigVariable) %>%
     mutate(counts = n()) 
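To see what this pipeline returns, here is a minimal sketch on toy data. The six-row `oldDF` below is a hypothetical stand-in for the real data frame, and the `ungroup()` step is an optional addition that is not part of the answer. Note that the pipeline does not modify `oldDF`; the result has to be assigned to a name to be kept.

```r
library(dplyr)

# Hypothetical toy data standing in for the real oldDF
oldDF <- data.frame(
  BigVariable = c("a", "b", "a", "c", "a", "b"),
  stringsAsFactors = FALSE
)

# Count the rows in each group and attach the count to every row of that group
newdf <- oldDF %>%
  group_by(BigVariable) %>%
  mutate(counts = n()) %>%
  ungroup()  # optional: drop the grouping so later steps see a plain tibble

print(newdf)
```

The row order of `oldDF` is preserved, so each row simply gains the size of its own group in `counts`.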


Or use data.table:

library(data.table)
setDT(oldDF)[, counts := .N, by = BigVariable]
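The data.table version can be tried the same way. This is a small sketch assuming a hypothetical six-row `oldDF`; the real data is of course much larger. `setDT()` converts the data frame to a data.table by reference and `:=` adds the column in place, so no second copy of the table is created, which is relevant when memory is the concern.

```r
library(data.table)

# Hypothetical toy data standing in for the real oldDF
oldDF <- data.frame(
  BigVariable = c("a", "b", "a", "c", "a", "b"),
  stringsAsFactors = FALSE
)

# Convert in place (no copy), then add the per-group row count by reference
setDT(oldDF)
oldDF[, counts := .N, by = BigVariable]

print(oldDF)
```

Because the update happens by reference, `oldDF` itself now carries the `counts` column and there is no need to assign the result to a new name.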

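The first solution didn't crash, but it gave me NULL. The second approach seems to work well. Now to see how large a data set I can push it to. Thanks.

@JamesHanks The NULL output is puzzling. Did you try it in a new R session?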