Sorting subsets of a data frame in R on a large dataset

I have the following dataset, and I need to track the sequence of locations each user visited on each day of the year.

   User    Date        Location     Time
    90   2013-01-28       39      16:06:20
    26   2013-02-04       27      19:32:09
    23   2013-02-04        5      16:03:39
    23   2013-01-07       29      15:40:25
    84   2013-02-27       50      17:25:40
    57   2013-01-30        5      17:26:26
I adapted the script used in the following thread:

The modified code looks like this:

data$User <- as.factor(data$User)
data$Date <- as.factor(data$Date)

data$Sequence <- ave(data$Time, data$User, data$Date, FUN=rank) 

data <- data[order(data$Sequence),]
data <- data[order(data$User),]
data <- data[order(data$Date),]
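However, although this works on small data frames, it takes a very long time to run on the real dataset (5 million rows, nearly 100,000 users).

Is there a more efficient way?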
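One small simplification is independent of the grouping step. Because order() leaves ties in their original order, the three successive order() calls in the question produce the same result as a single multi-key call with Date as the primary key, then User, then Sequence. The sketch below checks this on toy data; the values are made up for illustration, and the as.numeric() wrapper is an assumption added here because ave() returns text when Time is stored as text.

```r
# Toy data in the shape of the question (values are made up for illustration)
data <- data.frame(
  User = c(23, 23, 90, 23, 90, 26),
  Date = c("2013-02-04", "2013-01-07", "2013-01-28",
           "2013-02-04", "2013-01-28", "2013-02-04"),
  Location = c(5, 29, 39, 12, 7, 27),
  Time = c("16:03:39", "15:40:25", "16:06:20",
           "09:15:00", "08:00:00", "19:32:09"),
  stringsAsFactors = FALSE
)

# Rank visits within each User/Date group; as.numeric() keeps the ranks
# numeric even though Time is a character column
data$Sequence <- as.numeric(ave(data$Time, data$User, data$Date, FUN = rank))

# Three successive stable sorts, as in the question
three_step <- data[order(data$Sequence), ]
three_step <- three_step[order(three_step$User), ]
three_step <- three_step[order(three_step$Date), ]

# One multi-key sort: Date first, then User, then Sequence
one_step <- data[order(data$Date, data$User, data$Sequence), ]

# Both approaches select the rows in the same order
identical(rownames(three_step), rownames(one_step))
```

The single call sorts the data once instead of three times, although the grouped ranking is still likely to dominate the run time on 5 million rows.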

In my experience, ave becomes very slow on larger data.frames.

Your biggest speedup will probably come from switching to data.table.

# load data.table package
library(data.table)
# convert data.frame into data.table
setDT(data)

# get ranks and sort
data[, Sequence := rank(Time), by=.(User, Date)][order(Sequence, User, Date),]
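This package is optimized for speed on large data frames. Also, as you can see, it lets you chain the steps into a single line, which is very convenient.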
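Note that the one-liner above sorts by Sequence first, whereas the question's code ends up ordered by Date, then User, then Sequence. Below is a minimal sketch on toy data that reproduces the question's ordering, assuming that is the intended result. The toy values are made up, and the use of frank() and setorder() (both data.table functions) is a substitution made here, not part of the original answer.

```r
library(data.table)

# Toy data in the shape of the question (values are made up for illustration)
data <- data.frame(
  User = c(23, 23, 90, 23, 90, 26),
  Date = c("2013-02-04", "2013-01-07", "2013-01-28",
           "2013-02-04", "2013-01-28", "2013-02-04"),
  Location = c(5, 29, 39, 12, 7, 27),
  Time = c("16:03:39", "15:40:25", "16:06:20",
           "09:15:00", "08:00:00", "19:32:09"),
  stringsAsFactors = FALSE
)

# Convert to a data.table by reference (no copy is made)
setDT(data)

# Rank visits by time of day within each User/Date group.
# frank() is data.table's fast counterpart to rank(); HH:MM:SS strings
# sort in chronological order.
data[, Sequence := frank(Time), by = .(User, Date)]

# Reorder the rows in place: Date first, then User, then Sequence
setorder(data, Date, User, Sequence)

print(data)
```

setorder() reorders the table by reference, which avoids creating a sorted copy of a 5-million-row table; whether that matters in practice depends on available memory.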
