Automating a calculation in R

r statistics


I have a large file and need to compute the time differences between different records. An MWE is provided for illustration.

Data frame df:

       st   time     from  to   type   size flg         fid     src       dst  no   ID
        + 0.163944    2    1      a     40  -------      1      2.4      5.4   0    10
        + 0.215400    2    1      a     40  -------      1      2.4      5.4   1    28
        + 0.239528    2    1      t     40  -------      1      2.4      5.4   0    37
        + 0.287784    2    1      t   1040  -------      1      2.4      5.4   1    62
        + 0.287784    2    1      t   1040  -------      1      2.4      5.4   2    63
        ..........    .  .      ...   .. .......      .       .        ..   .    ..
        # here should be some more lines with different value such as
        - 0.487784    3    0      t  1040 -------        4      2.8      7.4   2    23
        # the above line will be filtered out by the conditions-just ignore it
        ..........    .  .      ...   .. .......      .       .        ..   .    .. 
        r 0.188072    0    5      a    40 -------      1         2.4      5.4   0    10
        r 0.239528    0    5      a    40 -------      1         2.4      5.4   1    28
        r 0.263656    0    5      t    40 -------      1         2.4      5.4   0    37
        r 0.317128    0    5      t  1040 -------      1         2.4      5.4   1    62
        r 0.318792    0    5      t  1040 -------      1         2.4      5.4   2    63

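Condition 1: for each record starting with "+", "ID" is unique. "src", "dst" and "from" are added to the condition. Based on this, the "time" field is recorded as the start time in an array (i.e. array[ID] = time).

Condition 2: for each record starting with "r", "ID" is looked up. The required time difference is then: current "time" - array[ID].

I have written R code for this and it works. However, I am using fixed src and dst values. src has the format x.y, where x is always 2 and y varies (i.e. y = 0, 1, 2, 3, 4, ...). Likewise, dst has the format z.f, where both z and f vary (e.g. 4.3, 5.2, 6.100, ...).

R code: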

src<-"2.4"  # this value should be automated like 2.y. Any suggestions !!! 
dst<-"5.4"  # this value should be automated like z.f
ReqTime<-0
timeHolder<-c()

#start
start<-df[df[, "st"] == "+" &  
        df[, "from"] == 2 &  
        # the src and dst should be automated 
        df[, "src"] == src &        
        df[, "dst"] == dst,]

timeHolder[start$ID]<-start$time

 #end
 end<-df[df[, "st"] == "r" &  
          df[, "from"] == 0 &
          df[, "src"] == src &
          df[, "dst"] == dst,]


# time difference = receive time - start time stored under the same ID
if(!is.null(timeHolder[end$ID])){
  ReqTime<- end$time- timeHolder[end$ID]
}

cat("Time from ",src,"--",dst,": ",ReqTime,"\n")

I would appreciate it if I could get output like the following:

Time from  2.4 -- 5.4 :  mean( 0.024128 0.024128 0.024128 0.029344 0.031008) which is =0.0265472

If I understand correctly what you want, you can aggregate your data:

#your data plus some extra
DF <- read.table(text = 'st   time     from  to   type   size flg         fid     src       dst  no   ID
    + 0.163944    2    1      a     40  -------      1      2.4      5.4   0    10
    + 0.215400    2    1      a     40  -------      1      2.4      5.4   1    28
    + 0.239528    2    1      t     40  -------      1      2.4      5.4   0    37
    + 0.287784    2    1      t   1040  -------      1      2.4      5.4   1    62
    + 0.287784    2    1      t   1040  -------      1      2.4      5.4   2    63
    + 0.297784    2    1      t   1040  -------      1      2.5      5.7   2    65
    + 0.307984    2    1      t   1040  -------      1      2.5      5.7   2    67
    + 0.325784    2    1      t   1040  -------      1      2.5      5.7   2    68
    #..........    .  .      ...   .. .......      .       .        ..   .    ..
    # here should be some more lines with different value such as
    #- 0.487784    3    0      t  1040 -------        4      2.8      7.4   2    23
    # the above line will be filtered out by the conditions-just ignore it
    #..........    .  .      ...   .. .......      .       .        ..   .    .. 
    r 0.188072    0    5      a    40 -------      1         2.4      5.4   0    10
    r 0.239528    0    5      a    40 -------      1         2.4      5.4   1    28
    r 0.263656    0    5      t    40 -------      1         2.4      5.4   0    37
    r 0.317128    0    5      t  1040 -------      1         2.4      5.4   1    62
    r 0.318792    0    5      t  1040 -------      1         2.4      5.4   2    63 
    r 0.328792    0    5      t  1040 -------      1         2.5      5.7   2    65
    r 0.338792    0    5      t  1040 -------      1         2.5      5.7   2    67
    r 0.348792    0    5      t  1040 -------      1         2.5      5.7   2    68',
    header = T, stringsAsFactors = F)

aggregate(DF$time, list(src = DF$src, dst = DF$dst, ID = DF$ID), diff)
#  src dst ID        x
#1 2.4 5.4 10 0.024128
#2 2.4 5.4 28 0.024128
#3 2.4 5.4 37 0.024128
#4 2.4 5.4 62 0.029344
#5 2.4 5.4 63 0.031008
#6 2.5 5.7 65 0.031008
#7 2.5 5.7 67 0.030808
#8 2.5 5.7 68 0.023008
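
If the file also contains other record types (such as the "-" lines in the question), one option is to subset before aggregating. The following is a minimal sketch under assumptions: it uses a small stand-in for DF with only the columns it needs, and sel is a name introduced here for illustration. The "from" conditions are taken from the question's code.

```r
# Small stand-in for DF (only the columns used here), including one "-" record
DF <- data.frame(
  st   = c("+", "+", "-", "r", "r"),
  time = c(0.163944, 0.215400, 0.487784, 0.188072, 0.239528),
  from = c(2, 2, 3, 0, 0),
  size = c(40, 40, 1040, 40, 40),
  src  = c(2.4, 2.4, 2.8, 2.4, 2.4),
  dst  = c(5.4, 5.4, 7.4, 5.4, 5.4),
  ID   = c(10, 28, 23, 10, 28),
  stringsAsFactors = FALSE
)

# Keep only the sent ("+", from == 2) and received ("r", from == 0) records
sel <- DF[(DF$st == "+" & DF$from == 2) | (DF$st == "r" & DF$from == 0), ]

# Sort so that, within each ID, the "+" time comes before the "r" time
sel <- sel[order(sel$ID, sel$time), ]

# One time difference per (src, dst, ID)
aggDF <- aggregate(sel$time, list(src = sel$src, dst = sel$dst, ID = sel$ID), diff)
print(aggDF)
```

Further conditions, such as size == 1040, could be added to the subsetting step in the same way, so the aggregate call itself does not need to change.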

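Your approach is close. However, DF$st can be any of ("+", "-", "r") in DF; how can that be included as a condition? Also, if I want to add more conditions, such as DF$size == 1040, I am not sure this would work with aggregate unless I keep re-aggregating. That is why I kept the conditions separate in variables (i.e. start and end). aggregate may not be flexible enough for future requirements and modifications!! I have not used aggregate much before!!

To check the results, I tried to do it your way. I did a first aggregation with more conditions. Then I added more filters to the aggregated data. Then I tried to re-aggregate, either as a list or to compute the mean. I got the error "arguments imply differing number of rows". On the sample data above it works (I think because every "+" has an "r"), but it does not work on the large file. Please see my update to your answer. That is why I was trying to avoid aggregate and use the logic above.

@SimpleEasy: For further conditions you can, for example, use

aggregate(DF$time, list(src = DF$src, dst = DF$dst, ID = DF$ID, size = DF$size), diff)

and then subset the aggregated data frame with, say,

size == 1040

Or you can subset the data first and then aggregate the subset. E.g. subsetDF

@simplenesy: I just noticed the edit you suggested to my answer. For the mean, you can use

aggregate(aggDF$x, list(src = aggDF$src, dst = aggDF$dst), mean)

on the data frame I named aggDF.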
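
The "arguments imply differing number of rows" error would be consistent with groups that do not contain exactly one "+" and one "r" record, since diff then returns results of different lengths. This is only a guess, because the large file is not shown. A sketch of an alternative that pairs the records explicitly with merge, so that unmatched records are dropped; the names sent, recv, pairs and delay are made up for this example, and the data is a small stand-in for DF:

```r
# Stand-in for DF: ID 65 has a "+" record but no matching "r" record
DF <- data.frame(
  st   = c("+", "+", "+", "r", "r"),
  time = c(0.163944, 0.215400, 0.297784, 0.188072, 0.239528),
  from = c(2, 2, 2, 0, 0),
  src  = c(2.4, 2.4, 2.5, 2.4, 2.4),
  dst  = c(5.4, 5.4, 5.7, 5.4, 5.4),
  ID   = c(10, 28, 65, 10, 28),
  stringsAsFactors = FALSE
)

# Start and end records, selected with the conditions from the question
sent <- DF[DF$st == "+" & DF$from == 2, c("src", "dst", "ID", "time")]
recv <- DF[DF$st == "r" & DF$from == 0, c("src", "dst", "ID", "time")]

# Inner join on (src, dst, ID): records without a partner are dropped
pairs <- merge(sent, recv, by = c("src", "dst", "ID"),
               suffixes = c(".sent", ".recv"))

# Time difference for each matched pair
pairs$delay <- pairs$time.recv - pairs$time.sent
print(pairs)
```

Because the conditions live in the two subsetting lines, extra filters can be added there without touching the join, which is close to the start/end logic in the question.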
aggDF <- aggregate(DF$time, list(src = DF$src, dst = DF$dst, ID = DF$ID), diff)

aggregate(aggDF$x, list(src = aggDF$src, dst = aggDF$dst), list)
#  src dst                                                x
#1 2.4 5.4 0.024128, 0.024128, 0.024128, 0.029344, 0.031008
#2 2.5 5.7                     0.031008, 0.030808, 0.023008
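
To get closer to the output requested in the question, the per-ID differences can be averaged per (src, dst) pair and printed one line per pair. A minimal sketch, assuming an aggDF shaped like the one above (the values are copied from its printed output); meanDF is a name chosen here for illustration:

```r
# Stand-in for aggDF, with values copied from the output shown above
aggDF <- data.frame(
  src = c(2.4, 2.4, 2.4, 2.4, 2.4, 2.5, 2.5, 2.5),
  dst = c(5.4, 5.4, 5.4, 5.4, 5.4, 5.7, 5.7, 5.7),
  ID  = c(10, 28, 37, 62, 63, 65, 67, 68),
  x   = c(0.024128, 0.024128, 0.024128, 0.029344, 0.031008,
          0.031008, 0.030808, 0.023008)
)

# Mean time difference per (src, dst) pair
meanDF <- aggregate(aggDF$x, list(src = aggDF$src, dst = aggDF$dst), mean)

# One line per pair, in the style of the cat() call from the question
for (i in seq_len(nrow(meanDF))) {
  cat("Time from ", meanDF$src[i], "--", meanDF$dst[i], ": ", meanDF$x[i], "\n")
}
```

For the 2.4 -- 5.4 pair this gives the mean of 0.0265472 quoted in the question.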