Warning: file_get_contents(/data/phpspider/zhask/data//catemap/4/r/70.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181

Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/sharepoint/4.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
如何使R foreach循环有效_R_For Loop_Foreach_Vectorization - Fatal编程技术网

如何使R foreach循环有效

如何使R foreach循环有效,r,for-loop,foreach,vectorization,R,For Loop,Foreach,Vectorization,我试图在R中计算一个30000x300000矩阵,我的代码运行得很好,但它已经运行了好几天了,我怎样才能使它更高效、更省时 我的代码运行良好,但它已经运行了几天了,附件是我正在使用的代码的一个子集,ID扩展到300000;如何使代码在几分钟内运行得更快,因为它已经运行了几天了 fam <- structure(list(ID = c(1L, 2L, 3L, 4L, 6L, 5L, 7L), dad = c(0L, 0L, 1L, 1L, 1L, 3L, 5L), mum = c(0L,

我试图在R中计算一个30000x300000矩阵,我的代码运行得很好,但它已经运行了好几天了,我怎样才能使它更高效、更省时

我的代码运行良好,但它已经运行了几天了,附件是我正在使用的代码的一个子集,ID扩展到300000;如何使代码在几分钟内运行得更快,因为它已经运行了几天了

fam <- structure(list(ID = c(1L, 2L, 3L, 4L, 6L, 5L, 7L), dad = c(0L, 
0L, 1L, 1L, 1L, 3L, 5L), mum = c(0L, 0L, 0L, 2L, 4L, 4L, 6L), 
    GEN = c(1L, 1L, 2L, 2L, 3L, 3L, 4L)), class = "data.frame", row.names = c(NA, 
-7L))
问题是如何让它在300k x 300k的矩阵中更快地运行,这需要几天或几周的时间才能运行,因为我已经运行了一段时间了,我可以做些什么使它运行得更快


注意:将示例保存为“anything.txt”,然后以“fam的形式读取文件。您遇到的问题是,这是递归的。每个循环取决于前一个循环的结果。因此,您不能真正使用矢量化来解决问题

如果你想用R来做这件事,最好的办法是研究
Rcpp
。我对
Rcpp
不太在行,但我有一些建议

最简单的方法是去掉
foreach
循环,用常规的
for
循环代替它。使用并行线程会有很多开销,当函数是递归函数时,工作人员很难自己做得更好

# Before

foreach(j = 1:(t-1),  .combine='c', .packages=c("Matrix", "foreach")) %do%
{ ... }

# After
for (j in 1:(t-1)) {
...
}
下一步要做的是考虑你是否真的需要一个稀疏矩阵。如果你没有内存问题,你最好使用一个规则矩阵

A<-Matrix(0,nrow=n,ncol=n, sparse=TRUE)
# to
A<-matrix(0,nrow=n,ncol=n)
这一点很重要,因为这允许我们提前跳过。您的原始代码片段有1000行,其中0表示“妈妈”和“爸爸”。通过此初始化,我们可以直接跳过第一行,而“妈妈”或“爸爸”的结果为非零:

  t_start <- min(which.max(fam$dad > 0), which.max(fam$mum > 0))
  t_end <- max(fam[['ID']])

  for (t in t_start:t_end) {
...
}
性能

Unit: microseconds
                expr       min         lq      mean    median        uq     max neval
            original 85759.901 86650.7515 88776.695 88740.050 90529.750 97433.2   100
         non_foreach 47912.601 48528.5010 50699.867 50220.901 51782.651 88355.1   100
 non_sparse_non_each  1423.701  1454.3015  1531.833  1471.451  1496.401  4126.3   100
        final_change   953.102   981.8015  1212.264  1010.500  1026.052 21350.1   100
fam <- structure(list(ID = c(1L, 2L, 3L, 4L, 6L, 5L, 7L), dad = c(0L, 
                                                                  0L, 1L, 1L, 1L, 3L, 5L), mum = c(0L, 0L, 0L, 2L, 4L, 4L, 6L), 
                      GEN = c(1L, 1L, 2L, 2L, 3L, 3L, 4L)), class = "data.frame", row.names = c(NA, 
                                                                                                -7L))
A<-matrix(0,nrow=7,ncol=7)
diag(A) <- 2-0.5^(fam[["GEN"]]-1)

t_start <- min(which.max(fam$dad > 0), which.max(fam$mum > 0))
t_end <- max(fam[['ID']])

for (t in t_start:t_end) {
  A[t,t]<- sum(c(A[t,t], 0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]))

  for(j in 1:(t-1))  {
    A[t,j]<- 0.5 * sum(c(A[j,fam[t,"dad"]],A[j,fam[t,"mum"]]))
    A[j,t]<- A[t,j]
  }
}
A

hom<-function(data) { 
  library(Matrix)
  library(foreach)
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-Matrix(0,nrow=n,ncol=n, sparse=TRUE)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      foreach(j = 1:(t-1),  .combine='c', .packages=c("Matrix", "foreach")) %do%

      { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      foreach(j = 1:(t-1),  .combine='c', .packages=c("Matrix", "foreach")) %do%  
      { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

hom2<-function(data) { 
  library(Matrix)
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-Matrix(0,nrow=n,ncol=n, sparse = T)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      for (j in 1:(t-1)) { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      for (j in 1:(t-1)) { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

hom3<-function(data) { 
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-matrix(0,nrow=n,ncol=n)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      for (j in 1:(t-1)) { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      for (j in 1:(t-1)) { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

library(microbenchmark)
f_changed = function(fam) {
  t_start <- min(which.max(fam$dad > 0), which.max(fam$mum > 0))
  t_end <- max(fam[['ID']])

  A<-matrix(0,nrow=t_end,ncol=t_end)
  diag(A) <- 2-0.5^(fam[["GEN"]]-1)

  for (t in t_start:t_end) {
    A[t,t]<- sum(c(A[t,t], 0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]))

    for(j in 1:(t-1))  {
      A[t,j]<- 0.5 * sum(c(A[j,fam[t,"dad"]],A[j,fam[t,"mum"]]))
      A[j,t]<- A[t,j]
    }
  }
  A
}
microbenchmark(
  original = {
    hom(fam)
  }
  , non_foreach = {
    hom2(fam)
  }
  , non_sparse_non_each = {
    hom3(fam)
  }
  , final_change = {
  f_changed(fam)
  }
,times = 100
)
所有代码

Unit: microseconds
                expr       min         lq      mean    median        uq     max neval
            original 85759.901 86650.7515 88776.695 88740.050 90529.750 97433.2   100
         non_foreach 47912.601 48528.5010 50699.867 50220.901 51782.651 88355.1   100
 non_sparse_non_each  1423.701  1454.3015  1531.833  1471.451  1496.401  4126.3   100
        final_change   953.102   981.8015  1212.264  1010.500  1026.052 21350.1   100
fam <- structure(list(ID = c(1L, 2L, 3L, 4L, 6L, 5L, 7L), dad = c(0L, 
                                                                  0L, 1L, 1L, 1L, 3L, 5L), mum = c(0L, 0L, 0L, 2L, 4L, 4L, 6L), 
                      GEN = c(1L, 1L, 2L, 2L, 3L, 3L, 4L)), class = "data.frame", row.names = c(NA, 
                                                                                                -7L))
A<-matrix(0,nrow=7,ncol=7)
diag(A) <- 2-0.5^(fam[["GEN"]]-1)

t_start <- min(which.max(fam$dad > 0), which.max(fam$mum > 0))
t_end <- max(fam[['ID']])

for (t in t_start:t_end) {
  A[t,t]<- sum(c(A[t,t], 0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]))

  for(j in 1:(t-1))  {
    A[t,j]<- 0.5 * sum(c(A[j,fam[t,"dad"]],A[j,fam[t,"mum"]]))
    A[j,t]<- A[t,j]
  }
}
A

hom<-function(data) { 
  library(Matrix)
  library(foreach)
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-Matrix(0,nrow=n,ncol=n, sparse=TRUE)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      foreach(j = 1:(t-1),  .combine='c', .packages=c("Matrix", "foreach")) %do%

      { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      foreach(j = 1:(t-1),  .combine='c', .packages=c("Matrix", "foreach")) %do%  
      { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

hom2<-function(data) { 
  library(Matrix)
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-Matrix(0,nrow=n,ncol=n, sparse = T)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      for (j in 1:(t-1)) { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      for (j in 1:(t-1)) { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

hom3<-function(data) { 
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-matrix(0,nrow=n,ncol=n)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      for (j in 1:(t-1)) { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      for (j in 1:(t-1)) { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

library(microbenchmark)
f_changed = function(fam) {
  t_start <- min(which.max(fam$dad > 0), which.max(fam$mum > 0))
  t_end <- max(fam[['ID']])

  A<-matrix(0,nrow=t_end,ncol=t_end)
  diag(A) <- 2-0.5^(fam[["GEN"]]-1)

  for (t in t_start:t_end) {
    A[t,t]<- sum(c(A[t,t], 0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]))

    for(j in 1:(t-1))  {
      A[t,j]<- 0.5 * sum(c(A[j,fam[t,"dad"]],A[j,fam[t,"mum"]]))
      A[j,t]<- A[t,j]
    }
  }
  A
}
microbenchmark(
  original = {
    hom(fam)
  }
  , non_foreach = {
    hom2(fam)
  }
  , non_sparse_non_each = {
    hom3(fam)
  }
  , final_change = {
  f_changed(fam)
  }
,times = 100
)

fam您的问题是这是递归的。每个循环都取决于前一个循环的结果。因此,您不能真正使用矢量化来解决问题

如果你想用R来做这件事,最好的办法是研究
Rcpp
。我对
Rcpp
不太在行,但我有一些建议

最简单的方法是去掉
foreach
循环,用常规的
for
循环代替它。使用并行线程会有很多开销,当函数是递归函数时,工作人员很难自己做得更好

# Before

foreach(j = 1:(t-1),  .combine='c', .packages=c("Matrix", "foreach")) %do%
{ ... }

# After
for (j in 1:(t-1)) {
...
}
下一步要做的是考虑你是否真的需要一个稀疏矩阵。如果你没有内存问题,你最好使用一个规则矩阵

A<-Matrix(0,nrow=n,ncol=n, sparse=TRUE)
# to
A<-matrix(0,nrow=n,ncol=n)
这一点很重要,因为这允许我们提前跳过。您的原始代码片段有1000行,其中0表示“妈妈”和“爸爸”。通过此初始化,我们可以直接跳过第一行,而“妈妈”或“爸爸”的结果为非零:

  t_start <- min(which.max(fam$dad > 0), which.max(fam$mum > 0))
  t_end <- max(fam[['ID']])

  for (t in t_start:t_end) {
...
}
性能

Unit: microseconds
                expr       min         lq      mean    median        uq     max neval
            original 85759.901 86650.7515 88776.695 88740.050 90529.750 97433.2   100
         non_foreach 47912.601 48528.5010 50699.867 50220.901 51782.651 88355.1   100
 non_sparse_non_each  1423.701  1454.3015  1531.833  1471.451  1496.401  4126.3   100
        final_change   953.102   981.8015  1212.264  1010.500  1026.052 21350.1   100
fam <- structure(list(ID = c(1L, 2L, 3L, 4L, 6L, 5L, 7L), dad = c(0L, 
                                                                  0L, 1L, 1L, 1L, 3L, 5L), mum = c(0L, 0L, 0L, 2L, 4L, 4L, 6L), 
                      GEN = c(1L, 1L, 2L, 2L, 3L, 3L, 4L)), class = "data.frame", row.names = c(NA, 
                                                                                                -7L))
A<-matrix(0,nrow=7,ncol=7)
diag(A) <- 2-0.5^(fam[["GEN"]]-1)

t_start <- min(which.max(fam$dad > 0), which.max(fam$mum > 0))
t_end <- max(fam[['ID']])

for (t in t_start:t_end) {
  A[t,t]<- sum(c(A[t,t], 0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]))

  for(j in 1:(t-1))  {
    A[t,j]<- 0.5 * sum(c(A[j,fam[t,"dad"]],A[j,fam[t,"mum"]]))
    A[j,t]<- A[t,j]
  }
}
A

hom<-function(data) { 
  library(Matrix)
  library(foreach)
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-Matrix(0,nrow=n,ncol=n, sparse=TRUE)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      foreach(j = 1:(t-1),  .combine='c', .packages=c("Matrix", "foreach")) %do%

      { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      foreach(j = 1:(t-1),  .combine='c', .packages=c("Matrix", "foreach")) %do%  
      { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

hom2<-function(data) { 
  library(Matrix)
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-Matrix(0,nrow=n,ncol=n, sparse = T)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      for (j in 1:(t-1)) { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      for (j in 1:(t-1)) { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

hom3<-function(data) { 
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-matrix(0,nrow=n,ncol=n)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      for (j in 1:(t-1)) { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      for (j in 1:(t-1)) { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

library(microbenchmark)
f_changed = function(fam) {
  t_start <- min(which.max(fam$dad > 0), which.max(fam$mum > 0))
  t_end <- max(fam[['ID']])

  A<-matrix(0,nrow=t_end,ncol=t_end)
  diag(A) <- 2-0.5^(fam[["GEN"]]-1)

  for (t in t_start:t_end) {
    A[t,t]<- sum(c(A[t,t], 0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]))

    for(j in 1:(t-1))  {
      A[t,j]<- 0.5 * sum(c(A[j,fam[t,"dad"]],A[j,fam[t,"mum"]]))
      A[j,t]<- A[t,j]
    }
  }
  A
}
microbenchmark(
  original = {
    hom(fam)
  }
  , non_foreach = {
    hom2(fam)
  }
  , non_sparse_non_each = {
    hom3(fam)
  }
  , final_change = {
  f_changed(fam)
  }
,times = 100
)
所有代码

Unit: microseconds
                expr       min         lq      mean    median        uq     max neval
            original 85759.901 86650.7515 88776.695 88740.050 90529.750 97433.2   100
         non_foreach 47912.601 48528.5010 50699.867 50220.901 51782.651 88355.1   100
 non_sparse_non_each  1423.701  1454.3015  1531.833  1471.451  1496.401  4126.3   100
        final_change   953.102   981.8015  1212.264  1010.500  1026.052 21350.1   100
fam <- structure(list(ID = c(1L, 2L, 3L, 4L, 6L, 5L, 7L), dad = c(0L, 
                                                                  0L, 1L, 1L, 1L, 3L, 5L), mum = c(0L, 0L, 0L, 2L, 4L, 4L, 6L), 
                      GEN = c(1L, 1L, 2L, 2L, 3L, 3L, 4L)), class = "data.frame", row.names = c(NA, 
                                                                                                -7L))
A<-matrix(0,nrow=7,ncol=7)
diag(A) <- 2-0.5^(fam[["GEN"]]-1)

t_start <- min(which.max(fam$dad > 0), which.max(fam$mum > 0))
t_end <- max(fam[['ID']])

for (t in t_start:t_end) {
  A[t,t]<- sum(c(A[t,t], 0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]))

  for(j in 1:(t-1))  {
    A[t,j]<- 0.5 * sum(c(A[j,fam[t,"dad"]],A[j,fam[t,"mum"]]))
    A[j,t]<- A[t,j]
  }
}
A

hom<-function(data) { 
  library(Matrix)
  library(foreach)
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-Matrix(0,nrow=n,ncol=n, sparse=TRUE)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      foreach(j = 1:(t-1),  .combine='c', .packages=c("Matrix", "foreach")) %do%

      { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      foreach(j = 1:(t-1),  .combine='c', .packages=c("Matrix", "foreach")) %do%  
      { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

hom2<-function(data) { 
  library(Matrix)
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-Matrix(0,nrow=n,ncol=n, sparse = T)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      for (j in 1:(t-1)) { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      for (j in 1:(t-1)) { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

hom3<-function(data) { 
  n<-max(as.numeric(fam[,"ID"])) 
  t<-min(as.numeric(fam[,"ID"])) 
  A<-matrix(0,nrow=n,ncol=n)

  while(t <=n) {

    s<-max(fam[t,"dad"],fam[t,"mum"]) 
    d<-min(fam[t,"dad"],fam[t,"mum"])
    if (s>0 & d>0 ) 
    { 
      if (fam[t,"GEN"]==999 & s!=d) 
      { warning("both dad and mum should be the same, different for at least       one individual")
        NULL    
      }

      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)+0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]
      for (j in 1:(t-1)) { 
        A[t,j]<- 0.5*(A[j,fam[t,"dad"]]+A[j,fam[t,"mum"]])
        A[j,t]<- A[t,j] 
      } 
    } 
    if (s>0 & d==0 )
    { 
      if ( fam[t,"GEN"]==999) 
      { warning("both dad and mum should be the same, one parent equal to zero for at least individual")
        NULL }
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1) 
      for (j in 1:(t-1)) { 
        A[t,j]<-0.5*A[j,s]
        A[j,t]<-A[t,j] 
      } 
    } 
    if (s==0 )
    {
      A[t,t]<- 2-0.5^(fam[t,"GEN"]-1)
    }

    # cat(" MatbyGEN: ", t ,"\n") 
    t <- t+1


  } 

  A

}

library(microbenchmark)
f_changed = function(fam) {
  t_start <- min(which.max(fam$dad > 0), which.max(fam$mum > 0))
  t_end <- max(fam[['ID']])

  A<-matrix(0,nrow=t_end,ncol=t_end)
  diag(A) <- 2-0.5^(fam[["GEN"]]-1)

  for (t in t_start:t_end) {
    A[t,t]<- sum(c(A[t,t], 0.5^(fam[t,"GEN"])*A[fam[t,"dad"],fam[t,"mum"]]))

    for(j in 1:(t-1))  {
      A[t,j]<- 0.5 * sum(c(A[j,fam[t,"dad"]],A[j,fam[t,"mum"]]))
      A[j,t]<- A[t,j]
    }
  }
  A
}
microbenchmark(
  original = {
    hom(fam)
  }
  , non_foreach = {
    hom2(fam)
  }
  , non_sparse_non_each = {
    hom3(fam)
  }
  , final_change = {
  f_changed(fam)
  }
,times = 100
)


fam与其寻求帮助调试和优化您的代码,为什么不解释一下您对矩阵的期望结果是什么。请参考上面Adam的评论。如果您想要帮助解释您的算法和期望结果。不太可能有人会花精力试着理解您的代码。很可能不会“你的代码需要从头重写才能达到效率。@维克托,我试过看这个,但是制作一个4000 X 4000的矩阵太多了。另外,我在复制和粘贴代码时出错。你应该以一个10 X 10的矩阵为目标,制作一个数据。如果你疯了,甚至可能是30 X 30。我说用一些东西替换你当前的例子。”只有15条记录。甚至100条。谢谢你提供了一个较小的数字。不过,我在
foreach(j=1:(t-1)中得到了一个错误…
行。
%:%被传递了一个非法的右操作数
。我将其更改为
%do%
,它可以工作。与其请求帮助调试和优化代码,不如解释一下矩阵的预期结果。请参阅上面Adam的注释。如果您需要帮助解释算法和预期结果,请不太可能有人会花费精力去尝试和理解你的代码在做什么。很可能你的代码需要从头重写才能达到效率。@维克托,我试过看这个,但是制作一个4000 X 4000的矩阵太多了。另外,我在复制和粘贴你的代码时出错了。你应该以一个10 X 10的矩阵为目标编一个数据。如果你疯了,甚至可能是30 x 30。我是说用只有15条记录的东西替换你当前的例子。或者甚至是100条。谢谢你提供了一个较小的数字。不过,我在
foreach(j=1:(t-1)中得到了一个错误…
行。
%:%被传递了一个非法的右操作数
。我将其更改为
%do%
,它可以工作。感谢您迄今为止的帮助@Cole,我已经开始在更大的数据上运行它,我会告诉您它需要多少分钟/小时感谢您的帮助,但我在尝试使用更大的数据时遇到了这个错误:
诊断中的错误hat代码看起来不熟悉。我的代码是
diag(A)
diag中出错我无能为力。你是否将
A
矩阵调整为7x7以外的矩阵?fam的nrow与A的维度是什么?感谢你迄今为止的帮助@Cole,我已经开始在更大的数据上运行它,我会告诉你需要多少分钟/小时感谢你的帮助,但我遇到了这个error当我尝试使用更大的数据时:
diag中的错误该代码看起来并不熟悉。我的是
diag中的
diag(a)错误我无能为力。你是否将
a
矩阵调整为7x7矩阵以外的矩阵?fam的nrow与a的维度相比是多少?