R: lapply within an lapply function

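Having just learned that loops are bad, I am now trying to use lapply within an lapply. I have a series of sequentially numbered data frames. In each one, I want to replace the values in the 5th and 8th columns with letters according to their value, so that

`if the value is <2 the value is changed to "l" (for loss)`, 
`if it equals 2 the value should be "d"` 
and if >2 it should be "g".
The code I am using is as follows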

structure(list(Chromosome = structure(c(1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("1", "10", "11", "12", "13", "14", "15", "16", 
"17", "18", "19", "2", "20", "21", "22", "3", "4", "5", "6", 
"7", "8", "9", "X", "Y"), class = "factor"), Start = c(1L, 100000001L, 
10000001L, 1000001L, 100500001L, 101000001L), Ratio.x = c(1.32971, 
0.990806, 0.991636, 1.01224, 1.00196, 1.00834), MedianRatio.x = c(1.32971, 
1.00378, 0.988738, 0.979015, 1.00378, 1.00378), CopyNumber.x = c(3L, 
2L, 2L, 1L, 2L, 1L), Ratio.y = c(-1, 0.718527, 1.09204, -1, 1.07779, 
1.41024), MedianRatio.y = c(-1, 1.07779, 0.814437, 0.814437, 
1.07779, 1.07779), CopyNumber.y = c(2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("Chromosome", 
"Start", "Ratio.x", "MedianRatio.x", "CopyNumber.x", "Ratio.y", 
"MedianRatio.y", "CopyNumber.y"), row.names = c(NA, 6L), class = "data.frame")
lst <- mget(ls(pattern='total\\d+'))

lapply(lst, function(df) {
  lapply(df, function(x){
  #Mark out diploid as "d"
  x[,5][x[,5] == "2"] <- "d"
  x[,8][x[,8] == "2"] <- "d"
  #Deletions are "l"
  x[,5][x[,5] < 2 & x[,5] !="d"] <- "l"
  x[,8][x[,8] < 2 & x[,8] !="d"] <- "l"
  #Gains are "g"
  x[,5][x[,5] > 2 & x[,5] !="l" & x[,5] !="d"] <- "g"
  x[,8][x[,8] > 2 & x[,8] !="l" & x[,8] !="d"] <- "g"
  #Compare the g's l's and d's
}
)})

The inner `lapply` makes no real sense here, and you would not use a loop here either.

Instead, you can replace the columns as follows:

classify_cnv = function (column)
    ifelse(column < 2, 'l', ifelse(column > 2, 'g', 'd'))
Then you can `lapply` this over your `data.frame`s:

classify_all_cnvs = function (df) {
    df$CopyNumber.x = classify_cnv(df$CopyNumber.x)
    df$CopyNumber.y = classify_cnv(df$CopyNumber.y)
    df
}

result = lapply(lst, classify_all_cnvs)
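
To see this run end to end, here is a minimal sketch. The two data frames `total1` and `total2` and their values are made up for illustration; only the column names `CopyNumber.x` and `CopyNumber.y` and the `total\\d+` naming pattern come from the question.

```r
# Hypothetical sample data: the values are invented, the names follow
# the 'total\\d+' pattern used in the question.
total1 = data.frame(CopyNumber.x = c(3L, 2L, 1L), CopyNumber.y = c(2L, 2L, 2L))
total2 = data.frame(CopyNumber.x = c(1L, 4L, 2L), CopyNumber.y = c(0L, 2L, 3L))
lst = list(total1 = total1, total2 = total2)

# Vectorised classification: below 2 is a loss, above 2 a gain, else diploid.
classify_cnv = function (column)
    ifelse(column < 2, 'l', ifelse(column > 2, 'g', 'd'))

# Apply the classification to both copy number columns of one table.
classify_all_cnvs = function (df) {
    df$CopyNumber.x = classify_cnv(df$CopyNumber.x)
    df$CopyNumber.y = classify_cnv(df$CopyNumber.y)
    df
}

# A single lapply over the list of tables is enough.
result = lapply(lst, classify_all_cnvs)
result$total2$CopyNumber.x
## [1] "l" "g" "d"
```

Because `ifelse` already works on whole vectors, no second `lapply` over the columns is needed.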

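However, it might actually be more appropriate to replace the list of `data.frame`s with one big `data.frame`, with an additional column specifying which original table each row came from. How best to do that depends on your exact data structure.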

Here is an alternative that avoids loops (hidden or otherwise):

df;
##   Chromosome     Start  Ratio.x MedianRatio.x CopyNumber.x   Ratio.y MedianRatio.y CopyNumber.y
## 1          1         1 1.329710      1.329710            3 -1.000000     -1.000000            2
## 2          1 100000001 0.990806      1.003780            2  0.718527      1.077790            2
## 3          1  10000001 0.991636      0.988738            2  1.092040      0.814437            2
## 4          1   1000001 1.012240      0.979015            1 -1.000000      0.814437            2
## 5          1 100500001 1.001960      1.003780            2  1.077790      1.077790            2
## 6          1 101000001 1.008340      1.003780            1  1.410240      1.077790            2
df[,c(5,8)] <- c('l','d','g')[sign(as.matrix(df[,c(5,8)])-2)+2];
df;
##   Chromosome     Start  Ratio.x MedianRatio.x CopyNumber.x   Ratio.y MedianRatio.y CopyNumber.y
## 1          1         1 1.329710      1.329710            g -1.000000     -1.000000            d
## 2          1 100000001 0.990806      1.003780            d  0.718527      1.077790            d
## 3          1  10000001 0.991636      0.988738            d  1.092040      0.814437            d
## 4          1   1000001 1.012240      0.979015            l -1.000000      0.814437            d
## 5          1 100500001 1.001960      1.003780            d  1.077790      1.077790            d
## 6          1 101000001 1.008340      1.003780            l  1.410240      1.077790            d
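
Since the one-liner is dense, here is a step-by-step sketch of the same indexing trick on a plain vector. The input vector `x` and the intermediate names `s`, `idx` and `labels` are made up for illustration.

```r
x = c(3L, 2L, 1L, 2L)            # hypothetical copy numbers

s = sign(x - 2)                  # -1 below 2, 0 at exactly 2, 1 above 2
idx = s + 2                      # shift to 1, 2, 3 so it can index a vector
labels = c('l', 'd', 'g')[idx]   # 1 -> 'l', 2 -> 'd', 3 -> 'g'
labels
## [1] "g" "d" "l" "d"
```

In the data frame version, `as.matrix` turns the two columns into one numeric matrix, so the lookup handles both columns in a single step and the result is assigned back column by column.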
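Whoever taught you that loops are bad taught you wrong... They should have told you that referring to columns by name is more robust than referring to them by number, or that comparing an integer with a string (`x[,5] == "2"`) is very odd.

That's too bad, Joshua. I have amended the string issue but still have the same problem.

@KonradRudolph: I don't know what you mean by "bleeding scope" in this context, but I seriously doubt that is what the OP (or anyone else) means by "for loops in R are bad".

@KonradRudolph, it has nothing to do with loops but everything to do with scoping, which was designed to support programming with data both interactively and programmatically. If you are not happy, you can have your money back, guaranteed...

This is clever, and completely impossible to understand or debug. Please don't write code like this.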