How can I repeat code where the names change in every block? (with R)

r loops

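I am working with several outputs obtained from QIIME, text files that I want to process to get boxplots. Every input is formatted the same way, so the operations are always the same; only the source name changes. For each input I want to extract the last 5 rows, average each column/sample, associate the values with the samples' experimental labels (groups) taken from the mapfile, and arrange them in the order I use to produce a boxplot of all 6 datasets.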

In bash I would do something like

for I in GG97 GG100 SILVA97 SILVA100 NCBI RDP; do cp ${I}/alpha/collated_alpha/chao1.txt alpha_tot/${I}_chao1.txt; done

where ${I} changes the name in the code automatically on each pass.

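I am struggling to find a way to do this in R. I thought of creating a vector containing the names and then using a for loop that steps through it with [1], [2], etc., but it does not work: it stops at the read.delim line because it cannot find the file in the working directory.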
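The bash `${I}` substitution has a direct counterpart in R: build the file names from a character vector and index into it. The following is a minimal sketch, assuming the directory layout from the bash example. The `file.exists()` check is an addition for illustration; it reports which path is missing relative to the working directory instead of letting `read.delim()` stop the loop.

```r
# Database names, as in the bash loop from the question
dbs <- c("GG97", "GG100", "SILVA97", "SILVA100", "NCBI", "RDP")

# Build one path per database; file.path() joins the pieces with "/"
# (the directory layout is copied from the bash example)
paths <- file.path(dbs, "alpha", "collated_alpha", "chao1.txt")

for (i in seq_along(dbs)) {
  if (!file.exists(paths[i])) {
    # Report the missing file instead of letting read.delim() stop the loop
    message("not found in ", getwd(), ": ", paths[i])
    next
  }
  collated <- read.delim(paths[i], check.names = FALSE)
  message(dbs[i], ": ", nrow(collated), " rows read")
}
```

If every path is reported as missing, the working directory is probably not the folder that contains the database directories; `getwd()` and `setwd()` show and change it.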

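This is the code I wrote for the operation. After the comment, it is repeated 6 times, once for each of the 6 databases I use (GG97 GG100 SILVA97 SILVA100 NCBI RDP).

Also, I repeat this whole procedure 4 times because I have 4 metrics to use (here I show shannon, but I also have a copy of the code for chao1, observed species and whole tree).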

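library(tidyverse)
library(labelled)
mapfile …

You did not provide a reproducible example, so this answer is not guaranteed to be correct.

One thing to note is that you are using rm(...), which means some variables are only relevant within a particular scope. So wrap that scope in a function. This makes your code reusable and saves you from deleting variables by hand: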
process <- function(file, DB){
  # -> Use the function parameter `file` instead of a hardcoded filename
  collated <- read.delim(file=file, check.names=FALSE);  
  collated <- tail(collated,5); collated <- collated[,-c(1:3)]
  collated_reorder <- collated[,match(mapfile[,1], colnames(collated))]

  labels <- t(mapfile)
  colnames(collated_reorder) <- labels[2,]

  mean <- colMeans(collated_reorder, na.rm = FALSE, dims = 1)
  mean = as.matrix(mean); mean <- t(mean)

  # -> rename this variable to a more general name, e.g. `result`
  result <- as.data.frame(rbind(labels[2,],mean))
  result <- t(result); 

  # -> Use the function parameter `DB` instead of a hardcoded string
  DB_type <- list(DB = DB); DB_type <- rep(DB_type, 41)
  result <- as.data.frame(cbind(DB_type,result))
  colnames(result) <- c("DB","Group","value")

  # -> After the end of this function, the variables defined in this function
  #    vanish automatically, you just need to specify the result
  return(result)
}
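The function above relies on a global `mapfile`, a hard-coded `41`, and `cbind()` with a list, which is likely to yield list columns rather than plain vectors. As a hedged sketch only, a variant could take the mapfile as an argument and build the result with `data.frame()`. The name `process_atomic` and the toy data below are hypothetical; the real QIIME files may differ.

```r
# Hypothetical variant of process(): mapfile is an argument, and every
# column of the result is an atomic vector
process_atomic <- function(file, DB, mapfile) {
  collated <- read.delim(file, check.names = FALSE)
  # Keep the last 5 rows and drop the 3 leading non-sample columns
  collated <- tail(collated, 5)[, -(1:3), drop = FALSE]
  # Put the sample columns in mapfile order
  collated <- collated[, match(mapfile[[1]], colnames(collated)), drop = FALSE]
  data.frame(
    DB    = DB,                           # recycled to one value per sample
    Group = as.character(mapfile[[2]]),
    value = unname(colMeans(collated)),
    stringsAsFactors = FALSE
  )
}

# Toy input: 3 leading columns, then one column per sample (made-up numbers)
toy <- data.frame(file = "x", depth = 10, iteration = 0:5,
                  S1 = 1:6, S2 = c(2, 4, 6, 8, 10, 12), check.names = FALSE)
tmp <- tempfile(fileext = ".txt")
write.table(toy, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# Toy mapfile: sample IDs in the first column, groups in the second
toy_map <- data.frame(SampleID = c("S2", "S1"), Group = c("control", "treated"))

res <- process_atomic(tmp, "GG97", toy_map)
print(res)
```

Because the number of rows follows from the mapfile, the same function works for inputs with a different number of samples.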


To run it for every database you can use mapply, a "specialized" function for looping over several vectors in parallel:
# the first argument is the function from above, the other ones are given as arguments
# to our process(.) function
results <- mapply(process, files, DBs)
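One detail worth checking: by default `mapply()` tries to simplify its result, and for functions that return data frames this tends to give a matrix of lists rather than a list of data frames. The sketch below passes `SIMPLIFY = FALSE` and stacks the pieces with `rbind()`. It uses a stand-in function, `toy_process` (hypothetical), that reads no files, so only the looping pattern is shown.

```r
# Stand-in for process(): returns a small data frame per (file, DB) pair.
# It does not read any file; it only illustrates the looping pattern.
toy_process <- function(file, DB) {
  data.frame(DB = DB, file = file, stringsAsFactors = FALSE)
}

DBs   <- c("GG97", "GG100", "SILVA97")
# File naming follows the answer's example path
files <- paste0("alpha_diversity/", DBs, "_shannon.txt")

# SIMPLIFY = FALSE keeps one data frame per call in a list
results <- mapply(toy_process, files, DBs, SIMPLIFY = FALSE)

# Stack the per-database data frames into one long table
combined <- do.call(rbind, results)
rownames(combined) <- NULL
print(combined)
```

A single long table with a `DB` column is a convenient shape for drawing one boxplot across all databases.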

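It worked! I did not know much about functions, and now I am starting to understand how they work... Thank you very much! Maybe you can also tell me why I get the error "x must be atomic" when I use the results of process? It also happened with my original code, and I see it has not changed with the function implementation. What am I doing wrong along the way?

In R, atomic objects can only hold elements of the same type (e.g. in c(1, 2, 3, "A") the numbers are converted to strings -> atomic, but in list(1, 2, 3, "A") this does not happen -> non-atomic === recursive). I do not know what you are trying to do, but it requires an atomic object and you supplied a recursive one (e.g. a list or a data.frame). Perhaps you are using single brackets x[col] to select a data.frame column, where you need double brackets x[[col]] or x$col.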
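The difference between atomic and recursive objects, and between single and double brackets, can be checked directly with `is.atomic()`. This is a small illustration with made-up values, not the asker's data:

```r
x_atomic <- c(1, 2, 3, "A")     # everything is coerced to character
x_list   <- list(1, 2, 3, "A")  # each element keeps its own type

is.atomic(x_atomic)  # TRUE
is.atomic(x_list)    # FALSE

df <- data.frame(value = c(3, 1, 2))
one_col_df <- df["value"]    # single brackets: still a data.frame (a list)
vec        <- df[["value"]]  # double brackets: the numeric vector itself

is.atomic(one_col_df)  # FALSE
is.atomic(vec)         # TRUE
sort(vec)              # works on the atomic vector
```

Functions that sort or summarise values generally expect the vector form, so `df[["value"]]` or `df$value` is usually what should be passed to them.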
datasets <-  c("GG97_shannon", "GG100_shannon", "SILVA97_shannon", 
               "SILVA100_shannon", "NCBI_shannon", "RDP_shannon")
files    <-  c("alpha_diversity/GG97_shannon.txt", .....)
DBs      <-  c("GG97", ....)
result   <-  list()

for(i in seq_along(datasets)){
   result[[datasets[i]]] <- process(files[i], DBs[i])
}
