R: recursively apply a function to pairs of files in a folder
I want to compare one column from each of two files. The files are in a folder and are listed in the order in which they should be compared, e.g.
File_1a
File_1b
File_2a
File_2b
File_3a
File_3b
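I want to run a function that compares one column from each of the two files and then outputs a number. What exactly I compare doesn't really matter, because I can assure you the code works. For each comparison, I want to plot the number (that part works too).
This is what I have so far, but I'm stuck on how to go through all the files in the folder and how to save the output so that I can plot it. Thanks in advance.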
df <- read.delim(file.choose(),header=TRUE)
df2 <- read.delim(file.choose(),header=TRUE)
View(df)
total <- merge(df,df2,by="Start")
total[,5][total[,5] == "2"] <- "d"
total[,9][total[,9] == "2"] <- "d"
View(total)
total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
View(total)
total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
View(total)
total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree")
View(total)
print(sum(total$agree == "agree")/nrow(total)*100)
print(sum(total$agree == "disagree")/nrow(total)*100)
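I want the comparisons to be made between consecutive pairs of files, as numbered above.

If you want to apply the operation to the same columns in each pair of files, i.e. (File_1a and File_1b, File_2a and File_2b, etc.), you could do the following. (I just copied/pasted your code, since you mentioned that it works well. The steps could have been simplified if you had shown a few rows of the dataset.)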
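As one possible simplification, assuming the two copy-number columns are still numeric (that is, before any "d"/"l"/"g" labels are written into them), the labelling and the percentage can be computed without overwriting the columns. The helper names `classify_cn` and `percent_agree` are made up for this sketch and are not part of the original code:

```r
# Hypothetical helper (an assumption, not from the original post):
# label a numeric copy-number vector as "l" (< 2), "d" (== 2) or "g" (> 2).
classify_cn <- function(x) {
  # sign(x - 2) is -1, 0 or 1, so adding 2 gives an index of 1, 2 or 3
  c("l", "d", "g")[sign(x - 2) + 2]
}

# Hypothetical helper: percentage of rows where two numeric columns
# receive the same label.
percent_agree <- function(a, b) {
  mean(classify_cn(a) == classify_cn(b)) * 100
}

# Small made-up vectors, only to show the calls.
cn_a <- c(1, 2, 3, 2, 5)
cn_b <- c(0, 2, 4, 3, 5)
classify_cn(cn_a)
percent_agree(cn_a, cn_b)
```

Because this compares numbers rather than strings, its labels may differ from those of the pasted code for values such as 10, where a character comparison against "2" behaves differently. Missing values are not handled here and would propagate as `NA`.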
Update 2
Using getwd() to specify the path, together with read.delim:
# `files` has to exist before the call below; here it is taken from the working directory
files <- list.files(pattern='File_\\d+[A-Za-z]')
lstN <- lapply(split(files, gsub("[A-Za-z]\\..*", "", files)),
  function(.files) {
    # read both files of a pair and merge them on 'Start'
    total <- Reduce(function(...) merge(..., by='Start'),
      lapply(.files, function(x) read.delim(paste(getwd(),
        x, sep="/"), header=TRUE, sep='')))
    # recode columns 5 and 9: "d" for 2, "l" for below 2, "g" for above 2
    total[,5][total[,5]=='2'] <- 'd'
    total[,9][total[,9]=='2'] <- 'd'
    total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
    total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
    total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
    total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
    total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree")
    # percentage of agreeing and disagreeing rows for this pair
    print(sum(total$agree == "agree")/nrow(total)*100)
    print(sum(total$agree == "disagree")/nrow(total)*100)
    total
  })
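On the question of pointing at a folder rather than relying on the working directory: a minimal sketch, assuming the pairs live in some folder of your choice, is to let `list.files` return full paths and build the pairing key from `basename`. The folder and the empty demo files below are placeholders created only so that the snippet runs anywhere:

```r
# Placeholder folder (an assumption): a temporary directory with empty demo files.
folder <- file.path(tempdir(), "pairs_demo")
dir.create(folder, showWarnings = FALSE)
demo_names <- c("File_1a.txt", "File_1b.txt", "File_2a.txt", "File_2b.txt")
invisible(file.create(file.path(folder, demo_names)))

# full.names = TRUE returns paths that can be passed straight to read.delim()
paths <- list.files(folder, pattern = "^File_\\d+[A-Za-z]\\.txt$",
                    full.names = TRUE)

# Build the key from the file name only, so letters in the folder path
# cannot interfere with the regular expression.
pairs_by_path <- split(paths, gsub("[A-Za-z]\\..*", "", basename(paths)))
pairs_by_path
```

Each element of `pairs_by_path` then holds the two full paths of one pair, so the `paste(getwd(), x, sep="/")` step is no longer needed when reading them.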
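Maybe I'm being a bit silly... I put my files, named File_1 and File_2 etc., in the working directory. When I use the code above, I get the error: Error in split(files, gsub("[A-Za-z]\\..*", "", files)) : object 'files' not found

It is [1] "File_1.txt" "File_2.txt" "File_2" "File_3" "File_3" "File_4". They are all .txt

@user3632206 Based on the information you provided, I get gsub("[A-Za-z]\\..*", "", files) #[1] "File_1" "File_2" "File_2"

So the files are called File_1a, File_1b etc., but the gsub routine gives me "File_1" "File_2" "File_2"

@user3632206 Yes, that is the expected result. Just run split(files, gsub("[A-Za-z]\\..*", "", files)) and check what you get.

Yes. I tried moving this to chat, if that's OK with you. I checked the str and got List of 1 $ :'data.frame': 309579 obs. of 5 variables: ..$ Chromosome : Factor w/ 24 levels "1","10","11","11",...: 1 ... ..$ Start : int [1:309579] 11001200013000140001500160001700018000190001 ... ..$ Ratio : num [1:309579] -13.531.041.041.11 ... ..$ MedianRatio: num [1:309579] -13.531.1111.1.1 ... ..$ CopyNumber : int [1:309579] 2.7.2.2 ... But when I ran df, I had to restart RStudio, and now I'm back to the object 'files' not found error. Is there a way to specify the path to the folder and then iterate over pairs of files sorted alphabetically, or something similar?

Hi. Yes, they are in getwd(); I tried it in the console and got the same error message.

If only read.delim works, could you try d1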
lstN <- lapply(split(files, gsub("[A-Za-z]\\..*", "", files)),
  function(.files) {
    total <- Reduce(function(...) merge(..., by='Start'),
      lapply(.files, function(x) read.table(x, header=TRUE)))
    total[,5][total[,5]=='2'] <- 'd'
    total[,9][total[,9]=='2'] <- 'd'
    total[,5][total[,5] < 2 & total[,5] !="d"] <- "l"
    total[,9][total[,9] < 2 & total[,9] !="d"] <- "l"
    total[,5][total[,5] > 2 & total[,5] !="l" & total[,5] !="d"] <- "g"
    total[,9][total[,9] > 2 & total[,9] !="l" & total[,9] !="d"] <- "g"
    total$agree <- ifelse((total[,5] == total[,9]),"agree","disagree")
    print(sum(total$agree == "agree")/nrow(total)*100)
    print(sum(total$agree == "disagree")/nrow(total)*100)
    total
  })
lapply(lstN, head,2)
# $File_1
#  Start Chromosome.x Ratio.x MedianRatio.x CopyNumber.x Chromosome.y   Ratio.y
#1     1            1      -1            -1            d            1  0.697902
#2     1            1      -1            -1            d            X -1.000000
#  MedianRatio.y CopyNumber.y    agree
#1        1.2794            g disagree
#2       -1.0000            d    agree
#$File_2
#  Start Chromosome.x rt.x med.x CN.x Chromosome.y rt.y med.y CN.y agree
#1     1           10    1     3    d            5    2    13    d agree
#2     1           10    1     3    d           10    1     3    d agree
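To keep the numbers for plotting instead of only printing them, one option is to compute one percentage per element of the returned list. This is a sketch under the assumption that every element has an `agree` column like the one shown above; `lstN_demo` is a made-up stand-in for `lstN`, with invented values:

```r
# Stand-in for `lstN` (an assumption): a named list of data frames,
# each with an `agree` column. The values are invented for illustration.
lstN_demo <- list(
  File_1 = data.frame(agree = c("agree", "disagree", "agree", "agree")),
  File_2 = data.frame(agree = c("disagree", "disagree", "agree", "agree"))
)

# One percentage per file pair, returned as a named numeric vector.
pct_agree <- vapply(lstN_demo,
                    function(d) mean(d$agree == "agree") * 100,
                    numeric(1))
pct_agree
```

The names of `pct_agree` are the pair keys, so a call such as `barplot(pct_agree, ylab = "% agreement")` would draw one bar per file pair.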
set.seed(29) #creating some sample data in a `list`
lst <- lapply(1:4, function(i) cbind(Start=sample(1:9, 5, replace=FALSE),
         as.data.frame(matrix(sample(-5:10, 5*4, replace=TRUE), ncol=4)) ) )
nm1 <- paste0('File_', paste0(rep(1:2, each=2), c('a', 'b')), '.txt')
#create the files using `write.table`
invisible(lapply(seq_along(lst), function(i)
    write.table(lst[[i]], file=nm1[i], quote=FALSE,row.names=FALSE)))
rm(list=ls())
ls()
#character(0)
files <- list.files(pattern='File_\\d+[A-Za-z]')
files #files created in the working directory
#[1] "File_1a.txt" "File_1b.txt" "File_2a.txt" "File_2b.txt"
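The pairing itself can be checked on its own before any file is read. The sketch below uses the four file names from the listing above; `files_demo`, `key` and `pairs` are names chosen for this illustration:

```r
# File names as shown in the listing above.
files_demo <- c("File_1a.txt", "File_1b.txt", "File_2a.txt", "File_2b.txt")

# Drop the trailing letter and the extension to get one key per pair.
key <- gsub("[A-Za-z]\\..*", "", files_demo)
key

# split() groups the file names by that key: one list element per pair.
pairs <- split(files_demo, key)
pairs

# Stop early if some key does not have exactly two files.
stopifnot(all(lengths(pairs) == 2))
```

A name without the letter suffix, such as "File_1.txt", is left unchanged by this `gsub` call, so it would end up in a group of its own and the final check would fail.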