Formatting multiple files at the same time in R


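I'm very new to R, so I hope this question is still interesting. I created a for loop that generates 11 csv files. Here is the code I used for it, to help clarify the question: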

library(dplyr)  # mutate(), case_when()

for (i in seq(0, 1, by = 0.1)) {
  collar$results2 <- mutate(collar, results2 = case_when(
    (probability > i & results1 == "POSITIVE") |
      (probability < i & results1 == "NEGATIVE") ~ TRUE,
    TRUE ~ FALSE))
  as.character(collar$results2)
  collaraccuracy1 <- paste('collar41361_41365', i, 'csv', sep = '.')
  write.csv(collar, collaraccuracy1)
}
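Now I want to format all of the files at once, since they have the same structure (10 columns, 240 rows and the same column headers) and the same naming pattern.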
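The naming pattern can be checked without any files on disk. This is a small base R sketch: it rebuilds the 11 names with the same paste() call as the loop, then converts a Sys.glob()-style pattern into a regular expression with utils::glob2rx() and tests which names it would match. The variable names (cutoffs, file_names, single_star, triple_star) are only for illustration. Because * matches any run of characters, dots included, "collar41361_41365.*.csv" also covers the i = 0 and i = 1 files, and *** behaves the same as a single *.

```r
# Rebuild the file names written by the loop (same paste() call)
cutoffs <- seq(0, 1, by = 0.1)
file_names <- paste("collar41361_41365", cutoffs, "csv", sep = ".")
print(file_names)

# glob2rx() turns a shell-style wildcard (as used by Sys.glob) into a regex,
# so we can see which names a pattern would match without touching the disk
single_star <- grepl(glob2rx("collar41361_41365.*.csv"), file_names)
triple_star <- grepl(glob2rx("collar41361_41365.***.csv"), file_names)
print(all(single_star))
print(identical(single_star, triple_star))
```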

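Below is the code I tried to apply to these 11 files. I used Sys.glob because another post said it is the best way to do this. I previously wrote these operations for a single file and they worked. Now I want to apply the code to all 11 files at the same time: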

#1) Read multiple files at once. This will only match the files with a decimal value of i in their name, which is fine. If I were reading the files with i=0 or i=1, the pattern would be "collar41361_41365.*.csv". Am I right?

collaraccuracy <- lapply(Sys.glob("collar41361_41365.***.csv"), read.csv)

#2) Select only the columns with headers "observed", "predicted", "probability", "results1", "results2.results2"

library(data.table)  # fread()
collaraccuracy <- fread("collar41361_41365.***.csv", select = c("observed", "predicted", "probability", "results1", "results2.results2"), stringsAsFactors = FALSE)

#3) Rename column "results2.results2" to "results2"

colnames(collaraccuracy) <- c("observed", "predicted", "probability", "results1", "results2")

#4) Create a 6th column "results" by merging columns "results1" and "results2"

collaraccuracy$results <- paste(collaraccuracy$results2,
                                collaraccuracy$results1, sep = "_")


#5) End of the formatting. Write new formatted csv files with the pattern "collar41361_by_41365.i.csv"

collaraccuracy2 <- paste('collar41361_by_41365', i, 'csv', sep = '.')
write.csv(collaraccuracy, collaraccuracy2)

R printed 11 NULLs. Am I on the right track?

Stepping back, it sounds like you want to do the following for each i:

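  • Add a column results2 that checks whether the predicted value matches the observed value at probability cutoff i
  • Add a column results that concatenates results1 and results2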
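For a single cutoff, both columns can be computed with plain vectorised logic. This is a minimal base R sketch, using made-up rows in place of the real collar data and assuming only the probability and results1 columns exist; add_results() is a hypothetical helper name. One difference from case_when(..., TRUE ~ FALSE): when probability is missing, this version gives NA instead of FALSE.

```r
# Made-up rows standing in for the real `collar` data; only the two
# columns used by the rule (probability, results1) are assumed here
collar <- data.frame(
  probability = c(0.9, 0.2, 0.6, 0.4),
  results1 = c("POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one cutoff, add results2 (TRUE when results1
# agrees with what the cutoff implies) and results (results1_results2)
add_results <- function(df, cutoff) {
  df$results2 <- (df$probability > cutoff & df$results1 == "POSITIVE") |
                 (df$probability < cutoff & df$results1 == "NEGATIVE")
  df$results <- paste(df$results1, df$results2, sep = "_")
  df
}

add_results(collar, 0.5)
```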

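The reason you see odd column names like results2.results2 is that the original for loop is redundant: you don't need two assignments to results2 (collar$results2 <- and mutate(results2 = ...)).

You seem to like deleting and reposting your questions. If a post is unclear, just edit it instead of deleting and reposting it. The more posts you delete, the higher the chance of being banned from asking questions. Also, accepting answers is one way to avoid a ban (I think), and it is common courtesy. Here is one of your previous questions:

@NelsonGon Thank you for your comment. I'm a bit confused by your statement, because I have never deleted a previous question; I know others may benefit from the shared information. I have also never reposted a question. I'm sorry if asking a new question about the same script is a problem, but the two questions concern very different issues I ran into while writing the script. I just thought that posting two separate questions (even though both belong to the same script) would be more informative than editing the previous question, where the original information could be lost.

@NelsonGon I agree that accepting answers is a very important part of how this forum works. Thanks for the reminder; I encourage every user to accept and upvote answers and comments.

It seems to me it would be easier to do all of these operations in the first for loop, and only write to CSV once the data are formatted exactly as you need them. Would that approach solve your problem?

That's exactly what I want to do. I'll take a look and get back to you.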
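The results2.results2 column name discussed in the answer can be reproduced in base R. In this sketch transform() stands in for dplyr's mutate() (both return the whole data frame), and the two-row collar is made-up data. Storing that returned data frame inside collar$results2 makes write.csv() flatten it into prefixed columns such as results2.results2; assigning the result of transform() to the data frame itself keeps a single results2 column.

```r
# Minimal stand-in for `collar` (values are made up)
collar <- data.frame(
  probability = c(0.9, 0.2),
  results1 = c("POSITIVE", "NEGATIVE"),
  stringsAsFactors = FALSE
)

# Like the question's loop: transform() returns the WHOLE data frame,
# which is then stored inside a single column of `nested`
nested <- collar
nested$results2 <- transform(collar, results2 = probability > 0.5)
f1 <- tempfile(fileext = ".csv")
write.csv(nested, f1, row.names = FALSE)
names(read.csv(f1))  # the embedded data frame is written as prefixed columns

# One assignment: results2 stays an ordinary column
flat <- transform(collar, results2 = probability > 0.5)
f2 <- tempfile(fileext = ".csv")
write.csv(flat, f2, row.names = FALSE)
names(read.csv(f2))
```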
library(data.table)  # fread()

collarcolumns <- function(collaraccuracy1) {
  # reads a fixed file name; the path supplied by lapply() is overwritten here
  collaraccuracy1 <- fread("collar41361_41365.1.csv",
                           select = c("observed", "predicted", "probability", "results1", "results2.results2"),
                           stringsAsFactors = FALSE)
  colnames(collaraccuracy1) <- c("observed", "predicted", "probability", "results1", "results2")
  collaraccuracy1$results <- paste(collaraccuracy1$results2, collaraccuracy1$results1, sep = "_")
  # `i` is not an argument; it is looked up in the global environment
  collaraccuracy2 <- paste('collar41361_by_41365', i, 'csv', sep = '.')
  write.csv(collaraccuracy1, collaraccuracy2)
}

lapply(Sys.glob("collar41361_41365.*.csv"), collarcolumns)
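The 11 NULLs are what lapply() collects from write.csv(), which returns NULL invisibly; the resulting list is then printed at the top level. Below is a minimal base R sketch of steps 2 to 5 applied to each matched file, assuming the inputs carry the headers listed in step 2. The demo files, their values, and the helper name format_one() are hypothetical. The output name is derived from each input name, so no loop variable i is needed inside the function, and the helper returns the path it wrote, so the call yields a character vector instead of NULLs. row.names = FALSE is a choice of this sketch, so re-reading a file does not add an extra index column.

```r
# Hypothetical demo files with the same headers as the question's CSVs
demo_dir <- tempfile("collar_demo")
dir.create(demo_dir)
demo <- data.frame(
  observed = c("POSITIVE", "NEGATIVE"),
  predicted = c("POSITIVE", "POSITIVE"),
  probability = c(0.8, 0.3),
  results1 = c("POSITIVE", "NEGATIVE"),
  results2.results2 = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)
for (cut in c(0.1, 0.2)) {
  write.csv(demo, file.path(demo_dir, paste("collar41361_41365", cut, "csv", sep = ".")),
            row.names = FALSE)
}

# Steps 2 to 5 for ONE input file; returns the path of the file it wrote
format_one <- function(infile) {
  df <- read.csv(infile, stringsAsFactors = FALSE)
  df <- df[, c("observed", "predicted", "probability", "results1", "results2.results2")]
  names(df)[names(df) == "results2.results2"] <- "results2"
  df$results <- paste(df$results2, df$results1, sep = "_")
  # e.g. collar41361_41365.0.1.csv -> collar41361_by_41365.0.1.csv
  out_name <- sub("collar41361_41365", "collar41361_by_41365", basename(infile), fixed = TRUE)
  outfile <- file.path(dirname(infile), out_name)
  write.csv(df, outfile, row.names = FALSE)
  outfile
}

infiles <- Sys.glob(file.path(demo_dir, "collar41361_41365.*.csv"))
outfiles <- vapply(infiles, format_one, character(1), USE.NAMES = FALSE)
print(basename(outfiles))
```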
library(dplyr)  # mutate(), case_when(), %>%

for (i in seq(0, 1, by = 0.1)) {
  # Add both derived columns in one pipeline, then write once per cutoff
  collar.temp <- collar %>%
    mutate(results2 = case_when((probability > i & results1 == "POSITIVE") |
                                  (probability < i & results1 == "NEGATIVE") ~ TRUE,
                                TRUE ~ FALSE)) %>%
    mutate(results = paste(results1, results2, sep = "_"))
  collaraccuracy1 <- paste('collar41361_41365', i, 'csv', sep = '.')
  write.csv(collar.temp, collaraccuracy1)
}
# Keep all cutoffs in one long table, identified by a `cutoff` column
collar.tidy <- bind_rows(
  lapply(
    seq(0, 1, by = 0.1),
    function(x) {
      collar %>%
        mutate(cutoff = x,
               results2 = case_when((probability > x & results1 == "POSITIVE") |
                                      (probability < x & results1 == "NEGATIVE") ~ TRUE,
                                    TRUE ~ FALSE)) %>%
        mutate(results = paste(results1, results2, sep = "_"))
    }
  )
)
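If the 11 separate files are still needed after building one long table, the table can be split on its cutoff column. This is a base R sketch of both halves, with rbind() in place of dplyr's bind_rows() and made-up rows in place of collar; the temporary output directory and the variable names are assumptions for the demo. split() names each piece after the cutoff value, which reproduces the file names of the first loop.

```r
# Made-up stand-in for the question's `collar` data
collar <- data.frame(
  probability = c(0.9, 0.2, 0.6, 0.4),
  results1 = c("POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"),
  stringsAsFactors = FALSE
)
cutoffs <- seq(0, 1, by = 0.1)

# One long table: every row of `collar` repeated once per cutoff
collar_tidy <- do.call(rbind, lapply(cutoffs, function(x) {
  out <- collar
  out$cutoff <- x
  out$results2 <- (out$probability > x & out$results1 == "POSITIVE") |
                  (out$probability < x & out$results1 == "NEGATIVE")
  out$results <- paste(out$results1, out$results2, sep = "_")
  out
}))

# Split back into one data frame per cutoff and write each to its own file
out_dir <- tempfile("collar_split")
dir.create(out_dir)
pieces <- split(collar_tidy, collar_tidy$cutoff)
for (cut in names(pieces)) {
  write.csv(pieces[[cut]],
            file.path(out_dir, paste("collar41361_41365", cut, "csv", sep = ".")),
            row.names = FALSE)
}
list.files(out_dir)
```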