R: How to loop over multiple data frames with a list of strings, find the columns containing each string, and create multiple new files?

r for-loop

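I have 24 data files (bsls). Each file has a fixed number of rows but a variable number of columns (sites). I have a clean list of 23 sites, but I can't match on them exactly, because each site's column name also contains other information.

I have read the files into R with the following code: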

library(gdata)  # provides read.xls()

# list files from dir and read, skipping rows until 'Q Num'
temp <- list.files()  # e.g. info-stuff-nameofbsl-otherStuff.csv

# read.xls and strip bsl name from file and assign as object name
for (i in temp) {
    assign(unlist(strsplit(i, split = "-", fixed = TRUE))[3],
           read.xls(i, pattern = "Q Num"))
}

# create list of data frames (24 bsls)
bsls <- Filter(is.data.frame, mget(ls()))
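The assign() plus mget(ls()) route can pick up unrelated data frames from the workspace. A named list can be built directly instead. This is a minimal sketch. It assumes every file name follows the `info-stuff-nameofbsl-otherStuff.csv` pattern, with the bsl name in the third dash-separated field. `bsl_name()` and `read_bsls()` are hypothetical helpers, and the reader function is passed in as an argument.

```r
# Hypothetical helper: take the bsl name from the third '-'-separated field
bsl_name <- function(files) {
  vapply(strsplit(basename(files), split = "-", fixed = TRUE),
         function(parts) parts[3], character(1))
}

# Hypothetical helper: read every file with `reader`, naming the list by bsl
read_bsls <- function(files, reader) {
  setNames(lapply(files, reader), bsl_name(files))
}

bsl_name("info-stuff-nameofbsl-otherStuff.csv")  # "nameofbsl"

# Usage with the real files (assumes gdata is installed):
# bsls <- read_bsls(list.files(), function(f) gdata::read.xls(f, pattern = "Q Num"))
```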

# clean list of site names
sites <- c("NewYork", "London", "Sydney", "Paris", "Manchester", "Angers", "Venice", "Bangkok", "Glasgow", "Boston", "Perth", "Canberra", "Lyons", "Washington", "Milan", "Cardiff", "Dublin", "Frankfurt", "Ottawa", "Toronto", "El.Salvador", "Taltal", "Caldera")
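The column names only contain the site name, so a substring match is needed rather than an exact one such as %in%. Below is a minimal sketch on made-up data. `site_cols()` is a hypothetical helper. `fixed = TRUE` makes grepl() treat the site name literally, so the `.` in `El.Salvador` is not a regex wildcard.

```r
# Hypothetical helper: columns of `df` whose names contain `site` literally
site_cols <- function(df, site) {
  df[, grepl(site, colnames(df), fixed = TRUE), drop = FALSE]
}

# Toy bsl with made-up column names
unit <- data.frame(QNum = c("q17a", "q17b"),
                   QuestionText = c("question?", "Another question?"),
                   London_info = c("74%", "72%"),
                   Paris_info = c("69%", "73%"))

site_cols(unit, "London")  # one column: London_info

# Without fixed = TRUE, "El.Salvador" would also match "ElXSalvador_a"
odd <- data.frame(ElXSalvador_a = 1, El.Salvador_b = 2)
site_cols(odd, "El.Salvador")  # only El.Salvador_b
```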

What I need is one .csv file for each of the 23 sites, containing all of that site's columns from the 24 data files (bsls).

My current attempt…
for(site in sites){                             #for each site
    assign(site, data.frame())                  #create empty data frame to add vectors to
    for(bsl in dfs){                            #for each dataset
        if (grepl(site, colnames(bsl))){        #substring match
           next                                 #go back to for loop
        }
    assign(site$bsl, bsl[,grepl("site", colnames(bsl))]) #assign column to dataframe
    } 
}

#this loop prints the files
for (site in sites){
    #create new file with question cols only
    newfile <- data.frame(NewYork[,1:2], stringsAsFactors = F)
    # search for columns in bsls relating to site
    for (bsl in bsls){
        colids <- grepl(site, colnames(bsl))
        cols <- bsl[,colids, drop = F]
        newfile <- cbind(newfile, cols)
        }
    filename <- paste0("Site ", site," .csv")
    write.xlsx(newfile, file = filename, row.names = F)
}

The result should look like this… for example, London.csv:
QNum,   QuestionText, BSLname1_Other_info,  BSLname2_some_other_info, BSL5other_diff_info, 
q17a,   question?,                 74%,              69%,                     81%,                  76%,
q17b,   Another question?,         72%,              73%,                     77%,                  74%,

There will be 23 files, one per site, each containing the columns related to that site from the 24 input bsl files.
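One site's table can be assembled in base R. Take the shared question columns from the first bsl, then append each bsl's matching columns, prefixed with the bsl name so that each column's source is recorded. This is a sketch that assumes every bsl has the same rows in the same order, with QNum and QuestionText as its first two columns. `build_site_table()` and the toy data are hypothetical.

```r
# Hypothetical helper: combine one site's columns from every bsl in a named list
build_site_table <- function(bsls, site) {
  # Question columns (QNum, QuestionText) shared by all bsls
  out <- bsls[[1]][, 1:2, drop = FALSE]
  for (bsl_name in names(bsls)) {
    bsl <- bsls[[bsl_name]]
    cols <- bsl[, grepl(site, colnames(bsl), fixed = TRUE), drop = FALSE]
    if (ncol(cols) > 0) {
      # Prefix with the bsl name, e.g. "unit_London_a"
      colnames(cols) <- paste(bsl_name, colnames(cols), sep = "_")
      out <- cbind(out, cols)
    }
  }
  out
}

q <- data.frame(QNum = c("q17a", "q17b"),
                QuestionText = c("question?", "Another question?"))
bsls <- list(
  unit = cbind(q, London_a = c("74%", "72%"), Paris_a = c("69%", "73%")),
  team = cbind(q, London_b = c("81%", "77%"))
)

london <- build_site_table(bsls, "London")
colnames(london)  # "QNum" "QuestionText" "unit_London_a" "team_London_b"

# One file per site:
# for (site in sites) write.csv(build_site_table(bsls, site), paste0(site, ".csv"), row.names = FALSE)
```

The if guard matters. With zero matching columns, paste() would still return one string such as "team_", and assigning that as the column names of a zero-column data frame fails.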
Edit: it is worth mentioning that the BSLs are not actually called bsl1, bsl2, ... etc., but are unique strings, e.g. unit, section, team, ... etc.
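The following code finally solved my problem. First, I had to break the original problem down further and rename all the columns in the bsls list of data frames before the for loop, so that I could tell which site each bsl column belongs to. The renaming logic can be found here.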
library(dplyr)
library(stringi)
library(tidyr)

# bsls: named list of data frames whose site columns were renamed to "<site>_<rest>"
bind_rows(bsls, .id = "bsl") %>%
  gather(variable, value,
         matches(sites %>% paste(collapse = "|")),
         na.rm = TRUE) %>%
  separate(variable, c("site", "new_variable"),
           sep = "_", extra = "merge") %>%
  unite(final_variable, bsl, new_variable, sep = "_") %>%
  spread(final_variable, value) %>%
  group_by(site) %>%
  # one CSV per site; do() would fail here because write.csv() returns NULL, not a data frame
  group_walk(~ write.csv(.x, paste0("site ", .y$site, ".csv"), row.names = FALSE))
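The renaming step itself is not shown. The pipeline's separate(variable, c("site", "new_variable"), sep = "_", ...) only works if each site column name starts with the site name followed by "_". Below is a hypothetical sketch of such a renaming. It is not the asker's linked code. It assumes each column contains at most one site name, and it keeps the whole original name as the remainder.

```r
# Hypothetical helper: rename columns containing a site name to "<site>_<original name>"
site_first <- function(df, sites) {
  old <- colnames(df)
  new <- old
  for (site in sites) {
    # match against the original names so a column is renamed only once
    hit <- grepl(site, old, fixed = TRUE)
    new[hit] <- paste(site, old[hit], sep = "_")
  }
  colnames(df) <- new
  df
}

df <- data.frame(QNum = "q17a", x_London = 74, Paris.y = 69)
colnames(site_first(df, c("London", "Paris")))
# "QNum" "London_x_London" "Paris_Paris.y"

# Applied to every bsl (lapply keeps the list names):
# bsls <- lapply(bsls, site_first, sites = sites)
```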

Loop solution:
# this loop writes one file per site
for (site in sites) {
    # start the new file with the question columns only (first two columns of the first bsl)
    newfile <- bsls[[1]][, 1:2, drop = FALSE]
    # search for columns in bsls relating to site
    for (bsl in bsls) {
        colids <- grepl(site, colnames(bsl), fixed = TRUE)
        cols <- bsl[, colids, drop = FALSE]
        newfile <- cbind(newfile, cols)
    }
    filename <- paste0("Site ", site, ".csv")
    # write.csv() rather than write.xlsx(), since the file is a .csv
    write.csv(newfile, file = filename, row.names = FALSE)
}

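Thanks @bramtayl for the attempt. A few points: gather requires library(tidyr). Because each bsl dataset has a different number of columns, bind_rows(bsls) can't be used. bind_cols(bsls) does bind them, but I need to know which bsl dataset each column of the new site dataset came from. That is, the columns need to contain the name of the bsl dataset they came from.

Binding rows really should work even if the number of columns differs. I added .id so that you keep the bsls information, and also added the na.rm argument to gather.

Error: object 'variable' not found. Also, each site has a unique name, e.g. ("NewYork", "London", "Sydney", ... etc.), so starts_with("site") won't work. My apologies; I'll update the question to reflect the unique names. It may be worth mentioning that the BSLs are also unique strings.

OK, after the edit this should hopefully handle the different site names. The names of the bsls list will end up in the variable names at the end, so just select them.

That makes sense.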
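The point from the comments, that binding rows works even when the data frames have different columns, can be checked on toy data. bind_rows() fills missing columns with NA, which gather(..., na.rm = TRUE) later drops. Its `.id` argument stores each list element's name (here, the bsl name) in a new column. The data below is made up.

```r
library(dplyr)

# Two made-up bsls with different numbers of site columns
bsls <- list(
  unit = data.frame(QNum = c("q17a", "q17b"), London_a = c(74, 72)),
  team = data.frame(QNum = c("q17a", "q17b"), London_b = c(81, 77), Paris_b = c(69, 73))
)

combined <- bind_rows(bsls, .id = "bsl")
# 4 rows; columns bsl, QNum, London_a, London_b, Paris_b
# the "unit" rows have NA in London_b and Paris_b
```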