
Merge all pairs of objects in a folder based on a (pairwise) string pattern (R or Python)


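I have two types of tables containing time series. One type contains data about the population and is stored in files whose names end with a specific pattern. The other type contains data about resources. In addition, I have files for many different farms (hundreds). So the contents of the folder are: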

Farm01_population
Farm01_resources
Farm02_population
Farm02_resources
Farm03_population
Farm03_resources
Farm04_population
Farm04_resources
........
and so on

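I also have to perform calculations within each file. So far I have started the task by doing the calculations separately for population and resources: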

population_files <- list.files("path",pattern="population.txt$")
resources_files <- list.files("path",pattern="resources.txt$")

for(i in 1:length(population_files)){......}

for(j in 1:length(resources_files)){......}
and so on

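Because the number of farms is very large, I cannot write a specific string for the start of each file name as the pattern. I need to express that tables sharing the same pattern at the start of their names must be merged, whatever that pattern (the farm) is.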


I am using R, but Python solutions are welcome too.
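One way to express "merge whatever shares the same prefix" without listing farm names is to split each file name at its last underscore and group by the part before it. The following is only a minimal sketch using the Python standard library. The function name pair_by_prefix and the sample file names are hypothetical, and it assumes every file is named <prefix>_population.<ext> or <prefix>_resources.<ext>.

```python
import os
from collections import defaultdict


def pair_by_prefix(filenames, kinds=("population", "resources")):
    """Group file names by the prefix before the last underscore.

    Returns {prefix: {kind: filename}} for prefixes that have every kind.
    """
    groups = defaultdict(dict)
    for name in filenames:
        stem, _ext = os.path.splitext(name)       # drop ".txt", ".csv", ...
        prefix, sep, kind = stem.rpartition("_")  # split at the LAST underscore
        if sep and kind in kinds:
            groups[prefix][kind] = name
    # Keep only prefixes for which both kinds of file were found
    return {p: g for p, g in groups.items() if all(k in g for k in kinds)}


# Hypothetical sample names; in practice pass os.listdir(folder)
names = ["Farm01_population.txt", "Farm01_resources.txt",
         "Farm02_population.txt", "notes.txt"]
print(pair_by_prefix(names))
# {'Farm01': {'population': 'Farm01_population.txt', 'resources': 'Farm01_resources.txt'}}
```

Farm02 is left out because its resources file is missing, so each returned pair can be read and merged without further checks.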

Just keep everything in one place:

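library(dplyr)
library(tidyr)   # extract() comes from tidyr, not dplyr
library(rex)

# Capture the farm number and whatever follows the underscore
file_regex = 
  rex(capture(digits),
      "_",
      capture(anything))

# One row per file, with ID and type parsed from the file name.
# Note: `type` keeps any file extension (e.g. "population.txt").
catalog = 
  tibble(file = list.files("path")) %>%
  extract(file, 
          c("ID", "type"), 
          file_regex, 
          remove = FALSE)

population =
  catalog %>%
  filter(type == "population") %>%
  group_by(ID) %>%
  do(.$file %>% first %>% read.csv)

resources =
  catalog %>%
  filter(type == "resources") %>%
  group_by(ID) %>%
  do(.$file %>% first %>% read.csv)

together = full_join(population, resources)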

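Assuming the files are in CSV format, consider the base R and Python 3 (using pandas) solutions below. Both use a regex pattern to find the corresponding population and resources files, then merge them into a final table using the linking farm ID. Note that if you need to iterate past 99 files, be sure to adjust the regex digit count {2} to {3} (for Python, do not change the string format operator {0}).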

R

path = "C:/Path/To/Files"
numberoffiles = 2

for (i in seq_len(numberoffiles)) {
    # Zero-pad the farm number to two digits ("01", "02", ...)
    i = sprintf("%02d", i)

    # NOTE: [%s]{2} is a character class repeated twice, so "12" also matches 11, 21 and 22
    filespop <- list.files(path, pattern=sprintf("^[a-zA-Z]*[%s]{2}_population.csv$", i))
    dfpop <- read.csv(paste0(path, "/", filespop[[1]]))

    filesres <- list.files(path, pattern=sprintf("^[a-zA-Z]*[%s]{2}_resources.csv$", i))
    dfres <- read.csv(paste0(path, "/", filesres[[1]])) 

    # Recover the farm prefix (e.g. "Farm") from the population file name
    farm <- gsub(sprintf("[%s]{2}_population.csv", i), "", filespop[[1]])
    # Outer join on FarmID (assumes both tables have a FarmID column)
    mergedf <- merge(dfpop, dfres, by=c('FarmID'), all=TRUE)
    write.csv(mergedf, paste0(path, "/", farm, 
                              sprintf("%s_FinalTable_r.csv", i)), row.names=FALSE)
}

By tables, do you mean csv or txt files, or R files (.rdata, .rds)?
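
Python

import os
import re
import pandas as pd

# Directory containing this script (the CSV files are assumed to be here too)
cd = os.path.dirname(os.path.abspath(__file__))    
numberoffiles = 2

for i in range(1, numberoffiles+1):
    i = '0'+str(i) if i < 10 else str(i)    # zero-pad to two digits

    # Find both files for this farm first, so the merge never uses
    # a population table left over from a different farm
    dfpop = dfres = farm = None
    for item in os.listdir(cd):
        # NOTE: [{0}]{{2}} is a character class repeated twice, so "12" also matches 11, 21 and 22
        if re.match("^[a-zA-Z]*[{0}]{{2}}_population.csv$".format(i), item):
            dfpop = pd.read_csv(os.path.join(cd, item))

        if re.match("^[a-zA-Z]*[{0}]{{2}}_resources.csv$".format(i), item):
            dfres = pd.read_csv(os.path.join(cd, item))
            farm = item.replace("{0}_resources.csv".format(i), "")

    # Merge only when both files were found (assumes a FarmID column in each)
    if dfpop is not None and dfres is not None:
        mergedf = pd.merge(dfpop, dfres, on=['FarmID'])
        mergedf.to_csv(os.path.join(cd, "{0}{1}_FinalTable_py.csv"
                       .format(farm, i)), index=False)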
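
The digit-count note in this answer is easier to follow once the pattern is rendered for one farm number. The sketch below prints what the bracketed form expands to and compares it with a literal, zero-padded number. The helper farm_pattern is hypothetical and not part of the answer's code; it only illustrates how the padding width could be made a parameter instead of editing {2} by hand.

```python
import re


def farm_pattern(number, width=2, kind="population"):
    """Build an anchored pattern for one farm number, zero-padded to `width`."""
    return r"^[a-zA-Z]*{0}_{1}\.csv$".format(str(number).zfill(width), kind)


# The bracketed form, rendered for farm 12: [12]{2} means
# "two characters, each of which is 1 or 2"
loose = "^[a-zA-Z]*[{0}]{{2}}_population.csv$".format("12")
print(loose)                                             # ^[a-zA-Z]*[12]{2}_population.csv$
print(bool(re.match(loose, "Farm21_population.csv")))    # True

# The literal form only accepts the exact number
strict = farm_pattern(12)
print(bool(re.match(strict, "Farm21_population.csv")))   # False
print(bool(re.match(strict, "Farm12_population.csv")))   # True

# Going past 99 farms then only needs a wider padding
print(bool(re.match(farm_pattern(7, width=3), "Farm007_population.csv")))  # True
```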