Python: loop through folders and find the files to load into a DataFrame

python pandas csv

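I have a directory ./customer_data/* containing 15 folders. Each folder is a unique customer.

Example: ./customer_data/customer_1

Each customer folder contains a CSV file named surveys.csv.

Goal: I want to iterate through all the folders in ./customer_data/*, find the surveys.csv for each unique customer, and create one concatenated DataFrame. I also want to add a column to the DataFrame that holds the customer ID, which is the name of the folder.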

import glob
import os
rootdir = '../customer_data/*'
dataframes = []
for subdir, dirs, files in os.walk(rootdir):
    
    for file in files:
        csvfiles = glob.glob(os.path.join(rootdir, 'surveys.csv'))
        
        # loop through the files and read them in with pandas
         # a list to hold all the individual pandas DataFrames
      
        df = pd.read_csv(csvfiles)
        df['customer_id'] = os.path.dirname
        dataframes.append(df)
            
# concatenate them all together
result = pd.concat(dataframes, ignore_index=True)
result.head()

This code does not give me all 15 files. Please help.
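For reference, here is a minimal sketch of how the loop above might be repaired while keeping os.walk. It assumes three likely causes of the problem: os.walk does not expand the * wildcard in rootdir, pd.read_csv is handed the list returned by glob.glob rather than a single path, and os.path.dirname is assigned as a function instead of being called. The function name collect_surveys and its parameters are hypothetical, chosen only for illustration.

```python
import os

import pandas as pd


def collect_surveys(rootdir, filename="surveys.csv"):
    """Concatenate every `filename` found below `rootdir` (hypothetical helper)."""
    dataframes = []
    # os.walk needs a real directory path; it does not expand wildcards like '*'.
    for subdir, dirs, files in os.walk(rootdir):
        if filename in files:
            # Read one file at a time; read_csv does not accept a list of paths.
            df = pd.read_csv(os.path.join(subdir, filename))
            # The customer ID is the name of the folder that holds the file.
            df["customer_id"] = os.path.basename(subdir)
            dataframes.append(df)
    if not dataframes:
        raise FileNotFoundError(f"no {filename} found under {rootdir}")
    return pd.concat(dataframes, ignore_index=True)
```

Under these assumptions, calling collect_surveys('../customer_data') (without the trailing /*) should return one DataFrame covering every customer folder that contains a surveys.csv.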

You can use the pathlib module for this.

from pathlib import Path
import pandas as pd

dfs = []
for filepath in Path("customer_data").glob("customer_*/surveys.csv"):
    this_df = pd.read_csv(filepath)
    # Set the customer ID as the name of the parent directory.
    this_df.loc[:, "customer_id"] = filepath.parent.name
    dfs.append(this_df)

df = pd.concat(dfs)
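One detail worth checking in the snippet above: pd.concat(dfs) keeps each file's original row index, so the combined frame will contain repeated index values. A small sketch with two made-up frames (the column names and values are illustrative, not taken from the question) shows the difference that ignore_index=True makes:

```python
import pandas as pd

# Two tiny stand-ins for per-customer survey files (made-up data).
a = pd.DataFrame({"score": [1, 2]}).assign(customer_id="customer_1")
b = pd.DataFrame({"score": [3, 4]}).assign(customer_id="customer_2")

kept = pd.concat([a, b])                      # index repeats: 0, 1, 0, 1
reset = pd.concat([a, b], ignore_index=True)  # index is renumbered: 0, 1, 2, 3
```

Whether the repeated index matters depends on what is done with the result afterwards; if rows are later selected by label, the renumbered version is probably the safer choice.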

Let's try pathlib with rglob, which recursively searches your directory structure for all files that match a glob pattern; in this case, surveys.

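import pandas as pd
from pathlib import Path

root_dir = Path('/top_level_dir/')

# Map each customer folder name to the surveys.csv it contains.
# rglob is called on the root_dir instance, not on the Path class.
files = {file.parent.name: file for file in root_dir.rglob('surveys.csv')}

# Read each file and tag its rows with the customer (folder) name.
df = pd.concat([pd.read_csv(file).assign(customer=name) for name, file in files.items()])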
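To illustrate how rglob differs from a one-level glob pattern, here is a sketch that builds a throwaway directory tree. The folder names, including the nested archive folder, are assumptions made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: two customers one level down, one nested deeper.
    for folder in ("customer_1", "customer_2", "archive/customer_3"):
        (root / folder).mkdir(parents=True)
        (root / folder / "surveys.csv").write_text("score\n1\n")

    # glob with '*/surveys.csv' only looks exactly one level below root.
    one_level = sorted(p.parent.name for p in root.glob("*/surveys.csv"))
    # rglob searches every depth below root.
    any_depth = sorted(p.parent.name for p in root.rglob("surveys.csv"))
```

Here one_level holds only customer_1 and customer_2, while any_depth also picks up customer_3. Note that a dictionary keyed by folder name keeps only one file per name, so two folders with the same name at different depths would overwrite each other.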

Note that pathlib requires Python 3.4 or later.