Python: Combining separate daily CSVs in pandas

python pandas dataframe date merge


I have a bunch of CSV files, each named after the date it was collected, i.e.:

2020-03-21.csv
2020-03-22.csv
2020-03-23.csv
etc....

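I want to build a single pandas dataframe containing the data from all the CSVs, with a new date column recording which date each row came from. For example:

Current single CSV (e.g. 2020-03-19.csv):

Desired result (combined dataframe):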

What is the best way to do this in pandas? I tried a couple of approaches using pd.merge and pd.concat, but had no luck.

This is only a mock-up, but it should work. It relies on the pathlib module to simplify file handling:

from pathlib import Path

import pandas as pd

# folder_name is the path to the directory holding the daily CSV files
folder = Path(folder_name)

# No filtering here, on the assumption that the folder contains only CSV files
combo = (pd.read_csv(f)
         # f.stem is the file name without its suffix, e.g. "2020-03-21".
         # If pd.to_datetime cannot parse it, assign f.stem as-is instead
         # and convert the column to datetime afterwards.
         .assign(Date=pd.to_datetime(f.stem))
         for f in folder.iterdir())

# Combine everything into one dataframe
everything = pd.concat(combo, ignore_index=True)
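The same idea can be wrapped in a function that is a little more defensive. This is only a sketch under a few assumptions: the function name combine_daily_csvs is made up for illustration, the files are assumed to be named YYYY-MM-DD.csv, and Path.glob("*.csv") is swapped in for iterdir() so that stray non-CSV files are skipped.

```python
from pathlib import Path

import pandas as pd


def combine_daily_csvs(folder):
    """Read every YYYY-MM-DD.csv in `folder` and stack them with a Date column.

    Hypothetical helper, not part of pandas.
    """
    frames = []
    # glob("*.csv") ignores other files; sorted() makes the row order deterministic
    for path in sorted(Path(folder).glob("*.csv")):
        frame = pd.read_csv(path)
        # path.stem is the file name without ".csv", e.g. "2020-03-21";
        # an explicit format raises early if a file name is not a date
        frame["Date"] = pd.to_datetime(path.stem, format="%Y-%m-%d")
        frames.append(frame)
    if not frames:
        raise ValueError(f"no CSV files found in {folder}")
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

Calling combine_daily_csvs(folder_name) should then return one dataframe with the original columns plus a datetime Date column.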

First, list the paths of all the CSV files in the folder:

import glob

import pandas as pd

# Collect the paths of all CSV files in the folder
csvfiles = glob.glob("/path/to/folder/*.csv")
print(csvfiles)

Then loop over those files and concatenate them:

list_df = []
for csvfile in csvfiles:
    # Read the CSV file into a dataframe
    df = pd.read_csv(csvfile)
    # Extract the file name without directory or extension, e.g. "2020-03-19"
    # (splitting on '/' assumes POSIX-style paths)
    csv_name = csvfile.split('/')[-1].split('.')[0]
    # Add a Date column in which every value is that file name
    df['Date'] = csv_name
    # Collect the dataframe
    list_df.append(df)
# Concatenate all the dataframes in the list
final_df = pd.concat(list_df)
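A possible variant of this loop, offered as a sketch rather than the answer's own code: it uses os.path instead of splitting on '/', so the file name is extracted with the running platform's path rules, and it converts the Date column to datetime once at the end. The helper names date_from_path and concat_with_date are hypothetical, and the YYYY-MM-DD file naming is assumed from the question.

```python
import os

import pandas as pd


def date_from_path(csv_path):
    """Return the file name without directory or extension, e.g. '2020-03-19'."""
    return os.path.splitext(os.path.basename(csv_path))[0]


def concat_with_date(csv_paths):
    """Concatenate the given CSV files, tagging each row with its file's date."""
    list_df = []
    for csv_path in csv_paths:
        df = pd.read_csv(csv_path)
        # Every row of this file gets the same date string for now
        df["Date"] = date_from_path(csv_path)
        list_df.append(df)
    final_df = pd.concat(list_df, ignore_index=True)
    # Convert once at the end; raises if any file name is not a valid date
    final_df["Date"] = pd.to_datetime(final_df["Date"], format="%Y-%m-%d")
    return final_df
```

With the csvfiles list from the glob step, concat_with_date(sorted(csvfiles)) would give the combined dataframe in date order.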

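Does this work for your case?

Yes! Thanks.

This gave me TypeError: 'PosixPath' object is not iterable. I had Path() pointing at a valid folder name for my CSV files. .assign and .stem look really useful!

Just made the change... I forgot to add the method to folder.

Works like a charm! Thank you.

I'd like to suggest an edit: one of your .split calls is written .split[] instead of .split(). Also, in step 1, it seems csvfiles = glob.glob("/path/to/folder/*.txt") works the same as the for loop (at least in Python 3.7).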
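The TypeError reported in the comments is what Python raises when code loops over a Path object directly; presumably an earlier version of the pathlib snippet did so before iterdir() was added. A minimal sketch of the difference, with hypothetical function names:

```python
from pathlib import Path


def list_names(folder):
    """Return the sorted names of the entries directly inside `folder`."""
    # A Path is not iterable by itself; iterdir() yields its entries
    return sorted(entry.name for entry in Path(folder).iterdir())


def iterating_path_fails(folder):
    """Return True if looping over a bare Path raises TypeError."""
    try:
        for _ in Path(folder):
            pass
    except TypeError:
        return True
    return False
```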
