Python: creating a new data column before unioning Excel files

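I am reposting this question about unioning Excel files. I know how to read data from Excel. I just want to know whether there is a way to union the files automatically. I did some Google searching but could not find a case like mine, which is why I am asking. I don't know why my previous question was closed.

I would really appreciate your help.

I need to merge several Excel files vertically. The files are in the same folder and have the same columns. However, they have no "Date" column; the date appears only in the Excel file name.

For example: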

Excel 1, named "项目", has the columns:

a  b  c
1  2  3
4  5  6

Excel 2, named "项目_03222021", has the columns:

a  b  c
2  2  3
3  5  6

. . .

Excel 10, named "项目", has the columns:

a  b  c
3  3  3
6  5  6

I need them to look like this:

Date      a   b   c
03152021  1   2   3
03152021  4   5   6
03222021  2   2   3
03222021  3   5   6
.
.
.
05172021  3   3   3
05172021  6   5   6
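
The table above shows the date as MMDDYYYY text. Text in that form does not sort chronologically once the files span more than one year, so one option is to parse it into real dates for sorting and format it back to text only for display. The following is a minimal sketch under that assumption; the sample rows are hypothetical stand-ins for the stacked Excel data.

```python
import pandas as pd

# Hypothetical rows standing in for the stacked Excel data
df = pd.DataFrame({
    "Date": ["03222021", "03152021", "05172021"],
    "a": [2, 1, 3],
})

# Parse the text into real dates so that sorting is chronological
df["Date"] = pd.to_datetime(df["Date"], format="%m%d%Y")
df = df.sort_values("Date").reset_index(drop=True)

# Format back to MMDDYYYY text, for display only
df["Date"] = df["Date"].dt.strftime("%m%d%Y")
print(df)
```

If the dates are needed for later filtering or arithmetic, it is probably better to skip the last step and keep the datetime column.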

Thanks, everyone.

Here is the code I tried:

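import os
import pandas as pd

# Collect the full path of every .xlsx file under the folder
all_files = []
for root, dirs, files in os.walk(r'c:user\\' ):
    for x in files:
        if x.endswith('.xlsx'):
            all_files.append(os.path.join(root, x))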
df1 = pd.read_excel([x for x in all_files if '0315' in x][0])
df1.loc[:,'Date'] = '03152021'
df1['Date'] = pd.to_datetime(df1['Date'], format='%m%d%Y')

df2 = pd.read_excel([x for x in all_files if '0322' in x][0])
df2.loc[:,'Date'] = '03222021'
df2['Date'] = pd.to_datetime(df2['Date'], format='%m%d%Y')
.
.
.
df10 = pd.read_excel([x for x in all_files if '0517' in x][0])
df10.loc[:,'Date'] = '05172021'
df10['Date'] = pd.to_datetime(df10['Date'], format='%m%d%Y')

union = pd.concat([df1, df2, ..., df10], ignore_index=True)

I am just reading each Excel file manually and adding the date column by hand. I am trying to find a way to do all of this automatically.
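
One piece of automating this is reading the date from each file name instead of typing it in. The sketch below assumes every name ends in an underscore followed by eight digits in MMDDYYYY order; the function name `date_from_filename` and the sample file name are hypothetical, chosen only for illustration.

```python
import re
from datetime import date, datetime
from pathlib import Path

# Assumed naming pattern: "<anything>_<MMDDYYYY>.xlsx"
_DATE_SUFFIX = re.compile(r"_(\d{8})$")


def date_from_filename(filename: str) -> date:
    """Return the date encoded at the end of a name like 'item_03152021.xlsx'."""
    stem = Path(filename).stem  # drop the extension
    match = _DATE_SUFFIX.search(stem)
    if match is None:
        raise ValueError(f"no _MMDDYYYY suffix in {filename!r}")
    # strptime also rejects impossible dates such as month 13
    return datetime.strptime(match.group(1), "%m%d%Y").date()


print(date_from_filename("item_03152021.xlsx"))  # 2021-03-15
```

Raising on a name without a date, rather than skipping it silently, makes a stray file in the folder visible early.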

Thanks

I am not an expert at handling Excel files with the pandas library, but I think I managed to automate the process:

import os
import pandas as pd

dfList = []  #dataframe list
for root, dirs, files in os.walk(r'c:user\\' ):
    excelFiles = (file for file in files if file.endswith('.xlsx')) #generator expression with the files that end with .xlsx

    for f in excelFiles:
        print(f)
        dateName = f.split('.')[0].split('_')[-1] #assuming that the pattern "+XXXX_[DateString].xlsx" will not be changed

        df = pd.read_excel(os.path.join(root,f))
        df.loc[:,'Date'] = dateName
        df = df[['Date','a','b','c']] #change the order of columns
        df['Date'] = pd.to_datetime(df['Date'], format='%m%d%Y')

        dfList.append(df)

union = pd.concat(dfList,ignore_index=True)
print(union)

I ran some tests with files named as follows:

file name: ea_05122021.xlsx
file name: eb_03152021.xlsx
file name: ec_03222021.xlsx
file name: xx_05172021.xlsx

        Date   a   b    c
0 2021-05-12   1   4    5
1 2021-05-12   2   3    6
2 2021-03-15   1   4    5
3 2021-03-15   2   3    6
4 2021-03-22   1   4   54
5 2021-03-22  43  12   55
6 2021-05-17  33  56  677
7 2021-05-17  65  76  998
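
The rows above come out in file-listing order rather than date order. A possible variant of the same idea, sketched with `pathlib` and a final sort by date, is shown below. The names `union_excel_files`, `folder`, `pattern`, and `reader` are hypothetical, and the sketch assumes file names of the form `<prefix>_<MMDDYYYY>` and that an Excel engine such as openpyxl is installed for `pd.read_excel`.

```python
import re
from pathlib import Path

import pandas as pd

# Assumed naming pattern: "<prefix>_<MMDDYYYY>" before the extension
DATE_IN_NAME = re.compile(r"_(\d{8})$")


def union_excel_files(folder, pattern="*.xlsx", reader=pd.read_excel):
    """Stack every matching file in `folder`, adding a Date column from its name."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        match = DATE_IN_NAME.search(path.stem)
        if match is None:
            continue  # skip files whose name carries no date
        df = reader(path)
        # Insert Date as the first column; the scalar is repeated on every row
        df.insert(0, "Date", pd.to_datetime(match.group(1), format="%m%d%Y"))
        frames.append(df)
    if not frames:
        raise ValueError(f"no dated files matching {pattern!r} in {folder!r}")
    union = pd.concat(frames, ignore_index=True)
    # A stable sort keeps the original row order within each file
    return union.sort_values("Date", kind="mergesort").reset_index(drop=True)
```

It would be called as `union = union_excel_files(folder)` with the folder that holds the workbooks. The `reader` parameter is only there so the same function can be tried on CSV files with `pd.read_csv`.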

I hope this answer helps you too.