Python: Appending Excel spreadsheets with Pandas

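I have the following dataset in a folder:

a) 10 Excel spreadsheets (workbooks), each with a different name

b) Each spreadsheet has 7 tabs. Of the 7 tabs in each spreadsheet, 2 have exactly the same names in every file, while the remaining 5 have different sheet names

c) I need to concatenate the 5 differently named sheets from each of the 10 spreadsheets

d) That is, all 10*5 sheets need to be concatenated

How can I concatenate all 50 sheets so that the final output is a single "master" spreadsheet with all 50 sheets appended one below another (without the two identically named sheets from each Excel file)?

I'm using the following code in a Jupyter notebook to concatenate the sheets, but it doesn't do what I need: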

import pandas as pd

xlsx = pd.ExcelFile('A://Data/File.xlsx')
data_sheets = []
for sheet in xlsx.sheet_names:
    data_sheets.append(xlsx.parse(sheet))
data = pd.concat(data_sheets)
print(data)

Thanks for reading.

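IIUC, you need to read all the sheets from the 10 workbooks and append each DataFrame to the list data_sheets. One way to do this is to allocate a list names_to_find and append each sheet name to it while iterating: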
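# excelfile_list: the paths of all workbooks to process
names_to_find = []
data_sheets = []
for excelfile in excelfile_list:
    xlsx = pd.ExcelFile(excelfile)

    for sheet in xlsx.sheet_names:
        data_sheets.append(xlsx.parse(sheet))  # the sheet's contents
        names_to_find.append(sheet)            # its name, at the same position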
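To see how the two lists stay aligned without needing real files, here is a minimal sketch that runs the same loop over in-memory data. It assumes each workbook has already been loaded as a dict of {sheet name: DataFrame}, which is the shape pd.read_excel(path, sheet_name=None) returns. collect_sheets, book1.xlsx and book2.xlsx are hypothetical names; the sheet names follow the example in the comments at the end (written as strings, since Excel sheet names are always text).

```python
import pandas as pd


def collect_sheets(workbooks):
    """Flatten {workbook: {sheet_name: DataFrame}} into two aligned lists (hypothetical helper)."""
    names_to_find = []
    data_sheets = []
    for sheets in workbooks.values():
        for sheet_name, df in sheets.items():
            data_sheets.append(df)           # the sheet's contents
            names_to_find.append(sheet_name)  # its name, at the same position
    return names_to_find, data_sheets


# Hypothetical in-memory stand-ins for two workbooks: 'A' and 'B' are shared
wb1 = {name: pd.DataFrame({"x": [1]}) for name in ["A", "B", "1", "2", "3", "4", "5"]}
wb2 = {name: pd.DataFrame({"x": [2]}) for name in ["A", "B", "9", "10", "11", "12", "13"]}

names_to_find, data_sheets = collect_sheets({"book1.xlsx": wb1, "book2.xlsx": wb2})
print(names_to_find)
```

With real files, the input could be built as {f: pd.read_excel(f, sheet_name=None) for f in excelfile_list}, which reads every sheet of each workbook in one call instead of the inner xlsx.parse loop.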

Once all the data has been read, you can use names_to_find and np.unique to find the unique sheet names and their frequencies:
#find unique elements and return counts
unique, counts = np.unique(names_to_find,return_counts=True)

#find unique sheet names with a frequency of one
unique_set = unique[counts==1]
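Applied to the sheet names from the example in the comments (two workbooks that share 'A' and 'B'), the counting step behaves as sketched below; the list is hard-coded purely for illustration. Note that np.unique returns the names sorted, so unique_set is not in workbook order. That is harmless, because the next step looks the names up in names_to_find to get positions.

```python
import numpy as np

# Sheet names as the loop would collect them (hard-coded for illustration)
names_to_find = ["A", "B", "1", "2", "3", "4", "5",
                 "A", "B", "9", "10", "11", "12", "13"]

# Sorted distinct names plus how often each one occurs
unique, counts = np.unique(names_to_find, return_counts=True)
name_counts = dict(zip(unique.tolist(), counts.tolist()))

# Keep only the names that occur exactly once across all workbooks
unique_set = unique[counts == 1]

print(name_counts["A"], name_counts["9"])  # 2 1
print(unique_set.tolist())  # ['1', '10', '11', '12', '13', '2', '3', '4', '5', '9']
```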

You can then use np.argwhere (together with np.isin) to find the indices in names_to_find where the names in unique_set occur:
#find the indices where the unique sheet names exist 
idx_to_select = np.argwhere(np.isin(names_to_find, unique_set)).flatten()
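Using the example sheet names from the comments (hard-coded), this sketch shows what the index lookup returns. np.isin builds a boolean mask over names_to_find, and np.argwhere(...).flatten() turns that mask into positions; np.flatnonzero(mask) is an equivalent one-step spelling. Because the positions come from names_to_find, the selected sheets keep their original workbook order.

```python
import numpy as np

names_to_find = ["A", "B", "1", "2", "3", "4", "5",
                 "A", "B", "9", "10", "11", "12", "13"]
unique_set = np.array(["1", "10", "11", "12", "13", "2", "3", "4", "5", "9"])

# True wherever the sheet name is one of the once-only names
mask = np.isin(names_to_find, unique_set)

# argwhere returns an (n, 1) array of positions; flatten makes it 1-D
idx_to_select = np.argwhere(mask).flatten()

# Equivalent shortcut for the same positions
same_idx = np.flatnonzero(mask)

print(idx_to_select.tolist())  # [2, 3, 4, 5, 6, 9, 10, 11, 12, 13]
```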

Finally, with a list comprehension, you can subset data_sheets so that it contains only the data of interest:
#use list comprehension to subset data_sheets 
data_sheets = [data_sheets[i] for i in idx_to_select]
data = pd.concat(data_sheets)
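The selection and concatenation step can be checked on a few tiny stand-in DataFrames (hypothetical data, one row per sheet, tagged with the sheet's name). The sketch passes ignore_index=True, which is a deviation from the answer's code: without it the result keeps each sheet's own row labels, so a master sheet built from 50 sheets would repeat 0, 1, 2, ... once per sheet.

```python
import pandas as pd

# Hypothetical stand-ins: one single-row DataFrame per sheet
names_to_find = ["A", "B", "1", "2", "A", "B", "9", "10"]
data_sheets = [pd.DataFrame({"sheet": [name], "value": [i]})
               for i, name in enumerate(names_to_find)]

# Positions of the once-only sheet names, as the np.argwhere step would produce them
idx_to_select = [2, 3, 6, 7]

# Keep only the sheets of interest, in their original order
data_sheets = [data_sheets[i] for i in idx_to_select]

# Stack them; ignore_index=True renumbers the rows 0..n-1
data = pd.concat(data_sheets, ignore_index=True)
print(data)
```

To produce the actual "master" spreadsheet the question asks for, the result can then be written with data.to_excel('master.xlsx', index=False), where master.xlsx is a placeholder name; writing .xlsx files requires an Excel engine such as openpyxl to be installed.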

Putting it all together:
import glob

import pandas as pd
import numpy as np

# list every workbook to process, e.g. all .xlsx files in one folder
excelfile_list = glob.glob('some_directory/*.xlsx')

names_to_find = []
data_sheets = []
for excelfile in excelfile_list:
    xlsx = pd.ExcelFile(excelfile)

    for sheet in xlsx.sheet_names:
        data_sheets.append(xlsx.parse(sheet))
        names_to_find.append(sheet)

# find unique elements and return counts
unique, counts = np.unique(names_to_find, return_counts=True)

# find unique sheet names with frequency of 1
unique_set = unique[counts == 1]

# find the indices where the unique sheet names exist
idx_to_select = np.argwhere(np.isin(names_to_find, unique_set)).flatten()

# use list comprehension to subset data_sheets
data_sheets = [data_sheets[i] for i in idx_to_select]

# concat the data (raises "No objects to concatenate" if nothing was selected)
data = pd.concat(data_sheets)
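For repeated use, the whole routine can be wrapped in a function that fails early with a clearer message. This is a sketch, not the answer's code: build_master is a hypothetical name, and the sorting, the lock-file filter, the empty-result checks and ignore_index=True are additions. The key point is that pd.concat raises ValueError("No objects to concatenate") when it receives an empty list, which is exactly what happens when excelfile_list is empty, for example because the glob pattern matches nothing.

```python
import glob
import os

import numpy as np
import pandas as pd


def build_master(pattern):
    """Concatenate every sheet whose name occurs only once across the matched workbooks.

    `pattern` is a glob pattern such as 'some_directory/*.xlsx'.
    Returns (data, unique_set).
    """
    # glob order is not guaranteed, so sort for a reproducible row order;
    # also skip the "~$..." lock files Excel creates while a workbook is open
    excelfile_list = sorted(
        f for f in glob.glob(pattern) if not os.path.basename(f).startswith("~$")
    )
    if not excelfile_list:
        # Without this check the failure only surfaces later, as
        # pd.concat's "No objects to concatenate" error
        raise ValueError(f"no workbooks match {pattern!r}")

    names_to_find = []
    data_sheets = []
    for excelfile in excelfile_list:
        # the context manager closes the file handle when done
        with pd.ExcelFile(excelfile) as xlsx:
            for sheet in xlsx.sheet_names:
                data_sheets.append(xlsx.parse(sheet))
                names_to_find.append(sheet)

    unique, counts = np.unique(names_to_find, return_counts=True)
    unique_set = unique[counts == 1]
    idx_to_select = np.flatnonzero(np.isin(names_to_find, unique_set))
    if idx_to_select.size == 0:
        raise ValueError("every sheet name occurs more than once; nothing to append")

    data = pd.concat([data_sheets[i] for i in idx_to_select], ignore_index=True)
    return data, unique_set
```

A call would look like data, unique_set = build_master('some_directory/*.xlsx'); for the layout described in the question, len(unique_set) should be 50.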

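Do all the sheets have the same data structure?

@Dubbdan Yes, the five sheets (with different names) have exactly the same data structure. The other two sheets (which have the same names in all the Excel files) have completely different data structures. I don't care about those two identically named sheets; I need the data from the 5.

Are they always in the same order? How do you know which of the sheets (with duplicated names) are the ones you want?

@Dubbdan For example, suppose the first workbook has the sheet names ['A', 'B', 1, 2, 3, 4, 5] and the second has the sheet names ['A', 'B', 9, 10, 11, 12, 13]. The common sheets are 'A' and 'B' (I don't need these), while all the rest need to be appended one below another.

Should I put the names of the data sheets in the empty lists? It gives an error: "No objects to concatenate". Where should I enter the names of the Excel sheets? Where should I add a) the names of the 5 sheets and b) the names of the 10 workbooks? Also, does excelfile_list need to be defined?

excelfile_list should be a list of all the Excel files you want to process. glob.glob is your friend.

Where do I add a) the list of sheets to append and b) the list of sheets not to consider for appending?

This routine decides which sheets to append for you. You only need to define excelfile_list at the top of the code, something like excelfile_list = glob.glob('some_directory/*.xlsx'). Then run the code. If you want to see which sheets were appended, look at unique_set. Its length should be 50.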
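One caveat of the count == 1 rule: if a data sheet in one workbook happens to share its name with a data sheet in another workbook, both are dropped and unique_set ends up shorter than 50. Since the question describes the sheets to skip as the ones named identically in every file, a stricter variant is to skip only names that occur in all workbooks. The sketch below uses that different selection rule (not the answer's), with a hypothetical function name and made-up sheet names, and relies only on collections.Counter.

```python
from collections import Counter


def sheets_to_append(sheet_names_per_workbook):
    """Return (workbook_index, sheet_name) pairs for every sheet to append.

    A sheet is skipped only when its name occurs in *every* workbook, so a
    data-sheet name that repeats in just some workbooks is still kept.
    """
    n_workbooks = len(sheet_names_per_workbook)
    if n_workbooks < 2:
        raise ValueError("need at least two workbooks to tell shared sheets apart")

    # Count in how many workbooks each name appears (at most once per workbook)
    presence = Counter(
        name for names in sheet_names_per_workbook for name in set(names)
    )
    shared = {name for name, n in presence.items() if n == n_workbooks}

    return [
        (i, name)
        for i, names in enumerate(sheet_names_per_workbook)
        for name in names
        if name not in shared
    ]


# Made-up layout: 10 workbooks, each with shared sheets "A" and "B" plus 5 data sheets
books = [["A", "B"] + [f"data{i}_{j}" for j in range(5)] for i in range(10)]
# Give workbook 1 a data sheet with the same name as one in workbook 0
books[1][2] = books[0][2]

pairs = sheets_to_append(books)
print(len(pairs))  # 50
```

With real files, the input would be [pd.ExcelFile(f).sheet_names for f in excelfile_list], and each pair (i, name) can be read with pd.read_excel(excelfile_list[i], sheet_name=name) before concatenating. On this made-up layout the count == 1 rule would keep only 48 sheets.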