How to get the mean of selected columns from multiple sheets in the same Excel file (Python, pandas)

python excel pandas

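I am working with a large Excel file that contains 22 sheets. Every sheet has the same column headers, but the number of rows differs. I want to get the mean of columns AA through AX across all 22 sheets, excluding zeros. These columns have headers, which I use in my code.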

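Rather than reading each sheet separately, I would like to loop over the sheets and get the means as output. With the help of answers from other posts, I have the following: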

import pandas as pd

xls = pd.ExcelFile('myexcelfile.xlsx')
xls.sheet_names
#print(xls.sheet_names)
out_df = pd.DataFrame()

for sheets in xls.sheet_names:
    df = pd.read_excel('myexcelfile.xlsx', sheet_names= None)
    df1= df[df[:]!=0]
    df2=df1.loc[:,'aa':'ax'].mean()
    out_df.append(df2)  ## This will append rows of one dataframe to another(just like your expected output)

print(out_df2)

## out_df will have data from all the sheets

So far the code works, but only for one sheet. How can I make it work for all 22 sheets?

You can use numpy to perform basic math operations on a pandas.DataFrame or pandas.Series. Take a look at my code below.

import pandas as pd, numpy as np

XL_PATH = r'C:\Users\YourName\PythonProject\Book1.xlsx'


xlFile = pd.ExcelFile(XL_PATH)
xlSheetNames = xlFile.sheet_names

dfList = []     # list that collects one DataFrame per sheet

for shName in xlSheetNames:
    df = pd.read_excel(XL_PATH, sheet_name=shName)  # read sheet X as DataFrame
    dfList.append(df)   # put DataFrame into a list


for df in dfList:
    print(df)
    dfAverage = np.average(df)  # mean of every value in the sheet (all columns must be numeric)
    print(dfAverage)
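
To make the difference concrete, here is a small sketch comparing `np.average` on a whole DataFrame with `DataFrame.mean()`. The DataFrame below is a made-up stand-in for one sheet, and the column names `aa` and `ab` are only illustrative. `np.average(df)` collapses every cell into a single number, while `df.mean()` returns one mean per column, which is probably closer to what the question is after.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one sheet read with pd.read_excel
df = pd.DataFrame({"aa": [2.0, 0.0, 4.0], "ab": [1.0, 3.0, 0.0]})

overall = np.average(df)   # a single number computed over every cell
per_column = df.mean()     # a Series holding one mean per column

print(overall)
print(per_column)
```

Note that neither call excludes zeros; both treat a 0 as an ordinary value.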

Hi, thanks for your input. I did this, but it still only works on one sheet. I use df = df.replace(0, np.nan) to exclude zeros when taking the mean.
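
One likely reason for the "only one sheet" result is that the loop in the question never uses its loop variable and discards the return value of `append`. Below is a minimal sketch of a per-sheet version, assuming the sheets have already been loaded into a dictionary (for example with `pd.read_excel(path, sheet_name=None)`, which returns every sheet keyed by name). The function name `per_sheet_means`, the demo data, and the column labels `aa` and `ab` are assumptions for illustration; with the real file the slice would be `'aa':'ax'`.

```python
import numpy as np
import pandas as pd


def per_sheet_means(sheets, first_col, last_col):
    """Return one row of column means per sheet, ignoring zeros."""
    rows = {}
    for name, df in sheets.items():
        selected = df.loc[:, first_col:last_col]         # label slice, both ends included
        rows[name] = selected.replace(0, np.nan).mean()  # mean() skips NaN by default
    # Dict of Series -> sheets as columns; transpose so each sheet is a row
    return pd.DataFrame(rows).T


# In-memory stand-in for pd.read_excel(path, sheet_name=None)
demo_sheets = {
    "Sheet1": pd.DataFrame({"id": [1, 2, 3], "aa": [2.0, 0.0, 4.0], "ab": [1.0, 3.0, 0.0]}),
    "Sheet2": pd.DataFrame({"id": [1, 2], "aa": [0.0, 6.0], "ab": [5.0, 5.0]}),
}

per_sheet = per_sheet_means(demo_sheets, "aa", "ab")
print(per_sheet)
```

The lowercase `np.nan` is used here because the `np.NaN` alias was removed in NumPy 2.0.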

# Try the code below

import os

import numpy as np
import pandas as pd

XL_PATH = "YOUR EXCEL FULL PATH"


def readExcel(xl_path=XL_PATH):
    """Return the sheet names and a dictionary of DataFrames, one per sheet."""
    if not os.path.isfile(xl_path):
        raise FileNotFoundError(xl_path)
    sh_names = pd.ExcelFile(xl_path).sheet_names

    # pandas.read_excel() has a 'sheet_name' argument.
    # When you pass a list to 'sheet_name',
    # pandas returns a dictionary of DataFrames with the sheet names as keys.
    df_dict = pd.read_excel(xl_path, sheet_name=sh_names)
    return sh_names, df_dict


# Now you have a dictionary that contains one DataFrame for each sheet in the Excel file.
# The next step is to append all rows from Sheet1 to SheetX.
# This only works if all DataFrames have the same columns.

def appendAllSheets(df_dict):
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
    return pd.concat(list(df_dict.values()), ignore_index=True)


# You can now call the functions as below:
SH_NAMES, DF_DICT = readExcel()
dfWithAllData = appendAllSheets(DF_DICT)
# Now you have one DataFrame with all rows combined from Sheet1 to SheetX.
# You can clean the data, for example drop all rows where a column contains 0 or NaN
# ('Column_Name' is a placeholder for one of your column headers).
dfZero_Removed = dfWithAllData[dfWithAllData['Column_Name'] != 0]
dfNA_Removed = dfWithAllData[~pd.isna(dfWithAllData['Column_Name'])]

# Last step: find the average or apply another math operation
# and let numpy do the job.
# Note: np.average does not skip NaN, so the first result is NaN
# if 'Column_Name' still contains missing values.
average_of_all_1 = np.average(dfZero_Removed['Column_Name'])
average_of_all_2 = np.average(dfNA_Removed['Column_Name'])
# Show the result:
# the average of 'Column_Name' over all rows from Sheet1 to SheetX
# of your target Excel file.
print(average_of_all_1, average_of_all_2)
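
For the goal stated in the question (a range of columns, zeros excluded, all sheets pooled together), a minimal sketch might look like the following. It assumes the sheets are already in a dictionary such as the one returned by `pd.read_excel(path, sheet_name=None)`. The function name `pooled_nonzero_means`, the demo data, and the column labels `aa` and `ab` are hypothetical; with the real file the slice would be `'aa':'ax'`.

```python
import numpy as np
import pandas as pd


def pooled_nonzero_means(sheets, first_col, last_col):
    """Stack the selected columns of every sheet and average them, ignoring zeros."""
    stacked = pd.concat(
        [df.loc[:, first_col:last_col] for df in sheets.values()],
        ignore_index=True,
    )
    # Zeros become NaN, and mean() skips NaN by default
    return stacked.replace(0, np.nan).mean()


# In-memory stand-in for pd.read_excel(path, sheet_name=None)
demo_sheets = {
    "Sheet1": pd.DataFrame({"id": [1, 2, 3], "aa": [2.0, 0.0, 4.0], "ab": [1.0, 3.0, 0.0]}),
    "Sheet2": pd.DataFrame({"id": [1, 2], "aa": [0.0, 6.0], "ab": [5.0, 5.0]}),
}

pooled = pooled_nonzero_means(demo_sheets, "aa", "ab")
print(pooled)
```

Because the sheets have different row counts, this pooled mean weights every non-zero cell equally, so it generally differs from the average of the individual sheet means.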