Python: how to get the mean of selected columns across multiple sheets in the same Excel file
I am working with a large Excel file that contains 22 sheets. Every sheet has the same column headers, but the number of rows differs from sheet to sheet. I want the mean of columns AA to AX across all 22 sheets, excluding zeros. These columns have headers, which I use in my code. Rather than reading each sheet separately, I want to loop over the sheets and get the means as output. With help from answers to other posts, I have the following:
import pandas as pd
xls = pd.ExcelFile('myexcelfile.xlsx')
xls.sheet_names
#print(xls.sheet_names)
out_df = pd.DataFrame()
for sheets in xls.sheet_names:
    df = pd.read_excel('myexcelfile.xlsx', sheet_names= None)
    df1= df[df[:]!=0]
    df2=df1.loc[:,'aa':'ax'].mean()
    out_df.append(df2) ## This will append rows of one dataframe to another(just like your expected output)
print(out_df2)
## out_df will have data from all the sheets
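So far the code works, but only for one sheet. How can I make it work for all 22 sheets?
You can use numpy to perform basic math operations on a pandas.DataFrame or pandas.Series. Take a look at my code below: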
import pandas as pd, numpy as np
XL_PATH = r'C:\Users\YourName\PythonProject\Book1.xlsx'
xlFile = pd.ExcelFile(XL_PATH)
xlSheetNames = xlFile.sheet_names
dfList = [] # variable to store all DataFrame
for shName in xlSheetNames:
    df = pd.read_excel(XL_PATH, sheet_name=shName) # read sheet X as DataFrame
    dfList.append(df) # put DataFrame into a list
for df in dfList:
    print(df)
    dfAverage = np.average(df) # use numpy to get DataFrame average
    print(dfAverage)
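One thing to note about `np.average(df)`: without an `axis` argument it averages every cell together and returns a single number. If one mean per column is wanted, `DataFrame.mean()` on a label slice is an alternative. A minimal sketch, where the small DataFrame and its column names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one sheet; real data would come from pd.read_excel.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "aa": [2.0, 4.0, 6.0],
    "ab": [10.0, 20.0, 30.0],
})

# Label-based slice: every column from "aa" through "ab", both ends included.
selected = df.loc[:, "aa":"ab"]

overall = np.average(selected)   # a single number over all selected cells
per_column = selected.mean()     # a Series with one mean per column

print(overall)
print(per_column)
```

The label slice only behaves as expected if the wanted columns sit next to each other in the sheet, as they do for AA to AX.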
Hello, thanks for your input. I did this, but it still only works on one sheet. I use df=df.replace(0,np.NaN) to exclude zeros when taking the mean.
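The zero-exclusion trick mentioned in this comment can be checked on a toy example. `DataFrame.mean()` skips NaN by default, so replacing zeros with NaN first leaves them out of the average. The data below is hypothetical, and the lowercase `np.nan` spelling is used because the `np.NaN` alias was removed in NumPy 2.0:

```python
import numpy as np
import pandas as pd

# Hypothetical data with zeros that should not count toward the mean.
df = pd.DataFrame({"aa": [0.0, 2.0, 4.0], "ab": [5.0, 0.0, 0.0]})

# Plain mean: zeros are treated as ordinary values.
with_zeros = df.mean()

# Replace zeros with NaN; mean() ignores NaN because skipna defaults to True.
without_zeros = df.replace(0, np.nan).mean()

print(with_zeros)
print(without_zeros)
```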
# Try the code below
import os
import numpy as np
import pandas as pd
XL_PATH = "YOUR EXCEL FULL PATH"
def readExcel():
    """Return (list of sheet names, dict of DataFrames keyed by sheet name)."""
    if not os.path.isfile(XL_PATH):
        raise FileNotFoundError(XL_PATH)
    shNames = pd.ExcelFile(XL_PATH).sheet_names
    # pandas.read_excel() has a 'sheet_name' argument;
    # when you pass a list to 'sheet_name',
    # pandas returns a dictionary of DataFrames with the sheet names as keys
    dfDict = pd.read_excel(XL_PATH, sheet_name=shNames)
    return shNames, dfDict
# SH_NAMES holds the list of sheet names, DF_DICT the dictionary of DataFrames
SH_NAMES, DF_DICT = readExcel()
# Now DF_DICT contains one DataFrame for each sheet in the Excel file
# Next step is to append all rows of data from Sheet1 to SheetX
# This only works if all the DataFrames have the same columns
def appendAllSheets():
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
    return pd.concat(list(DF_DICT.values()), ignore_index=True)
# you can now call the function as below:
dfWithAllData = appendAllSheets()
# now you have one DataFrame with all rows combined from Sheet1 to SheetX
# you can clean the data, for example drop all rows where a column contains 0 or NaN
dfZero_Removed = dfWithAllData[dfWithAllData['Column_Name'] != 0]
dfNA_Removed = dfWithAllData[~pd.isna(dfWithAllData['Column_Name'])]
# last step, to find the average or another math operation,
# just let numpy do the job (this assumes every column is numeric)
average_of_all_1 = np.average(dfZero_Removed)
average_of_all_2 = np.average(dfNA_Removed)
# show the result:
# the average over all rows of data from Sheet1 to SheetX
# in your target Excel file
print(average_of_all_1, average_of_all_2)
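Putting the pieces together for the original question (columns selected by header, zeros excluded, all sheets at once), here is a sketch rather than a definitive solution. The function names `load_sheets` and `nonzero_means` and the sample data are hypothetical. Passing `sheet_name=None` makes `pd.read_excel` return every sheet as a dict of DataFrames. The sketch returns both per-sheet means and pooled means, because the two differ when the sheets have different row counts:

```python
import numpy as np
import pandas as pd


def load_sheets(path):
    """Read every sheet of a workbook into a dict of {sheet name: DataFrame}."""
    # sheet_name=None asks pandas for all sheets in one call.
    return pd.read_excel(path, sheet_name=None)


def nonzero_means(sheets, first_col, last_col):
    """Return (per-sheet means, pooled means) for columns first_col..last_col."""
    # Keep only the wanted columns and turn zeros into NaN so mean() skips them.
    trimmed = {
        name: df.loc[:, first_col:last_col].replace(0, np.nan)
        for name, df in sheets.items()
    }
    # One row per sheet, one column per selected header.
    per_sheet = pd.DataFrame({name: df.mean() for name, df in trimmed.items()}).T
    # Stack all rows from all sheets, then average each column once.
    pooled = pd.concat(list(trimmed.values()), ignore_index=True).mean()
    return per_sheet, pooled


# Hypothetical in-memory stand-in for load_sheets("myexcelfile.xlsx").
sheets = {
    "Sheet1": pd.DataFrame({"id": [1, 2], "aa": [2.0, 0.0], "ab": [1.0, 3.0]}),
    "Sheet2": pd.DataFrame(
        {"id": [3, 4, 5], "aa": [4.0, 6.0, 8.0], "ab": [0.0, 5.0, 0.0]}
    ),
}

per_sheet, pooled = nonzero_means(sheets, "aa", "ab")
print(per_sheet)
print(pooled)
```

For the real file, something like `nonzero_means(load_sheets('myexcelfile.xlsx'), 'aa', 'ax')` should give 22 per-sheet rows plus the pooled means, assuming the headers are literally `aa` through `ax`.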