Adding a custom header to a pandas DataFrame and converting it to HTML
I am reading CSV files from two directories, one holding the actual results and one holding the expected results. I go through each CSV in the actual results and compare it with the corresponding CSV in the expected results. I then want to display all of the data as HTML. I have written some code that cleans the data and then compares the DataFrames built from the actual and expected CSVs. Here is the full code:
import pandas as pd
import sys
from glob import glob
import os
import itertools

# compareCSV takes in two args as path of the two csv files to compare
def compare(expectedList, actualList):
    ctr = 0
    dfList = list()
    for (csv1, csv2) in itertools.zip_longest(expectedList, actualList):
        df1_ctr = pd.read_csv(csv1, sep=',')
        df1_ctr[df1_ctr.columns[1:]] = [x.split('\t') for x in df1_ctr['mean(ms)']]
        df1 = df1_ctr.apply(pd.to_numeric, errors='coerce')
        df2_ctr = pd.read_csv(csv2, sep=',')
        df2_ctr[df2_ctr.columns[1:]] = [x.split('\t') for x in df2_ctr['mean(ms)']]
        df2 = df2_ctr.apply(pd.to_numeric, errors='coerce')
        print("Dataframe for Expected List for file : {} is \n {}".format(csv1, df1))
        print("Dataframe for Actual List for file: {} is \n {}".format(csv2, df2))
        d3 = df1.loc[:, :]  # Dataframe 1
        d4 = df2.loc[:, :]  # Dataframe 2
        d5 = abs(((d3.subtract(d4)) / d3) * 100)
        print("Deviation between file {} and {} is :\n {}".format(csv1, csv2, d5))
        ctr = ctr + 1
        # Final Data frame
        df = pd.concat([df1, df2, d5])
        #print("{}".format(df))
        dfList.append(df)
    #print("Final Data frame: \n{}".format(dfList))
    # for data in dfList:
    #     print("data at index: \n{}".format(data))

if __name__ == "__main__":
    #file1=sys.argv[1] # FileName1
    #file2=sys.argv[2] #FileName2
    #compareCSV(file1,file2) # Compare CSV files passed in as paramters
    os.chdir("expected_results")
    expectedCSVs = glob("*.csv")
    #print(expectedCSVs)
    os.chdir("../actual_results")
    actualCSVs = glob("*.csv")
    #print(actualCSVs)
    compare(expectedCSVs, actualCSVs)
For now it contains some redundant print statements.
The output of the code above is:
Dataframe for Expected List for file : CT_QRW_25.csv is
   100%Q   mean(ms)   P50(ms)     P99(ms)   p99.9(ms)  #Samples
0    NaN   0.038973  0.044939    0.091076    0.363859   1760108
1    NaN   0.050652  0.044963    0.094738    0.402525   1354233
2    NaN   0.046500  0.045020    0.108138    0.320636    123448
3    NaN   1.872630  0.599966   33.313200  172.040000  21954617
4    NaN  37.752900  0.600484  603.063000  805.340000   2708258
Dataframe for Actual List for file: CT_QRW_25.csv is
   100%Q   mean(ms)   P50(ms)     P99(ms)   p99.9(ms)  #Samples
0    NaN   0.038973  0.044939    0.091076    0.363859   1760108
1    NaN   0.050652  0.044963    0.094738    0.402525   1354233
2    NaN   0.046500  0.045020    0.108138    0.320636    123448
3    NaN   1.872630  0.599966   33.313200  172.040000  21954617
4    NaN  37.752900  0.600484  603.063000  805.340000   2708258
Deviation between file CT_QRW_25.csv and CT_QRW_25.csv is :
   100%Q  mean(ms)  P50(ms)  P99(ms)  p99.9(ms)  #Samples
0    NaN       0.0      0.0      0.0        0.0       0.0
1    NaN       0.0      0.0      0.0        0.0       0.0
2    NaN       0.0      0.0      0.0        0.0       0.0
3    NaN       0.0      0.0      0.0        0.0       0.0
4    NaN       0.0      0.0      0.0        0.0       0.0
Dataframe for Expected List for file : CT_W_14.csv is
   100%Q  mean(ms)  P50(ms)  P99(ms)  p99.9(ms)   #Samples
0    NaN       NaN      NaN      NaN        NaN        NaN
1    NaN       NaN      NaN      NaN        NaN        NaN
2    NaN       NaN      NaN      NaN        NaN        NaN
3    NaN       NaN      NaN      NaN        NaN        NaN
4    NaN   97.8025  17.8492  725.619    891.455  5304765.0
Dataframe for Actual List for file: CT_W_14.csv is
   100%Q  mean(ms)  P50(ms)  P99(ms)  p99.9(ms)   #Samples
0    NaN       NaN      NaN      NaN        NaN        NaN
1    NaN       NaN      NaN      NaN        NaN        NaN
2    NaN       NaN      NaN      NaN        NaN        NaN
3    NaN       NaN      NaN      NaN        NaN        NaN
4    NaN   97.8025  17.8492  725.619    891.455  5304765.0
Deviation between file CT_W_14.csv and CT_W_14.csv is :
   100%Q  mean(ms)  P50(ms)  P99(ms)  p99.9(ms)  #Samples
0    NaN       NaN      NaN      NaN        NaN       NaN
1    NaN       NaN      NaN      NaN        NaN       NaN
2    NaN       NaN      NaN      NaN        NaN       NaN
3    NaN       NaN      NaN      NaN        NaN       NaN
4    NaN       0.0      0.0      0.0        0.0       0.0
Goal:
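Because everything I have right now is a print statement, I cannot make the output dynamic if I want to turn it into HTML. My goal is to write the result out as an HTML file. Alternatively, a way to add a custom row to the DataFrame as a header would also work. If a deviation is greater than 10%, I want that cell shown in red. If anyone has run into a situation like this, please help. Any help would be appreciated.

Pandas has a special Styler object, which can be exported to HTML with its .render method or to Excel with .to_excel. You can format the table with CSS and add a caption like this: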
def highlight_high(series, threshold, colour):
    # one CSS string per cell: coloured when the value reaches the threshold, white otherwise
    return ['background-color:' + colour.lower() if threshold <= i else 'background-color: white' for i in series]

# df.style.apply creates a pandas.io.formats.style.Styler object from a DataFrame
highlighted = df.style.apply(highlight_high, axis=0, subset=pd.IndexSlice[:, 'P50(ms)'], colour='red', threshold=0.5)
# adding a caption
highlighted = highlighted.set_caption('Highlighted P50')
# render() generates the HTML for the Styler object
# (render() was removed in pandas 2.0; on recent versions call highlighted.to_html() instead)
with open('table.html', 'w') as f:
    f.write(highlighted.render())
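For the "custom header row" part of the question, one possible approach is to label each block when concatenating, using the keys argument of pd.concat. This is a sketch under assumptions: the original answer does not cover it, label_blocks is a hypothetical helper, and the numbers are invented.

```python
import pandas as pd


def label_blocks(expected, actual):
    """Stack expected, actual and percentage deviation under labelled index groups."""
    deviation = ((expected - actual) / expected).abs() * 100
    # keys= adds an outer index level, which acts as a heading for each block.
    return pd.concat(
        [expected, actual, deviation],
        keys=["Expected", "Actual", "Deviation (%)"],
    )


# Illustrative data only.
expected = pd.DataFrame({"mean(ms)": [2.0, 4.0], "P50(ms)": [1.0, 1.0]})
actual = pd.DataFrame({"mean(ms)": [2.0, 5.0], "P50(ms)": [1.0, 1.5]})

report = label_blocks(expected, actual)
html = report.to_html()  # plain DataFrame.to_html, no styling applied
```

The outer index level shows up in the HTML table as a row header spanning each block, so the reader can tell the expected, actual and deviation rows apart without separate print statements.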