Export a Pandas DataFrame to a PDF file using Python

python pdf pandas


What is an efficient way to generate a PDF for a DataFrame in pandas?

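One way is to use Markdown. You can use df.to_html(), which converts the DataFrame into an HTML table. From there, you can put the generated HTML into a Markdown file (.md). Building on that, there are utilities that convert Markdown to PDF.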
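The first step can be sketched as follows. This is a minimal illustration, not the answerer's own code: the DataFrame contents and the output file name `table.md` are made up, and it relies on the fact that Markdown allows inline HTML.

```python
import os
import tempfile

import pandas as pd

# Demo data (made up for illustration)
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Convert the DataFrame to an HTML table string
html_table = df.to_html(index=False)

# Write the table into a Markdown file; "table.md" is a hypothetical name
md_path = os.path.join(tempfile.gettempdir(), "table.md")
with open(md_path, "w", encoding="utf-8") as f:
    f.write("# Demo table\n\n")
    f.write(html_table + "\n")
```

The resulting .md file can then be passed to whichever Markdown-to-PDF converter you prefer.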

An all-in-one tool for this method is the Atom text editor. There you can install an extension (search for "markdown to pdf") that does the conversion for you.

Note: recently, when using to_html(), I had to remove extra "\n" characters for some reason. I chose to use Atom -> Find -> '\n' -> Replace ''.
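The same cleanup can also be done in Python rather than in the editor. A minimal sketch, assuming the stray newlines are the ones `to_html()` emits between tags (the DataFrame is made up for illustration):

```python
import pandas as pd

# Demo data (made up for illustration)
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# to_html() pretty-prints the table with one tag per line;
# removing the newlines collapses the markup into a single line.
html_one_line = df.to_html().replace("\n", "")
```

Be aware that this also removes newlines that occur inside cell values, which may or may not be what you want.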

Overall, this should do the trick.

Here is how I do it using sqlite3, pandas and …

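This is a solution that goes through an intermediate HTML file.

The table is pretty-printed with some minimal CSS.

The PDF conversion is done with weasyprint. You need to

pip install weasyprint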

import pandas as pd
import weasyprint

# HTML written before and after the table, including some minimal CSS
HTML_TEMPLATE1 = '''
<html>
<head>
<style>
  h2 {
    text-align: center;
    font-family: Helvetica, Arial, sans-serif;
  }
  table {
    margin-left: auto;
    margin-right: auto;
  }
  table, th, td {
    border: 1px solid black;
    border-collapse: collapse;
  }
  th, td {
    padding: 5px;
    text-align: center;
    font-family: Helvetica, Arial, sans-serif;
    font-size: 90%;
  }
  table tbody tr:hover {
    background-color: #dddddd;
  }
  .wide {
    width: 90%;
  }
</style>
</head>
<body>
'''

HTML_TEMPLATE2 = '''
</body>
</html>
'''

# The table pretty printer. It must be defined before it is called below.
def to_html_pretty(df, filename='/tmp/out.html', title=''):
    '''
    Write an entire dataframe to an HTML file
    with nice formatting.
    Thanks to @stackoverflowuser2010 for the
    pretty printer see https://stackoverflow.com/a/47723330/362951
    '''
    ht = ''
    if title != '':
        ht += '<h2> %s </h2>\n' % title
    ht += df.to_html(classes='wide', escape=False)

    with open(filename, 'w') as f:
        f.write(HTML_TEMPLATE1 + ht + HTML_TEMPLATE2)

# Create a pandas dataframe with demo data:
demodata_csv = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
df = pd.read_csv(demodata_csv)

# Pretty print the dataframe as an html table to a file
intermediate_html = '/tmp/intermediate.html'
to_html_pretty(df, intermediate_html, 'Iris Data')
# if you do not want pretty printing, just use pandas:
# df.to_html(intermediate_html)

# Convert the html file to a pdf file using weasyprint
out_pdf = '/tmp/demo.pdf'
weasyprint.HTML(filename=intermediate_html).write_pdf(out_pdf)


Thanks to @stackoverflowuser2010 for the pretty printer; see stackoverflowuser2010's answer.

I did not use pdfkit, because I had some problems with it on a headless machine. But weasyprint is great.

First plot the table with matplotlib, then generate the PDF:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

df = pd.DataFrame(np.random.random((10,3)), columns = ("col 1", "col 2", "col 3"))

#https://stackoverflow.com/questions/32137396/how-do-i-plot-only-a-table-in-matplotlib
fig, ax =plt.subplots(figsize=(12,4))
ax.axis('tight')
ax.axis('off')
the_table = ax.table(cellText=df.values,colLabels=df.columns,loc='center')

#https://stackoverflow.com/questions/4042192/reduce-left-and-right-margins-in-matplotlib-plot
pp = PdfPages("foo.pdf")
pp.savefig(fig, bbox_inches='tight')
pp.close()
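For DataFrames too long to fit on one page, the same idea can be extended to write one table per PDF page. The following is only a sketch under stated assumptions: the helper name `dataframe_to_pdf`, the `rows_per_page` parameter, the rounding to three decimals, and the output file name are all made up for illustration. It selects the non-interactive Agg backend so that it also runs on a headless machine.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def dataframe_to_pdf(df, pdf_path, rows_per_page=20):
    """Render df as one table per PDF page; return the number of pages."""
    n_pages = 0
    with PdfPages(pdf_path) as pdf:
        for start in range(0, len(df), rows_per_page):
            chunk = df.iloc[start:start + rows_per_page]
            fig, ax = plt.subplots(figsize=(12, 4))
            ax.axis("off")
            ax.table(cellText=chunk.round(3).values,
                     colLabels=list(chunk.columns),
                     loc="center")
            pdf.savefig(fig, bbox_inches="tight")
            plt.close(fig)  # free the figure once it has been saved
            n_pages += 1
    return n_pages


demo_df = pd.DataFrame(np.random.random((45, 3)),
                       columns=("col 1", "col 2", "col 3"))
out_path = os.path.join(tempfile.gettempdir(), "demo_table_pages.pdf")
pages = dataframe_to_pdf(demo_df, out_path, rows_per_page=20)
```

With 45 rows and 20 rows per page, the loop produces three pages. Closing each figure after saving keeps memory use flat when there are many pages.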

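References: see the Stack Overflow links in the code comments above.

Building on two examples that I found useful, here is some simple CSS code, saved in the same folder as the ipynb (as df_style.css, the name the Python code refers to):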

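/* includes alternating gray and white rows, plus a hover color */
.mystyle {
    font-size: 11pt;
    font-family: Arial;
    border-collapse: collapse;
    border: 1px solid silver;
}
.mystyle td, th {
    padding: 5px;
}
.mystyle tr:nth-child(even) {
    background: #E0E0E0;
}
.mystyle tr:hover {
    background: silver;
    cursor: pointer;
}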

Python code:

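import os
import numpy as np
import pandas as pd
from weasyprint import HTML

# `folder` and `file_pdf` must already be defined by the caller
pdf_filepath = os.path.join(folder, file_pdf)
demo_df = pd.DataFrame(np.random.random((10, 3)), columns=("col 1", "col 2", "col 3"))
table = demo_df.to_html(classes='mystyle')
html_string = f'''
<html>
<head><title>HTML DataFrame with CSS</title></head>
<body>{table}</body>
</html>
'''
HTML(string=html_string).write_pdf(pdf_filepath, stylesheets=["df_style.css"])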

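pdfkit does not work on Windows 64.

Great! To install pdfkit on a Mac: pip install pdfkit & brew install Caskroom/cask/wkhtmltopdf

Do you know how to force a page break? Suppose I have several table slices of a pandas DataFrame and I want each table to start on a new page. Is this possible? At which point should I edit the HTML code? Thanks!

How can I make it print in landscape or with a different page size?

.to_latex() saved my life...

I think a solution that converts to HTML as an intermediate step, then to Markdown (which does not even have a standard specification), and then to PDF is not a good approach.

You can now use … to avoid HTML entirely.

What is HTML in the last line?

The HTML is generated as a string in the Python code. I'm not 100% sure what your question means?

HTML is imported from Python's weasyprint module.

Compared with LaTeX or troff, these tables created via matplotlib do not look great.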
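On the page-break and landscape questions: for the HTML-based approach, one possible route is CSS for paged media, which weasyprint interprets when it lays out the PDF. The sketch below is an assumption-laden illustration rather than a tested recipe from the answers: the helper `slices_to_html`, the class names `first-page` and `new-page`, and the output file name are all hypothetical. It uses the `@page` rule to request landscape A4 and `page-break-before` to start each later table on a new page.

```python
import numpy as np
import pandas as pd

# CSS for paged media: landscape A4 pages, and every table after the
# first one starts on a new page.
PAGE_CSS = """
@page { size: A4 landscape; margin: 1cm; }
table.new-page { page-break-before: always; }
"""


def slices_to_html(slices):
    """Join several DataFrames into one HTML document, one table each."""
    tables = []
    for i, part in enumerate(slices):
        # The first table stays on page one; later ones force a break.
        css_class = "new-page" if i > 0 else "first-page"
        tables.append(part.to_html(classes=css_class))
    body = "\n".join(tables)
    return (f"<html><head><style>{PAGE_CSS}</style></head>"
            f"<body>{body}</body></html>")


df = pd.DataFrame(np.random.random((30, 3)),
                  columns=("col 1", "col 2", "col 3"))
html_string = slices_to_html([df.iloc[:10], df.iloc[10:20], df.iloc[20:]])

# Conversion step, assuming weasyprint is installed:
# from weasyprint import HTML
# HTML(string=html_string).write_pdf("sliced_tables.pdf")
```

The page break is applied in the HTML string itself, so the place to edit is wherever the tables are joined, before the string is handed to weasyprint.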
