Java OpenHTMLToPDF: Embedding a custom font in a PDF created from HTML


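I am creating a PDF from HTML using Jsoup and OpenHTMLToPDF. I have to use a different font in my PDF to cover non-Latin glyphs. How do I embed the font correctly?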
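Before choosing fonts, it can help to know which scripts the HTML text actually contains. The following is a minimal sketch using only the JDK; the class and method names (ScriptScan, scriptsIn) are made up for illustration and are not part of Jsoup or OpenHTMLToPDF.

```java
import java.util.EnumSet;
import java.util.Set;

final class ScriptScan {

    /**
     * Returns the Unicode scripts that occur in the text.
     * COMMON, INHERITED and UNKNOWN are skipped because they cover
     * punctuation, digits, spaces and unassigned code points.
     */
    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED
                    && script != Character.UnicodeScript.UNKNOWN) {
                scripts.add(script);
            }
            // Advance by one or two chars depending on the code point
            i += Character.charCount(cp);
        }
        return scripts;
    }

    public static void main(String[] args) {
        // The two paragraphs from test.html
        System.out.println(scriptsIn("Latin Script"));
        System.out.println(scriptsIn("Είμαι ελληνικό κείμενο."));
    }
}
```

Every script reported here needs a font that contains glyphs for it. Whether a particular font file covers a script still has to be checked against the font itself.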

A simplified program that reproduces the problem:

src/main/resources/test.html

<!DOCTYPE html>
<html>
    <head>
        <meta charset="UTF-8" />
        <title>Font Test</title>
        <style>
            @font-face {
                font-family: 'source-sans';
                font-style: normal;
                font-weight: 400;
                src: url(fonts/SourceSansPro-Regular.ttf);
            }
        </style>
    </head>
    <body>    
        <p style="font-family: 'source-sans',serif">Latin Script</p>
        <p style="font-family: 'source-sans',serif">Είμαι ελληνικό κείμενο.</p>
    </body>
</html>

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>paf</groupId>
    <artifactId>test</artifactId>
    <version>1.0-SNAPSHOT</version>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>7</source>
                    <target>7</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>com.openhtmltopdf</groupId>
            <artifactId>openhtmltopdf-pdfbox</artifactId>
            <version>0.0.1-RC18</version>
        </dependency>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.11.2</version>
        </dependency>
    </dependencies>
</project>
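
  • Don't worry about the second function in main.java (readFile); it only reads the HTML file and is included only so that the program is complete.

src/main/resources/font/SourceSansPro-regular.ttf

  • Download it here:

pom.xml

  • Listed above, directly after test.html.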

Resulting PDF

  • The text is rendered in the serif fallback font instead of source-sans.

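Edit 1: Made various changes based on the page linked in the comments and updated to RC18. The output is different now, but the font in the PDF is still wrong.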
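
The comments on this question mention that this edit also switched from the OTF to the TTF version of the font. A file extension does not always match the file's contents, so one quick sanity check is to look at the first four bytes of the file (the sfnt version tag). The sketch below uses only the JDK; the names FontSignature and kindOf are invented for this example and have nothing to do with OpenHTMLToPDF's own font loading.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

final class FontSignature {

    /** Classifies a font by the first four bytes of the file. */
    static String kindOf(byte[] header) {
        if (header == null || header.length < 4) {
            return "too short to be a font file";
        }
        // Combine the four bytes into one big-endian int
        int tag = ((header[0] & 0xFF) << 24) | ((header[1] & 0xFF) << 16)
                | ((header[2] & 0xFF) << 8) | (header[3] & 0xFF);
        switch (tag) {
            case 0x00010000: return "TrueType outlines";
            case 0x74727565: return "TrueType outlines (Apple 'true' tag)";
            case 0x4F54544F: return "OpenType with CFF outlines ('OTTO' tag)";
            case 0x74746366: return "font collection ('ttcf' tag)";
            default:         return "unknown signature";
        }
    }

    /** Reads the first four bytes of a file and classifies them. */
    static String kindOf(File fontFile) throws IOException {
        byte[] header = new byte[4];
        try (DataInputStream in = new DataInputStream(new FileInputStream(fontFile))) {
            in.readFully(header); // throws EOFException if the file is shorter
        }
        return kindOf(header);
    }
}
```

For example, FontSignature.kindOf(new File("src/main/resources/fonts/SourceSansPro-Regular.ttf")) would be expected to report TrueType outlines if the file really is the TTF build.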



Edit 2: Tried the fast renderer.

OK. Thanks to @Tilman Hausherr's comment, I asked about this in openhtmltopdf's GitHub issue tracker.

In case anyone is interested, these changes made it work:

src/main/java/main.java (complete file):

import com.openhtmltopdf.extend.FSSupplier;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.w3c.dom.Document;

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

public class main {
    public static void main(String[] args) {
        System.out.println("Starting");

        // try-with-resources closes the output stream even if rendering fails
        try (OutputStream outStream = new FileOutputStream("test.pdf")) {

            // Parse the HTML with Jsoup, then convert it to a W3C DOM for the renderer
            final W3CDom w3cDom = new W3CDom();
            final Document w3cDoc = w3cDom.fromJsoup(Jsoup.parse(readFile()));

            final PdfRendererBuilder pdfBuilder = new PdfRendererBuilder();
            pdfBuilder.useFastMode();
            pdfBuilder.withW3cDocument(w3cDoc, "/");
            // Register the font file under the family name used in the CSS
            pdfBuilder.useFont(new File(main.class.getClassLoader().getResource("fonts/SourceSansPro-Regular.ttf").getFile()), "source-sans");
            pdfBuilder.toStream(outStream);

            pdfBuilder.run();

        } catch (Exception e) {
            System.out.println("PDF could not be created: " + e.getMessage());
        }

        System.out.println("Finish.");
    }


    // Reads test.html from the classpath into a String, decoded as UTF-8
    private static String readFile() throws IOException {
        final ClassLoader classLoader = main.class.getClassLoader();
        final InputStream inputStream = classLoader.getResourceAsStream("test.html");
        final StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(Objects.requireNonNull(inputStream), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int amt = r.read(buf);
            while (amt > 0) {
                sb.append(buf, 0, amt);
                amt = r.read(buf);
            }
        }
        return sb.toString();
    }
}
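
src/main/resources/font/SourceSansPro-regular.ttf

  • Downloaded a newer version from here:

src/main/resources/test.html (changed part only, see the rest above):

        @font-face {
            font-family: 'source-sans';
            font-style: normal;
            font-weight: 400;
            src: url(fonts/SourceSansPro-Regular.ttf);
            -fs-font-subset: complete-font;
        }

Comments:

  • I think you should ask this in the openhtmltopdf issue tracker (unless they directed you here). This is not really a PDFBox problem; PDFBox itself has been able to use Unicode fonts since 2.0.0.
  • Also have a look here, maybe it helps? And use the latest version, which is 0.0.1-RC18, not 0.0.1-RC12. Consider using the Maven versions plugin.
  • Thank you both. I changed the way the font is loaded and, following the linked page, now use the TTF version instead of OTF. I also updated to RC18. The output is different now, but it still doesn't work. I think I really should post this in openhtmltopdf's GitHub issue tracker.
  • Is there any change in the main method?
  • Apparently not.
  • @Paflow In this useFont method, how can I use the font as an InputStream?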
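
On the question in the comments about supplying the font as an InputStream: one approach that only relies on the useFont(File, String) call already shown in main.java is to copy the stream to a temporary file first. This is a sketch using only the JDK; FontStreams and toTempFile are hypothetical names introduced here. It also avoids getResource(...).getFile(), which does not yield a usable path when the resource sits inside a JAR.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class FontStreams {

    /**
     * Copies a font stream to a temporary file and returns that file.
     * The stream is closed afterwards; the file is deleted on JVM exit.
     */
    static File toTempFile(InputStream fontStream, String suffix) throws IOException {
        Path tmp = Files.createTempFile("font-", suffix);
        tmp.toFile().deleteOnExit();
        try (InputStream in = fontStream) {
            // createTempFile already created the file, so it must be replaced
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp.toFile();
    }
}
```

With this helper, the call in main.java could become pdfBuilder.useFont(FontStreams.toTempFile(main.class.getClassLoader().getResourceAsStream("fonts/SourceSansPro-Regular.ttf"), ".ttf"), "source-sans"). The otherwise unused FSSupplier import in main.java suggests the builder may also accept a supplier of InputStream directly; that is an assumption here, so check the Javadoc of the OpenHTMLToPDF version in use before relying on it.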