C# iText 5 Polish characters


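I have a problem with Polish characters when using iTextSharp. I want to create a PDF from HTML. Everything works, but the Polish characters are missing. The function I use is: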

    private void createPDF(string html)
    {
        //MemoryStream msOutput = new MemoryStream();
        TextReader reader = new StringReader(html);// step 1: creation of a document-object
        Document document = new Document(PageSize.A4, 30, 30, 30, 30);

        // step 2:
        // we create a writer that listens to the document
        // and directs a PDF stream to a file
        PdfWriter writer = PdfWriter.GetInstance(document, new FileStream("Test.pdf", FileMode.Create));

        // step 3: we create a worker to parse the document
        HTMLWorker worker = new HTMLWorker(document);

        // step 4: we open document and start the worker on the document
        document.Open();
        worker.StartDocument();

        // step 5: parse the html into the document
        worker.Parse(reader);

        // step 6: close the document and the worker
        worker.EndDocument();
        worker.Close();
        document.Close();
    }
and tried to use it:

    createPDF("ĄąćęĘłŁŃńóÓŚśŹźŻż");

I tried setting:

    BaseFont bf = BaseFont.CreateFont(BaseFont.TIMES_ROMAN, Encoding.UTF8.HeaderName, BaseFont.EMBEDDED);

but it doesn't work.

Do you have any idea?

Regards

When you create the BaseFont, you need to specify that you want to use Unicode characters.

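1) iText 5.0.6 was released today, with a major rework of the HTML->PDF conversion code. I suggest you try the new code.

2) I'm almost certain that setting the directContent like that won't affect the PDF content generated by HTMLWorker. I'm 99% sure it will [re]set the font before drawing any text.

3) Try wrapping your string in font tags. I seriously doubt the default font HTMLWorker picks will be up to the job.

Nope. The default is Helvetica with WinAnsiEncoding. Definitely not suitable for anything beyond typical English/German/French/Spanish.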

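You should be able to use HTMLWorker.setStyleSheet to set some friendlier defaults. You'll want to set "face" and "encoding" to something more Polish-friendly. I recommend "Identity-H" for the encoding, which gives you access to every character in the font you use, regardless of language. For the font, Windows has long shipped a program called "charmap.exe" that shows which characters a font has available in a given encoding (including Unicode). The "Arial" family looks good, as do several others.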
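For text added directly to the document (outside HTMLWorker), the Identity-H suggestion can be sketched in C# roughly as follows. This is a minimal sketch rather than the answerer's code: it assumes iTextSharp 5.x is referenced, and the font path C:\Windows\Fonts\arial.ttf and the output name identity-h.pdf are assumptions chosen for illustration.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

static class IdentityHSketch
{
    static void Main()
    {
        // Assumption: a TrueType font containing Polish glyphs exists at this path.
        const string fontPath = @"C:\Windows\Fonts\arial.ttf";

        // Identity-H addresses glyphs by Unicode value, so the font must be embedded.
        BaseFont baseFont = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font font = new Font(baseFont, 12f);

        using (FileStream output = new FileStream("identity-h.pdf", FileMode.Create))
        {
            Document document = new Document(PageSize.A4, 30, 30, 30, 30);
            PdfWriter.GetInstance(document, output);

            document.Open();
            document.Add(new Paragraph("ĄąĆćĘęŁłŃńÓóŚśŹźŻż", font));
            document.Close();
        }
    }
}
```

With Identity-H, iText embeds only the glyphs that are actually used, so a regular Arial file should keep the output reasonably small.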


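The "new code" probably won't change any of the behaviour you're seeing. It's a refactoring to make future changes (in the next release, as far as I know) easier.

My suggestion is to use setStyleSheet():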

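// step 3: we create a worker to parse the document
HTMLWorker worker = new HTMLWorker(document);
StyleSheet sheet = new StyleSheet();
HashMap<String, String> styleMap = new HashMap<String, String>();
styleMap.put("face", "Arial"); // default font
styleMap.put("encoding", "Identity-H"); // default encoding
String tags[] = {"p", "div", ...};
for (String tag : tags) {
    sheet.applyStyle(tag, styleMap);
}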

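You might only need applyStyle("body", styleMap) and have it cascade to everything the body contains, but I'm not sure. I'm also not sure this will fix your one-line test, since no tags are involved. IIRC, we build a body tag if there isn't one, but I'm not at all certain.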

Just to pull together what @Mark Storer said:

private void createPDF(string html)
{
    //MemoryStream msOutput = new MemoryStream();
    TextReader reader = new StringReader(html);// step 1: creation of a document-object
    Document document = new Document(PageSize.A4, 30, 30, 30, 30);

    // step 2:
    // we create a writer that listens to the document
    // and directs a PDF stream to a file
    PdfWriter writer = PdfWriter.GetInstance(document, new FileStream("Test.pdf", FileMode.Create));

    // step 3: we create a worker to parse the document
    HTMLWorker worker = new HTMLWorker(document);

    // step 4: we open document and start the worker on the document
    document.Open();

    // step 4.1: register a unicode font and assign it an alias
    FontFactory.Register("C:\\Windows\\Fonts\\ARIALUNI.TTF", "arial unicode ms");

    // step 4.2: create a style sheet and set the encoding to Identity-H
    iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
    ST.LoadTagStyle("body", "encoding", "Identity-H");

    // step 4.3: assign the style sheet to the html parser
    worker.Style = ST;

    worker.StartDocument();

    // step 5: parse the html into the document
    worker.Parse(reader);

    // step 6: close the document and the worker
    worker.EndDocument();
    worker.Close();
    document.Close();
}
When you call it, wrap your text in a font tag that uses the name you registered above:

createPDF("<font face=\"arial unicode ms\">ĄąćęĘłŁŃńóÓŚśŹźŻż</font>");
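I got the answer! =) (specifically for Polish) I feel obliged to put it in this old thread, because I'm sure I won't be the last one to find it.

I'm very disappointed that there weren't any good answers... Most suggest using ARIALUNI.TTF from the Windows fonts folder, which makes the PDF file many times larger. The solution doesn't need to be that drastic.

Many others suggest examples that use the cp1252 encoding, which fails with Arial and doesn't handle Polish text with Helvetica.

I'm using iTextSharp 4.1.6... the trick is... cp1257! You can use it with BaseFont.COURIER, BaseFont.HELVETICA and BaseFont.TIMES_ROMAN.

This works... and my PDF files are tiny (3 KB!)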
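One way to see why cp1252 fails while cp1257 works is to check which Windows code pages can represent the Polish alphabet at all. The sketch below uses only the .NET base class library, not iTextSharp. CanEncode is a hypothetical helper name, and the RegisterProvider call assumes a modern .NET runtime, where legacy code pages have to be registered before use.

```csharp
using System;
using System.Text;

static class CodePageCheck
{
    // Returns true when every character of `text` has a byte in the given code page.
    // Exception fallbacks are used so unmappable characters are reported, not replaced with '?'.
    public static bool CanEncode(string text, int codePage)
    {
        Encoding strict = Encoding.GetEncoding(
            codePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where code pages 125x need this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        const string polish = "ĄąĆćĘęŁłŃńÓóŚśŹźŻż";
        foreach (int codePage in new[] { 1250, 1252, 1257 })
        {
            Console.WriteLine("cp{0}: {1}", codePage, CanEncode(polish, codePage));
        }
    }
}
```

The same helper could be pointed at another alphabet (for example Croatian or Bosnian letters) to shortlist candidate encodings before trying them in iTextSharp.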


Later I'll test to make sure I can open and read these files on Windows XP and Mac OS X in addition to Windows 7.

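When I went through various forums and Stack Overflow questions, I didn't find an answer with a comprehensive solution to the special-characters problem. I've tried to provide one here, at the cost of a rather long reply. Hopefully it helps someone.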

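I used XMLWorker, because HTMLWorker had been deprecated. The problem with special characters remained. I found two solutions that actually work; they can be used separately or combined.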

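HTML and CSS solution

Every tag involved needs to have the font-family style specified so that it is interpreted correctly by the ParseXHtml method (I'm not sure why style inheritance for nested tags doesn't work here, but it really seems not to work, or not completely).

This solution lets you change the resulting PDF by editing the HTML code only, so some scenarios need no recompilation.

Simplified code (for an MVC app) looks like this: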

Controller, CSS, view layout and view page are listed in that order in the code at the end of this page.



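Maybe you have an example of this?

1) iText 5.0.6 was released today, with a major rework of the HTML->PDF conversion code. I suggest you try the new code.

Looks good, yes. You should be able to set the default face the same way.

Chris Haas - you're the best. After hours of struggling, your answer gave me the breakthrough I needed!

Just an update on this... Stick with Helvetica or Times New Roman and CP1257. I was using Courier and found that some older versions of Adobe Reader didn't display the s and z with Polish accents.

How did you get BaseFont.CP1257? I'm looking for the right code page for Croatian/Bosnian. Your code partially supports my letters. Thanks

Maybe you're using a different version of iTextSharp? I'm using 4.1.6 because version 5 has a different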
// cp1257 answer: built-in Courier with the cp1257 encoding
document.Open();
var bigFont = FontFactory.GetFont(BaseFont.COURIER, BaseFont.CP1257, 18, Font.BOLD);
var para = new Paragraph("Oryginał", bigFont);
document.Add(para);
document.Close();
// XMLWorker answer, HTML and CSS solution: controller
// USED_ENCODING is defined elsewhere (not shown here)
public FileStreamResult GetPdf()
{
    const string CONTENT_TYPE = "application/pdf";
    var fileName = "mySimple.pdf";
    var html = GetViewPageHtmlCode();
    //how to capture the view HTML is described in other threads
    var css = Server.MapPath("~/Content/Pdf.css");
    using (var capturedActionStream = new MemoryStream(USED_ENCODING.GetBytes(html)))
    {
        using (var cssFile = new FileStream(css, FileMode.Open))
        {
            var memoryStream = new MemoryStream();
            //to create landscape, use PageSize.A4.Rotate() for pageSize
            var document = new Document(PageSize.A4, 30, 30, 10, 10);
            var writer = PdfWriter.GetInstance(document, memoryStream);
            var worker = XMLWorkerHelper.GetInstance();

            document.Open();
            worker.ParseXHtml(writer, document, capturedActionStream, cssFile);
            writer.CloseStream = false;
            document.Close();
            memoryStream.Position = 0;

            //to enforce file download
            HttpContext.Response.AddHeader(
                "Content-Disposition",
                String.Format("attachment; filename={0}", fileName));
            var wrappedPdf = new FileStreamResult(memoryStream, CONTENT_TYPE);
            return wrappedPdf;
        }
    }
}
body {
    background-color: white;
    font-size: .85em;
    font-family: Arial;
    margin: 0;
    padding: 0;
    color: black;
}

p, ul {
    margin-bottom: 20px;
    line-height: 1.6em;
}

div, span {
    font-family: Arial;
}

h1, h2, h3, h4, h5, h6 {
    font-size: 1.5em;
    color: #000;
    font-family: Arial;
}
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
        <title>@ViewBag.Title</title>
        <link href="@Url.Content("~/Content/Pdf.css")" rel="stylesheet" type="text/css" />
    </head>
    <body>
        <div class="page">
            <div id="main">
                @RenderBody()
            </div>
        </div>
    </body>
    </html>
@{
    ViewBag.Title = "PDF page title";
}

<h1>@ViewBag.Title</h1>

<p>
    ěščřžýáíéů ĚŠČŘŽÝÁÍÉŮ
</p>
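// Font provider for XMLWorker: forces the default encoding (Identity-H unless overridden)
// and falls back to the default font file when the requested font cannot be resolved.
public class CustomFontFactory : FontFactoryImp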
{
    public const Single DEFAULT_FONT_SIZE = 12;
    public const Int32 DEFAULT_FONT_STYLE = 0;
    public static readonly BaseColor DEFAULT_FONT_COLOR = BaseColor.BLACK;

    public String DefaultFontPath { get; private set; }
    public String DefaultFontEncoding { get; private set; }
    public Boolean DefaultFontEmbedding { get; private set; }
    public Single DefaultFontSize { get; private set; }
    public Int32 DefaultFontStyle { get; private set; }
    public BaseColor DefaultFontColor { get; private set; }

    public Boolean ReplaceEncodingWithDefault { get; set; }
    public Boolean ReplaceEmbeddingWithDefault { get; set; }
    public Boolean ReplaceFontWithDefault { get; set; }
    public Boolean ReplaceSizeWithDefault { get; set; }
    public Boolean ReplaceStyleWithDefault { get; set; }
    public Boolean ReplaceColorWithDefault { get; set; }

    public BaseFont DefaultBaseFont { get; protected set; }

    public CustomFontFactory(
        String defaultFontFilePath,
        String defaultFontEncoding = BaseFont.IDENTITY_H,
        Boolean defaultFontEmbedding = BaseFont.EMBEDDED,
        Single? defaultFontSize = null,
        Int32? defaultFontStyle = null,
        BaseColor defaultFontColor = null,
        Boolean automaticalySetReplacementForNullables = true)
    {
        //set default font properties
        DefaultFontPath =  defaultFontFilePath;
        DefaultFontEncoding = defaultFontEncoding;
        DefaultFontEmbedding = defaultFontEmbedding;
        DefaultFontColor = defaultFontColor == null
            ? DEFAULT_FONT_COLOR
            : defaultFontColor;
        DefaultFontSize = defaultFontSize.HasValue
            ? defaultFontSize.Value
            : DEFAULT_FONT_SIZE;
        DefaultFontStyle = defaultFontStyle.HasValue
            ? defaultFontStyle.Value
            : DEFAULT_FONT_STYLE;

        //set default replacement options
        ReplaceFontWithDefault = false;
        ReplaceEncodingWithDefault = true;
        ReplaceEmbeddingWithDefault = false;

        if (automaticalySetReplacementForNullables)
        {
            ReplaceSizeWithDefault = defaultFontSize.HasValue;
            ReplaceStyleWithDefault = defaultFontStyle.HasValue;
            ReplaceColorWithDefault = defaultFontColor != null;
        }

        //define default font
        DefaultBaseFont = BaseFont.CreateFont(DefaultFontPath, DefaultFontEncoding, DefaultFontEmbedding);

        //register system fonts
        FontFactory.RegisterDirectories();
    }

    protected Font GetBaseFont(Single size, Int32 style, BaseColor color)
    {
        var baseFont = new Font(DefaultBaseFont, size, style, color);

        return baseFont;
    }

    public override Font GetFont(String fontname, String encoding, Boolean embedded, Single size, Int32 style, BaseColor color, Boolean cached)
    {
        //eventually replace expected font properties
        size = ReplaceSizeWithDefault
            ? DefaultFontSize
            : size;
        style = ReplaceStyleWithDefault
            ? DefaultFontStyle
            : style;
        encoding = ReplaceEncodingWithDefault
            ? DefaultFontEncoding
            : encoding;
        embedded = ReplaceEmbeddingWithDefault
            ? DefaultFontEmbedding
            : embedded;

        //get font
        Font font = null;
        if (ReplaceFontWithDefault)
        {
            font = GetBaseFont(
                size,
                style,
                color);
        }
        else
        {
            font = FontFactory.GetFont(
                fontname,
                encoding,
                embedded,
                size,
                style,
                color,
                cached);

            if (font.BaseFont == null)
                font = GetBaseFont(
                    size,
                    style,
                    color);
        }

        return font;
    }
}
private const String DEFAULT_FONT_LOCATION = "~/Content/Fonts";
private const String DEFAULT_FONT_NAME = "arialn.ttf";

public FileStreamResult GetPdf()
{
    const string CONTENT_TYPE = "application/pdf";
    var fileName = "mySimple.pdf";
    var html = GetViewPageHtmlCode();
    //how to capture the view HTML is described in other threads
    var css = Server.MapPath("~/Content/Pdf.css");
    using (var capturedActionStream = new MemoryStream(USED_ENCODING.GetBytes(html)))
    {
        using (var cssFile = new FileStream(css, FileMode.Open))
        {
            var memoryStream = new MemoryStream();
            var document = new Document(PageSize.A4, 30, 30, 10, 10);
            //to create landscape, use PageSize.A4.Rotate() for pageSize
            var writer = PdfWriter.GetInstance(document, memoryStream);
            var worker = XMLWorkerHelper.GetInstance();
            var defaultFontPath = Server
                .MapPath(Path
                    .Combine(
                        DEFAULT_FONT_LOCATION,
                        DEFAULT_FONT_NAME));
            var fontProvider = new CustomFontFactory(defaultFontPath);

            document.Open();
            worker.ParseXHtml(writer, document, capturedActionStream, cssFile, fontProvider);
            writer.CloseStream = false;
            document.Close();
            memoryStream.Position = 0;

            //to enforce file download
            HttpContext.Response.AddHeader(
                "Content-Disposition",
                String.Format("attachment; filename={0}", fileName));
            var wrappedPdf = new FileStreamResult(memoryStream, CONTENT_TYPE);
            return wrappedPdf;
        }
    }
}
body {
    background-color: white;
    font-size: .85em;
    font-family: "Trebuchet MS", Verdana, Helvetica, Sans-Serif;
    margin: 0;
    padding: 0;
    color: black;
}

p, ul {
    margin-bottom: 20px;
    line-height: 1.6em;
}

h1, h2, h3, h4, h5, h6 {
    font-size: 1.5em;
    color: #000;
}
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
        <title>@ViewBag.Title</title>
        <link href="@Url.Content("~/Content/Pdf.css")" rel="stylesheet" type="text/css" />
    </head>
    <body>
        <div class="page">
            <div id="main">
                @RenderBody()
            </div>
        </div>
    </body>
    </html>
@{
    ViewBag.Title = "PDF page title";
}

<h1>@ViewBag.Title</h1>

<p>
    ěščřžýáíéů ĚŠČŘŽÝÁÍÉŮ
</p>