C#: Can I use iTextSharp to remove text objects from an existing PDF and output the result to a new PDF?

c# pdf itext

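This question is another version of my previous question:

Original question: I am developing a program that converts PDF to PPTX using iTextSharp, for specific reasons. So far I have extracted all text objects and image objects together with their positions. But I find it hard to get the vector drawings without the text (tables, for example). In fact, it would be even better if I could turn them into images. My plan is to merge everything except the text objects into a background image and then place the text objects at the appropriate positions. I tried to find similar questions here, but have had no luck so far. If anyone knows how to do this, please answer. Thanks.

I have read many related questions and discussions and decided to ask another version here. I now have two further plans, described below. I would be very grateful if iText developers/experts could give me some guidance.

The code snippet I use to get the text/image objects: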

using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// RectAndText and RectAndImage are my own helper types (not shown here).
public class MyLocationTextExtractionStrategy : IExtRenderListener, ITextExtractionStrategy, IElementListener
{
    // Text
    public List<RectAndText> myPoints_txt = new List<RectAndText>();
    public List<RectAndImage> myPoints_img = new List<RectAndImage>();
    public FieldInfo GsField = typeof(TextRenderInfo).GetField("gs", BindingFlags.NonPublic | BindingFlags.Instance);
    public FieldInfo MarkedContentInfosField = typeof(TextRenderInfo).GetField("markedContentInfos", BindingFlags.NonPublic | BindingFlags.Instance);
    public FieldInfo MarkedContentInfoTagField = typeof(MarkedContentInfo).GetField("tag", BindingFlags.NonPublic | BindingFlags.Instance);
    PdfName EMBEDDED_DOCUMENT = new PdfName("EmbeddedDocument");

    // Image
    public List<byte[]> Images = new List<byte[]>();
    public List<string> ImageNames = new List<string>();

    public bool Add(IElement element)
    {
        // Elements are not used by this strategy.
        return true;
    }

    public void BeginTextBlock()
    {
    }

    public void ClipPath(int rule)
    {
    }

    public void EndTextBlock()
    {
    }

    public string GetResultantText()
    {
        return "";
    }

    public void ModifyPath(PathConstructionRenderInfo renderInfo)
    {
        // ****************************************
        // I think I can get the path information here
        // ****************************************
    }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        try
        {
            PdfImageObject image = renderInfo.GetImage();
            if (image == null) return;
            ImageNames.Add(string.Format(
                "Image{0}.{1}", renderInfo.GetRef().Number, image.GetFileType()
            ));
            // Write the image to bytes
            using (MemoryStream ms = new MemoryStream(image.GetImageAsBytes()))
            {
                Images.Add(ms.ToArray());
            }
            Matrix matrix = renderInfo.GetImageCTM();
            this.myPoints_img.Add(new RectAndImage(matrix[Matrix.I31], matrix[Matrix.I32], matrix[Matrix.I11], matrix[Matrix.I12], image));
        }
        catch (Exception e)
        {
            // Images that cannot be decoded are skipped.
        }
    }

    public iTextSharp.text.pdf.parser.Path RenderPath(PathPaintingRenderInfo renderInfo)
    {
        // ****************************************
        // I think I can get the path information here
        // ****************************************
        return null;
    }

    public void RenderText(TextRenderInfo renderInfo)
    {
        DocumentFont _font = renderInfo.GetFont();
        LineSegment descentLine = renderInfo.GetDescentLine();
        LineSegment ascentLine = renderInfo.GetAscentLine();
        float x0 = descentLine.GetStartPoint()[0];
        float x1 = ascentLine.GetEndPoint()[0];
        float y0 = descentLine.GetStartPoint()[1];
        float y1 = ascentLine.GetEndPoint()[1];
        Rectangle rect = new Rectangle(x0, y0, x1, y1);
        // The graphics state is a non-public field, hence the reflection.
        GraphicsState gs = (GraphicsState)GsField.GetValue(renderInfo);
        float fontSize = gs.FontSize;
        string font_color = gs.FillColor.ToString().Substring(14, 6);
        IList<MarkedContentInfo> markedContentInfos = (IList<MarkedContentInfo>)MarkedContentInfosField.GetValue(renderInfo);
        if (markedContentInfos != null && markedContentInfos.Count > 0)
        {
            foreach (MarkedContentInfo info in markedContentInfos)
            {
                // Skip text that belongs to an embedded document.
                if (EMBEDDED_DOCUMENT.Equals(MarkedContentInfoTagField.GetValue(info)))
                    return;
            }
        }
        this.myPoints_txt.Add(new RectAndText(rect, renderInfo.GetText(), fontSize, _font.PostscriptFontName, font_color));
    }
}

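New questions:

1) Can I remove all text objects from a PDF and output the result to a new PDF? If so, I can render every page of the output as an image and use it as the PPTX background. Then I can finally write the text on top (it has already been retrieved with the ITextExtractionStrategy code above).

2) If 1) is not possible, I will retrieve all path information from the original PDF (using IExtRenderListener) and draw it on a new bitmap. Finally, I can use that bitmap as the background and place the text and images on top. In that case, is using ModifyPath and RenderPath the right way to retrieve the path information?

I know this may amount to multiple questions, but I think it is better to put everything in one thread to aid understanding. I would appreciate any advice or comments.

I believe @mkl, @Amine, or @Bruno Lowagie can help me. Thanks in advance.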

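I have already explained elsewhere what those IExtRenderListener callback methods mean, so essentially the remaining question is:

1) Can I remove all text objects from a PDF and output the result to a new PDF?

You can use the generic content stream editor class PdfContentStreamEditor for this (it was published separately and, as far as I know, does not ship with iTextSharp itself). Simply derive from it like this: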

class TextRemover : PdfContentStreamEditor
{
    protected override void Write(PdfContentStreamProcessor processor, PdfLiteral operatorLit, List<PdfObject> operands)
    {
        if (!TEXT_SHOWING_OPERATORS.Contains(operatorLit.ToString()))
        {
            base.Write(processor, operatorLit, operands);
        }
    }
    List<string> TEXT_SHOWING_OPERATORS = new List<string> { "Tj", "'", "\"", "TJ" };
}

and apply it to every page like this:

using (PdfReader pdfReader = new PdfReader(source))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create, FileAccess.Write), (char)0, true))
{
    pdfStamper.RotateContents = false;
    PdfContentStreamEditor editor = new TextRemover();

    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        editor.EditPage(pdfStamper, i);
    }
}
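To make the filtering rule easier to see in isolation, here is a minimal, self-contained sketch that does not use iTextSharp at all. It assumes a toy representation in which each content stream instruction is a string array whose last element is the operator; the class TextOperatorFilterSketch and the method RemoveTextShowing are hypothetical names chosen for illustration. Only the list of four text-showing operators is taken from TextRemover above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TextOperatorFilterSketch
{
    // The same four text-showing operators that TextRemover suppresses.
    static readonly HashSet<string> TextShowingOperators =
        new HashSet<string> { "Tj", "'", "\"", "TJ" };

    // Hypothetical stand-in for the editor's Write callback: an instruction
    // is kept unless its operator (the last element) shows text.
    public static List<string[]> RemoveTextShowing(IEnumerable<string[]> instructions)
    {
        return instructions
            .Where(instruction => !TextShowingOperators.Contains(instruction[instruction.Length - 1]))
            .ToList();
    }

    public static void Main()
    {
        // A toy page: one text object followed by a filled rectangle.
        var page = new List<string[]>
        {
            new[] { "BT" },
            new[] { "/F1", "12", "Tf" },
            new[] { "72", "700", "Td" },
            new[] { "(Hello)", "Tj" },
            new[] { "ET" },
            new[] { "72", "690", "200", "1", "re" },
            new[] { "f" },
        };

        // Print whatever survives the filter, one instruction per line.
        foreach (var instruction in RemoveTextShowing(page))
        {
            Console.WriteLine(string.Join(" ", instruction));
        }
    }
}
```

In this toy page only the (Hello) Tj instruction is dropped. The surrounding BT, Tf, Td and ET instructions and the rectangle drawing stay in place, which mirrors what TextRemover does: the text state is still set up, but nothing is ever shown with it.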
