iText PDF redaction: partially redacted text strings are removed completely
I am using itextpdf-5.5.9 and itext-xtra-5.5.9. I am trying to apply redaction to part of a text string, but after the redaction is applied the whole string is removed from the document. Please see the attached screenshot.
PdfReader reader = new PdfReader(src);
PdfCleanUpProcessor cleaner= null;
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(targetPdf));
stamper.setRotateContents(false);
List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();
Rectangle rectangle = new Rectangle(380, 640, 430, 665);
cleanUpLocations.add(new PdfCleanUpLocation(1, rectangle, BaseColor.BLACK));
cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
cleaner.cleanUp();
stamper.close();
reader.close();
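In comments the OP clarified that he wants redaction to remove only text that is completely contained in the redaction area; text whose bounding box lies even partially outside that area is expected to remain.
In terms of redaction security this expectation may be ill-advised: text that the casual redactor cannot see because it is hidden under the colored redaction area may still remain in the PDF content, where text extraction or even simple copy and paste can retrieve it.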
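That caveat notwithstanding, if one still wants to adjust PdfCleanUp to work as the OP expects, one essentially only has to change the PdfCleanUpRegionFilter used by the PdfCleanUpProcessor: the filter implementation used by default rejects a glyph (and thereby marks it for removal) if its bounding box intersects the redaction area. To meet the OP's expectation, this behavior has to be replaced by a check whether the bounding box is completely contained in the redaction area.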
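The difference between the two rejection rules can be illustrated with a minimal sketch that does not depend on iText at all. It assumes axis-aligned boxes only; the class `RegionCheckSketch`, its `Box` helper, and the glyph coordinates are hypothetical names and values made up for this illustration, not iText classes or data from the OP's file, and the real filter works on the glyph's ascent and descent lines rather than on such a simple box.

```java
public class RegionCheckSketch {

    /** Axis-aligned box in PDF user space (hypothetical helper, not an iText class). */
    static final class Box {
        final float left, bottom, right, top;

        Box(float left, float bottom, float right, float top) {
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }
    }

    /** Default-style rule: true if the glyph box overlaps the redaction zone at all. */
    static boolean intersects(Box glyph, Box zone) {
        return glyph.left < zone.right && zone.left < glyph.right
                && glyph.bottom < zone.top && zone.bottom < glyph.top;
    }

    /** Strict rule: true only if the glyph box lies entirely inside the redaction zone. */
    static boolean containedIn(Box glyph, Box zone) {
        return zone.left <= glyph.left && glyph.right <= zone.right
                && zone.bottom <= glyph.bottom && glyph.top <= zone.top;
    }

    public static void main(String[] args) {
        // Redaction zone taken from the question: Rectangle(380, 640, 430, 665).
        Box zone = new Box(380, 640, 430, 665);
        // Made-up glyph boxes: one crossing the left edge of the zone, one fully inside it.
        Box straddling = new Box(370, 645, 390, 660);
        Box inside = new Box(400, 645, 410, 660);

        System.out.println("straddling: intersects=" + intersects(straddling, zone)
                + ", containedIn=" + containedIn(straddling, zone));
        System.out.println("inside: intersects=" + intersects(inside, zone)
                + ", containedIn=" + containedIn(inside, zone));
    }
}
```

Under the intersection rule both boxes would be removed; under the containment rule only the second one would, which is the behavior the OP asks for.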
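This sounds simple. Unfortunately it is not as simple as it sounds, because the cleanup code was not designed for easy replacement of the region filter implementation; many of the relevant objects and methods are private or at most package-protected.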
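Thus, to implement the behavior the OP wants, I simply copied all the classes in the package com.itextpdf.text.pdf.pdfcleanup into a package of my own, added a new filter class there that is derived from my copy of PdfCleanUpRegionFilter and uses the different text rejection algorithm described above, and then changed my copy of PdfCleanUpProcessor to use this other filter class: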
/**
 * In contrast to the base class {@link PdfCleanUpRegionFilter}, this filter
 * only rejects text <b>completely</b> inside the redaction zone. The original
 * also rejects text located merely <b>partially</b> inside the redaction zone.
 */
public class StrictPdfCleanUpRegionFilter extends PdfCleanUpRegionFilter
{
    public StrictPdfCleanUpRegionFilter(List<Rectangle> rectangles)
    {
        super(rectangles);
        this.rectangles = rectangles;
    }

    /**
     * Checks if the text is COMPLETELY inside render filter region.
     */
    @Override
    public boolean allowText(TextRenderInfo renderInfo) {
        LineSegment ascent = renderInfo.getAscentLine();
        LineSegment descent = renderInfo.getDescentLine();

        Point2D[] glyphRect = new Point2D[] {
                new Point2D.Float(ascent.getStartPoint().get(0), ascent.getStartPoint().get(1)),
                new Point2D.Float(ascent.getEndPoint().get(0), ascent.getEndPoint().get(1)),
                new Point2D.Float(descent.getEndPoint().get(0), descent.getEndPoint().get(1)),
                new Point2D.Float(descent.getStartPoint().get(0), descent.getStartPoint().get(1)),
        };

        for (Rectangle rectangle : rectangles)
        {
            boolean glyphInRectangle = true;
            for (Point2D point2d : glyphRect)
            {
                glyphInRectangle &= rectangle.getLeft() <= point2d.getX();
                glyphInRectangle &= point2d.getX() <= rectangle.getRight();
                glyphInRectangle &= rectangle.getBottom() <= point2d.getY();
                glyphInRectangle &= point2d.getY() <= rectangle.getTop();
            }
            if (glyphInRectangle)
                return false;
        }

        return true;
    }

    List<Rectangle> rectangles;
}
(Test method testredactstrictformakpandey)
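The sample PDF provided by the OP
After redaction using the original classes
After redaction using the adjusted classes
Sanity check of the adjusted classes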
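To make sure that the adjusted classes still remove any text at all, I enlarged the redaction area so that the last characters "heet" of "Document Submission Sheet" are completely contained in the redaction area: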
try (   InputStream resource = getClass().getResourceAsStream("Document.pdf");
        OutputStream result = new FileOutputStream(new File(OUTPUTDIR, "Document-redacted-strict-large.pdf")) )
{
    PdfReader reader = new PdfReader(resource);
    StrictPdfCleanUpProcessor cleaner = null;
    PdfStamper stamper = new PdfStamper(reader, result);
    stamper.setRotateContents(false);
    List<mkl.testarea.itext5.pdfcleanup.PdfCleanUpLocation> cleanUpLocations = new ArrayList<>();
    Rectangle rectangle = new Rectangle(380, 640, 430, 680);
    cleanUpLocations.add(new mkl.testarea.itext5.pdfcleanup.PdfCleanUpLocation(1, rectangle, BaseColor.BLACK));
    cleaner = new StrictPdfCleanUpProcessor(cleanUpLocations, stamper);
    cleaner.cleanUp();
    stamper.close();
    reader.close();
}
(Test method testredactstrictformakpandeylarge)
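Indeed, copy and paste (and other text extraction methods) now only render the text "Document Submission S".

iText redaction removes all text glyphs even if they are only partially covered by the redaction area, and even if only a part of their bounding box rather than the actual glyph is covered. Also, a screenshot usually is not enough to analyze a problem; the code and the PDF are required.

@mkl, please find the shared link.

Are you an iText customer, and have you also asked about this in the iText JIRA?

@MayankPandey Your sample code and file are indeed an example of iText redaction removing everything that the redaction area covers only partially. What is your expectation? Should only completely covered content be removed? Or only content that is at least half contained? Most likely PdfCleanUp can be adjusted accordingly. Be aware, though, that in that case content that is almost but not completely covered can be revealed again, which may not be what you want security-wise...

@mkl, our expectation is that only the area covered by the provided coordinates is removed.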
For reference, the redaction code using the adjusted classes with the OP's original redaction area:
try (   InputStream resource = getClass().getResourceAsStream("Document.pdf");
        OutputStream result = new FileOutputStream(new File(OUTPUTDIR, "Document-redacted-strict.pdf")) )
{
    PdfReader reader = new PdfReader(resource);
    StrictPdfCleanUpProcessor cleaner = null;
    PdfStamper stamper = new PdfStamper(reader, result);
    stamper.setRotateContents(false);
    List<mkl.testarea.itext5.pdfcleanup.PdfCleanUpLocation> cleanUpLocations = new ArrayList<>();
    Rectangle rectangle = new Rectangle(380, 640, 430, 665);
    cleanUpLocations.add(new mkl.testarea.itext5.pdfcleanup.PdfCleanUpLocation(1, rectangle, BaseColor.BLACK));
    cleaner = new StrictPdfCleanUpProcessor(cleanUpLocations, stamper);
    cleaner.cleanUp();
    stamper.close();
    reader.close();
}