C#: How do I correctly extract subscript/superscript text from a PDF using iTextSharp?


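iTextSharp extracts plain text from PDF documents well, but I am having trouble with the subscript/superscript text that is common in technical documents.

TextChunk.SameLine() requires two chunks to have exactly the same vertical position to count as being "on" the same line, which is not the case for superscript or subscript text. For example, on page 11 of this document, under "Combustion Efficiency":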

Expected text:

monoxide (CO) in flue gas in accordance with the following formula: C.E. = [CO2 /(CO + CO2)]

Resulting text:

monoxide (CO) in flue gas in accordance with the following formula: C.E. = [CO /(CO + CO )] 
2 2

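I moved SameLine() to LocationTextExtractionStrategy and exposed getters for the private TextChunk properties it reads. This allows me to adjust the tolerances on the fly in my own subclass, as shown here: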

public class SubSuperStrategy : LocationTextExtractionStrategy {
  public int SameLineOrientationTolerance { get; set; }
  public int SameLineDistanceTolerance { get; set; }

  public override bool SameLine(TextChunk chunk1, TextChunk chunk2) {
    var orientationDelta = Math.Abs(chunk1.OrientationMagnitude
       - chunk2.OrientationMagnitude);
    if(orientationDelta > SameLineOrientationTolerance) return false;
    var distDelta = Math.Abs(chunk1.DistPerpendicular
       - chunk2.DistPerpendicular);
    return (distDelta <= SameLineDistanceTolerance);
  }
}

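Sometimes the chunks get inserted somewhere in the middle of the text, and sometimes (as in this example) at the end. Either way, they do not end up in the right place. I suspect this might have something to do with font sizes, but I am at the limits of my understanding of the internals of this code.

Has anyone found another way to deal with this?

(I am happy to submit a pull request with my changes if that helps.)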
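One plausible reason for the misplaced chunks, offered as a sketch rather than a confirmed diagnosis: relaxing the same-line test does not change the sort order of the chunks. The fallback comparison reproduced further down in this thread orders chunks by perpendicular distance before start position, so chunks with a slightly different baseline still sort as a separate group. The stand-alone Java below does not use iText; Chunk, its fields, and the coordinates are hypothetical, orientation is ignored, and the perpendicular distance is modelled as the negated y coordinate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ChunkOrderSketch {
    /** Hypothetical stand-in for a text chunk on an unrotated page. */
    static final class Chunk implements Comparable<Chunk> {
        final String text;
        final int distPerpendicular;   // modelled as -y, so higher text sorts first
        final float distParallelStart; // modelled as the start x coordinate

        Chunk(String text, float x, float y) {
            this.text = text;
            this.distPerpendicular = (int) -y;
            this.distParallelStart = x;
        }

        /** Orders by perpendicular distance first, then by start position. */
        @Override
        public int compareTo(Chunk rhs) {
            int r = Integer.compare(distPerpendicular, rhs.distPerpendicular);
            return r != 0 ? r : Float.compare(distParallelStart, rhs.distParallelStart);
        }

        /** Tolerant same-line test in the spirit of the subclass in the question. */
        boolean sameLine(Chunk other, int tolerance) {
            return Math.abs(distPerpendicular - other.distPerpendicular) <= tolerance;
        }
    }

    /** Sorts the chunks, then joins them; a failed sameLine test starts a new line. */
    static String extract(List<Chunk> chunks, int tolerance) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        Collections.sort(sorted);
        StringBuilder sb = new StringBuilder();
        Chunk prev = null;
        for (Chunk c : sorted) {
            if (prev != null && !prev.sameLine(c, tolerance)) sb.append('\n');
            sb.append(c.text);
            prev = c;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk("[CO", 0f, 700f));
        chunks.add(new Chunk("2", 30f, 697f));        // subscript, 3 units lower
        chunks.add(new Chunk("/(CO + CO", 36f, 700f));
        chunks.add(new Chunk("2", 90f, 697f));        // subscript, 3 units lower
        chunks.add(new Chunk(")]", 96f, 700f));
        System.out.println(extract(chunks, 0)); // subscripts pushed to their own line
        System.out.println(extract(chunks, 3)); // one line, subscripts still at the end
    }
}
```

With a tolerance of 3 the subscripts stay on the line but still trail the other chunks, which resembles the behaviour described in the question. This suggests that the comparison has to change along with the same-line test.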

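To properly extract these subscripts and superscripts in line, one needs a different approach to check whether two text chunks are on the same line. The following classes represent one such approach.

I am more at home in Java/iText; thus, I implemented this approach in Java first and only afterwards translated it to C#/iTextSharp.

An approach using Java & iText

I am using the current development branch, iText 5.5.8-SNAPSHOT.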

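A way to identify lines

Assuming that text lines are horizontal and that the vertical extents of the glyph bounding boxes on different lines do not overlap, one can try to identify lines using a RenderListener like this: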

public class TextLineFinder implements RenderListener
{
    @Override
    public void beginTextBlock() { }
    @Override
    public void endTextBlock() { }
    @Override
    public void renderImage(ImageRenderInfo renderInfo) { }

    /*
     * @see RenderListener#renderText(TextRenderInfo)
     */
    @Override
    public void renderText(TextRenderInfo renderInfo)
    {
        LineSegment ascentLine = renderInfo.getAscentLine();
        LineSegment descentLine = renderInfo.getDescentLine();
        float[] yCoords = new float[]{
                ascentLine.getStartPoint().get(Vector.I2),
                ascentLine.getEndPoint().get(Vector.I2),
                descentLine.getStartPoint().get(Vector.I2),
                descentLine.getEndPoint().get(Vector.I2)
        };
        Arrays.sort(yCoords);
        addVerticalUseSection(yCoords[0], yCoords[3]);
    }

    /**
     * This method marks the given interval as used.
     */
    void addVerticalUseSection(float from, float to)
    {
        if (to < from)
        {
            float temp = to;
            to = from;
            from = temp;
        }

        int i=0, j=0;
        for (; i<verticalFlips.size(); i++)
        {
            float flip = verticalFlips.get(i);
            if (flip < from)
                continue;

            for (j=i; j<verticalFlips.size(); j++)
            {
                flip = verticalFlips.get(j);
                if (flip < to)
                    continue;
                break;
            }
            break;
        }
        boolean fromOutsideInterval = i%2==0;
        boolean toOutsideInterval = j%2==0;

        while (j-- > i)
            verticalFlips.remove(j);
        if (toOutsideInterval)
            verticalFlips.add(i, to);
        if (fromOutsideInterval)
            verticalFlips.add(i, from);
    }

    final List<Float> verticalFlips = new ArrayList<Float>();
}

public class HorizontalTextExtractionStrategy extends LocationTextExtractionStrategy
{
    public class HorizontalTextChunk extends TextChunk
    {
        public HorizontalTextChunk(String string, Vector startLocation, Vector endLocation, float charSpaceWidth)
        {
            super(string, startLocation, endLocation, charSpaceWidth);
        }

        @Override
        public int compareTo(TextChunk rhs)
        {
            if (rhs instanceof HorizontalTextChunk)
            {
                HorizontalTextChunk horRhs = (HorizontalTextChunk) rhs;
                int rslt = Integer.compare(getLineNumber(), horRhs.getLineNumber());
                if (rslt != 0) return rslt;
                return Float.compare(getStartLocation().get(Vector.I1), rhs.getStartLocation().get(Vector.I1));
            }
            else
                return super.compareTo(rhs);
        }

        @Override
        public boolean sameLine(TextChunk as)
        {
            if (as instanceof HorizontalTextChunk)
            {
                HorizontalTextChunk horAs = (HorizontalTextChunk) as;
                return getLineNumber() == horAs.getLineNumber();
            }
            else
                return super.sameLine(as);
        }

        public int getLineNumber()
        {
            Vector startLocation = getStartLocation();
            float y = startLocation.get(Vector.I2);
            List<Float> flips = textLineFinder.verticalFlips;
            if (flips == null || flips.isEmpty())
                return 0;
            if (y < flips.get(0))
                return flips.size() / 2 + 1;
            for (int i = 1; i < flips.size(); i+=2)
            {
                if (y < flips.get(i))
                {
                    return (1 + flips.size() - i) / 2;
                }
            }
            return 0;
        }
    }

    @Override
    public void renderText(TextRenderInfo renderInfo)
    {
        textLineFinder.renderText(renderInfo);

        LineSegment segment = renderInfo.getBaseline();
        if (renderInfo.getRise() != 0){ // remove the rise from the baseline - we do this because the text from a super/subscript render operations should probably be considered as part of the baseline of the text the super/sub is relative to 
            Matrix riseOffsetTransform = new Matrix(0, -renderInfo.getRise());
            segment = segment.transformBy(riseOffsetTransform);
        }
        TextChunk location = new HorizontalTextChunk(renderInfo.getText(), segment.getStartPoint(), segment.getEndPoint(), renderInfo.getSingleSpaceWidth());
        getLocationalResult().add(location);        
    }

    public HorizontalTextExtractionStrategy() throws NoSuchFieldException, SecurityException
    {
        locationalResultField = LocationTextExtractionStrategy.class.getDeclaredField("locationalResult");
        locationalResultField.setAccessible(true);

        textLineFinder = new TextLineFinder();
    }

    @SuppressWarnings("unchecked")
    List<TextChunk> getLocationalResult()
    {
        try
        {
            return (List<TextChunk>) locationalResultField.get(this);
        }
        catch (IllegalArgumentException | IllegalAccessException e)
        {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }

    final Field locationalResultField;
    final TextLineFinder textLineFinder;
}


The HorizontalTextExtractionStrategy above uses a TextLineFinder to identify horizontal text lines and then uses this information to sort the text chunks.
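The idea can be tried out without iText. The following is a minimal stand-alone sketch, not the classes from this answer: it merges the vertical extents of all runs into disjoint bands with a plain sort-and-merge pass (a simplification of the flip list used by TextLineFinder), treats each band as one line, and sorts runs by line and then by x. Run, its fields, and the coordinates in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LineBandSketch {
    /** Hypothetical text run: content, start x, baseline y and vertical extent. */
    static final class Run {
        final String text;
        final float x, baselineY, bottom, top;

        Run(String text, float x, float baselineY, float bottom, float top) {
            this.text = text;
            this.x = x;
            this.baselineY = baselineY;
            this.bottom = bottom;
            this.top = top;
        }
    }

    /** Merges overlapping [bottom, top] extents into disjoint bands, topmost first. */
    static List<float[]> bands(List<Run> runs) {
        List<float[]> spans = new ArrayList<>();
        for (Run r : runs) spans.add(new float[] { r.bottom, r.top });
        spans.sort((a, b) -> Float.compare(b[1], a[1])); // highest top edge first
        List<float[]> merged = new ArrayList<>();
        for (float[] s : spans) {
            float[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && s[1] >= last[0]) {
                last[0] = Math.min(last[0], s[0]); // overlap: extend the band downwards
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    /** Index of the band containing y (0 is the topmost line), or -1 if none does. */
    static int lineOf(List<float[]> lineBands, float y) {
        for (int i = 0; i < lineBands.size(); i++) {
            float[] band = lineBands.get(i);
            if (y >= band[0] && y <= band[1]) return i;
        }
        return -1;
    }

    /** Sorts runs by line, then by x, and joins them with line breaks between lines. */
    static String extract(List<Run> runs) {
        final List<float[]> lineBands = bands(runs);
        List<Run> sorted = new ArrayList<>(runs);
        sorted.sort((a, b) -> {
            int c = Integer.compare(lineOf(lineBands, a.baselineY), lineOf(lineBands, b.baselineY));
            return c != 0 ? c : Float.compare(a.x, b.x);
        });
        StringBuilder sb = new StringBuilder();
        int prevLine = -1;
        for (Run r : sorted) {
            int line = lineOf(lineBands, r.baselineY);
            if (sb.length() > 0 && line != prevLine) sb.append('\n');
            sb.append(r.text);
            prevLine = line;
        }
        return sb.toString();
    }
}
```

For example, a run "[CO" with baseline 700 and extent 697 to 709 and a subscript "2" with baseline 697 and extent 695 to 703 overlap vertically, so both fall into the same band and are ordered by x alone. The sketch shares the assumption stated earlier: it only works when lines are horizontal and the extents of different lines do not overlap.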

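Beware, this code uses reflection to access private parent class members. This might not be allowed in all environments. In such a case, simply copy LocationTextExtractionStrategy and insert the code directly.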
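For readers unfamiliar with the reflection technique mentioned here, this is a minimal stand-alone sketch of reading a private parent-class field from a subclass. PrivateResultsBase, ReflectiveSubclass, and the field name results are hypothetical stand-ins and not part of iText.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parent class that keeps its collected results private. */
class PrivateResultsBase {
    private final List<String> results = new ArrayList<>();

    void collect(String s) {
        results.add(s);
    }
}

/** Subclass that reaches the private list of its parent through reflection. */
class ReflectiveSubclass extends PrivateResultsBase {
    private final Field resultsField;

    ReflectiveSubclass() {
        try {
            // Look the field up on the declaring class, not on the subclass.
            resultsField = PrivateResultsBase.class.getDeclaredField("results");
            resultsField.setAccessible(true); // may be refused in restricted environments
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    @SuppressWarnings("unchecked")
    List<String> results() {
        try {
            return (List<String>) resultsField.get(this);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The lookup depends on the field name, so it fails at runtime if the parent class renames the field. That is one more reason to copy the parent class where reflection is not an option.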

Extracting text

Now one can use this text extraction strategy to extract text with inline superscripts and subscripts like this:

String extract(PdfReader reader, int pageNo) throws IOException, NoSuchFieldException, SecurityException
{
    return PdfTextExtractor.getTextFromPage(reader, pageNo, new HorizontalTextExtractionStrategy());
}


The sample text on page 11 of the OP's document, under "Combustion Efficiency", now looks like this:

monoxide (CO) in flue gas in accordance with the following formula:   C.E. = [CO 2/(CO + CO 2 )]

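The same approach using C# & iTextSharp

Explanations, warnings, and sample results from the Java-centric section still apply to the C# translation.

I am using iTextSharp 5.5.7.

Update: Changes in LocationTextExtractionStrategy

In the iText 5.5.9-SNAPSHOT commits 53526e4854fcb80c86cbc2e113f7a07401dc9a67 ("Refactor LocationTextExtractionStrategy...") through 1AB350BEE148BE2A4 BEF5E663B3D67A004FF9F8 ("Make TextChunkLocation a Comparable class..."), the LocationTextExtractionStrategy architecture was changed to allow customizations like this without the need for reflection.

Unfortunately, this change breaks the HorizontalTextExtractionStrategy presented above. For iText versions after those commits, one can use the following strategy: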

public class HorizontalTextExtractionStrategy2 extends LocationTextExtractionStrategy
{
    public static class HorizontalTextChunkLocationStrategy implements TextChunkLocationStrategy
    {
        public HorizontalTextChunkLocationStrategy(TextLineFinder textLineFinder)
        {
            this.textLineFinder = textLineFinder;
        }

        @Override
        public TextChunkLocation createLocation(TextRenderInfo renderInfo, LineSegment baseline)
        {
            return new HorizontalTextChunkLocation(baseline.getStartPoint(), baseline.getEndPoint(), renderInfo.getSingleSpaceWidth());
        }

        final TextLineFinder textLineFinder;

        public class HorizontalTextChunkLocation implements TextChunkLocation
        {
            /** the starting location of the chunk */
            private final Vector startLocation;
            /** the ending location of the chunk */
            private final Vector endLocation;
            /** unit vector in the orientation of the chunk */
            private final Vector orientationVector;
            /** the orientation as a scalar for quick sorting */
            private final int orientationMagnitude;
            /** perpendicular distance to the orientation unit vector (i.e. the Y position in an unrotated coordinate system)
             * we round to the nearest integer to handle the fuzziness of comparing floats */
            private final int distPerpendicular;
            /** distance of the start of the chunk parallel to the orientation unit vector (i.e. the X position in an unrotated coordinate system) */
            private final float distParallelStart;
            /** distance of the end of the chunk parallel to the orientation unit vector (i.e. the X position in an unrotated coordinate system) */
            private final float distParallelEnd;
            /** the width of a single space character in the font of the chunk */
            private final float charSpaceWidth;

            public HorizontalTextChunkLocation(Vector startLocation, Vector endLocation, float charSpaceWidth)
            {
                this.startLocation = startLocation;
                this.endLocation = endLocation;
                this.charSpaceWidth = charSpaceWidth;

                Vector oVector = endLocation.subtract(startLocation);
                if (oVector.length() == 0)
                {
                    oVector = new Vector(1, 0, 0);
                }
                orientationVector = oVector.normalize();
                orientationMagnitude = (int)(Math.atan2(orientationVector.get(Vector.I2), orientationVector.get(Vector.I1))*1000);

                // see http://mathworld.wolfram.com/Point-LineDistance2-Dimensional.html
                // the two vectors we are crossing are in the same plane, so the result will be purely
                // in the z-axis (out of plane) direction, so we just take the I3 component of the result
                Vector origin = new Vector(0,0,1);
                distPerpendicular = (int)(startLocation.subtract(origin)).cross(orientationVector).get(Vector.I3);

                distParallelStart = orientationVector.dot(startLocation);
                distParallelEnd = orientationVector.dot(endLocation);
            }

            public int orientationMagnitude()   {   return orientationMagnitude;    }
            public int distPerpendicular()      {   return distPerpendicular;       }
            public float distParallelStart()    {   return distParallelStart;       }
            public float distParallelEnd()      {   return distParallelEnd;         }
            public Vector getStartLocation()    {   return startLocation;           }
            public Vector getEndLocation()      {   return endLocation;             }
            public float getCharSpaceWidth()    {   return charSpaceWidth;          }

            /**
             * @param as the location to compare to
             * @return true is this location is on the the same line as the other
             */
            public boolean sameLine(TextChunkLocation as)
            {
                if (as instanceof HorizontalTextChunkLocation)
                {
                    HorizontalTextChunkLocation horAs = (HorizontalTextChunkLocation) as;
                    return getLineNumber() == horAs.getLineNumber();
                }
                else
                    return orientationMagnitude() == as.orientationMagnitude() && distPerpendicular() == as.distPerpendicular();
            }

            /**
             * Computes the distance between the end of 'other' and the beginning of this chunk
             * in the direction of this chunk's orientation vector.  Note that it's a bad idea
             * to call this for chunks that aren't on the same line and orientation, but we don't
             * explicitly check for that condition for performance reasons.
             * @param other
             * @return the number of spaces between the end of 'other' and the beginning of this chunk
             */
            public float distanceFromEndOf(TextChunkLocation other)
            {
                float distance = distParallelStart() - other.distParallelEnd();
                return distance;
            }

            public boolean isAtWordBoundary(TextChunkLocation previous)
            {
                /**
                 * Here we handle a very specific case which in PDF may look like:
                 * -.232 Tc [( P)-226.2(r)-231.8(e)-230.8(f)-238(a)-238.9(c)-228.9(e)]TJ
                 * The font's charSpace width is 0.232 and it's compensated with charSpacing of 0.232.
                 * And a resultant TextChunk.charSpaceWidth comes to TextChunk constructor as 0.
                 * In this case every chunk is considered as a word boundary and space is added.
                 * We should consider charSpaceWidth equal (or close) to zero as a no-space.
                 */
                if (getCharSpaceWidth() < 0.1f)
                    return false;

                float dist = distanceFromEndOf(previous);

                return dist < -getCharSpaceWidth() || dist > getCharSpaceWidth()/2.0f;
            }

            public int getLineNumber()
            {
                Vector startLocation = getStartLocation();
                float y = startLocation.get(Vector.I2);
                List<Float> flips = textLineFinder.verticalFlips;
                if (flips == null || flips.isEmpty())
                    return 0;
                if (y < flips.get(0))
                    return flips.size() / 2 + 1;
                for (int i = 1; i < flips.size(); i+=2)
                {
                    if (y < flips.get(i))
                    {
                        return (1 + flips.size() - i) / 2;
                    }
                }
                return 0;
            }

            @Override
            public int compareTo(TextChunkLocation rhs)
            {
                if (rhs instanceof HorizontalTextChunkLocation)
                {
                    HorizontalTextChunkLocation horRhs = (HorizontalTextChunkLocation) rhs;
                    int rslt = Integer.compare(getLineNumber(), horRhs.getLineNumber());
                    if (rslt != 0) return rslt;
                    return Float.compare(getStartLocation().get(Vector.I1), rhs.getStartLocation().get(Vector.I1));
                }
                else
                {
                    int rslt;
                    rslt = Integer.compare(orientationMagnitude(), rhs.orientationMagnitude());
                    if (rslt != 0) return rslt;

                    rslt = Integer.compare(distPerpendicular(), rhs.distPerpendicular());
                    if (rslt != 0) return rslt;

                    return Float.compare(distParallelStart(), rhs.distParallelStart());
                }
            }
        }
    }

    @Override
    public void renderText(TextRenderInfo renderInfo)
    {
        textLineFinder.renderText(renderInfo);
        super.renderText(renderInfo);
    }

    public HorizontalTextExtractionStrategy2() throws NoSuchFieldException, SecurityException
    {
        this(new TextLineFinder());
    }

    public HorizontalTextExtractionStrategy2(TextLineFinder textLineFinder) throws NoSuchFieldException, SecurityException
    {
        super(new HorizontalTextChunkLocationStrategy(textLineFinder));

        this.textLineFinder = textLineFinder;
    }

    final TextLineFinder textLineFinder;
}
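The word-boundary rule inside HorizontalTextChunkLocation can be easier to follow with concrete numbers. The sketch below restates the same arithmetic as a stand-alone static method; the class name WordBoundarySketch and the sample values are illustrative only.

```java
final class WordBoundarySketch {
    /**
     * Decides whether a space belongs between two chunks on the same line.
     *
     * @param charSpaceWidth width of a space in the current chunk's font
     * @param previousEnd    end position of the previous chunk along the line
     * @param start          start position of the current chunk along the line
     */
    static boolean isAtWordBoundary(float charSpaceWidth, float previousEnd, float start) {
        // A near-zero space width carries no information, so never report a boundary.
        if (charSpaceWidth < 0.1f) return false;
        float dist = start - previousEnd;
        // Boundary if the chunk jumps back by more than a space width
        // or leaves a gap wider than half a space.
        return dist < -charSpaceWidth || dist > charSpaceWidth / 2.0f;
    }

    public static void main(String[] args) {
        System.out.println(isAtWordBoundary(2.5f, 10f, 10f));  // touching chunks
        System.out.println(isAtWordBoundary(2.5f, 10f, 12f));  // gap of 2 units
        System.out.println(isAtWordBoundary(2.5f, 10f, 7f));   // jump back by 3 units
        System.out.println(isAtWordBoundary(0.05f, 10f, 15f)); // degenerate space width
    }
}
```

With a space width of 2.5, touching chunks are joined without a space, while a gap of 2 units or a backward jump of 3 units counts as a word boundary. The last call returns false regardless of the gap because the space width is below the 0.1 threshold.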
