How to get the words at a specific position in a PDF using PDFBox in C# .NET

c# .net

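I am having trouble with PDFBox. I cannot get the words located at a given position (for example, x=20, y=30, height=100, width=100). How can I get the words from a specific region?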

I finally found the answer. This works fine:

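// Assumes PDFBox 1.8.x compiled for .NET with IKVM.
using System;
using java.awt;                    // Rectangle
using org.apache.pdfbox.pdmodel;   // PDDocument, PDPage
using org.apache.pdfbox.util;      // PDFTextStripperByArea (PDFBox 1.8.x)

public static void ReadCooridinates(string sourceFilePath)
{
  using (PDDocument pDDocument = PDDocument.load(sourceFilePath))
  {
    // Take the first page of the document.
    java.util.List allPages = pDDocument.getDocumentCatalog().getAllPages();
    PDPage page = (PDPage)allPages.get(0);

    PDFTextStripperByArea stripper = new PDFTextStripperByArea();
    stripper.setSortByPosition(true);

    // Rectangle(x, y, width, height), measured from the top-left corner of the page.
    Rectangle rect = new Rectangle(10, 200, 100, 30);
    stripper.addRegion("class1", rect);
    stripper.extractRegions(page); // Extract the text of every registered region on this page

    Console.WriteLine("\nText in the area:" + rect + "\n");
    Console.WriteLine(stripper.getTextForRegion("class1"));
  }
}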

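Try PDFTextStripperByArea.

Thanks for your comment. How do I use PDFTextStripperByArea? I have already tried it: PDDocument pdDocument = PDDocument.load(filePath); PDFTextStripper pdfTextStripper = new PDFTextStripper(); PDFTextStripperByArea stripperArea = new PDFTextStripperByArea(); stripperArea.setSortByPosition(true); Rectangle rect = new Rectangle(10, 280, 275, 60); stripperArea.addRegion("class1", rect);

The region stripper uses Java coordinates, not PDF coordinates, so y = page.getMediaBox().getHeight() - y. See also the PrintURLs example.

Thanks for your comment. I just want to get specific words from coordinates in PDFBox using C#.

.NET/C# is not supported... A better explanation: y=30 means you will get content near the top of the page. A height of 100 is about 1/8 of a page. If you get nothing, try x=0, y=0, height=800, width=600 and shrink the region from there.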
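
The coordinate conversion mentioned in the comments can be sketched as a small helper with no PDFBox dependency. The class and method names (RegionCoordinates, ToTopLeftY) and the sample numbers are hypothetical, chosen only for illustration. The sketch assumes the media box starts at (0, 0) and the page is not rotated. Because a region is a rectangle rather than a point, its height is also subtracted so the result is the y of the rectangle's top edge.

```csharp
using System;

public static class RegionCoordinates
{
    // Converts the lower-left y of a rectangle given in PDF coordinates
    // (origin at the bottom-left, y grows upward) into the top-left y used by
    // a top-left-origin system such as java.awt.Rectangle (y grows downward).
    // Assumption: the media box starts at (0, 0) and the page is not rotated.
    public static float ToTopLeftY(float pageHeight, float pdfLowerLeftY, float rectHeight)
    {
        return pageHeight - (pdfLowerLeftY + rectHeight);
    }

    public static void Main()
    {
        // Hypothetical values: a US Letter page is 792 points tall. With PDFBox
        // the height would come from page.getMediaBox().getHeight().
        float pageHeight = 792f;

        // A 100-point-tall rectangle whose lower edge sits 662 points above
        // the bottom of the page starts 30 points below the top.
        float y = ToTopLeftY(pageHeight, 662f, 100f);
        Console.WriteLine(y); // prints 30
    }
}
```

The returned value is what would be passed as the second argument of new Rectangle(x, y, width, height) before calling addRegion.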