Python: Extract a region of a PDF page by coordinates

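I am looking for a tool that extracts a given rectangular region (specified by coordinates) from a 1-page PDF file and produces a 1-page PDF file containing only that region: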

# file.pdf is a 1-page pdf file
extract file.pdf 0 0 100 100 > out.pdf
# out.pdf is now a 1-page pdf file with a page of size 100x100
# it contains the region (0, 0) to (100, 100) of file.pdf
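I could convert the PDF to an image and use convert, but then the resulting PDF would no longer be vector-based, which is unacceptable (I want to be able to zoom).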

Ideally, I would like to do this with a command-line tool or a Python library.


Thanks!

The following script, found elsewhere, splits each page of a PDF into two pages:

#!/usr/bin/env perl
use strict; use warnings;
use PDF::API2;

my $filename = shift;
my $oldpdf = PDF::API2->open($filename);
my $newpdf = PDF::API2->new;

for my $page_nb (1..$oldpdf->pages) {
  my ($page, @cropdata);

  $page = $newpdf->importpage($oldpdf, $page_nb);
  @cropdata = $page->get_mediabox;
  $cropdata[2] /= 2;
  $page->cropbox(@cropdata);
  $page->trimbox(@cropdata);
  $page->mediabox(@cropdata);

  $page = $newpdf->importpage($oldpdf, $page_nb);
  @cropdata = $page->get_mediabox;
  $cropdata[0] = $cropdata[2] / 2;
  $page->cropbox(@cropdata);
  $page->trimbox(@cropdata);
  $page->mediabox(@cropdata);
}

# Write the result next to the input, e.g. file.pdf -> file.clean.pdf
(my $newfilename = $filename) =~ s/(.*)\.(\w+)$/$1.clean.$2/;
$newpdf->saveas($newfilename);
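
The box arithmetic in the script can be sketched on its own in Python. The script halves the upper-right x value and reuses it as the left edge of the second half, which only gives the true midpoint when the media box starts at x = 0. The helper below (`split_box` is a hypothetical name, not part of PDF::API2) computes the midpoint from both edges, so it should also work for boxes with a non-zero origin. It is a minimal sketch of the arithmetic only and does not touch any PDF file.

```python
def split_box(box):
    """Split a (llx, lly, urx, ury) box into its left and right halves."""
    llx, lly, urx, ury = box
    # Use both horizontal edges so a box that does not start at 0 still works.
    mid = (llx + urx) / 2
    left = (llx, lly, mid, ury)
    right = (mid, lly, urx, ury)
    return left, right


# A US Letter page is 612 x 792 points.
left, right = split_box((0, 0, 612, 792))
print(left)   # (0, 0, 306.0, 792)
print(right)  # (306.0, 0, 612, 792)
```

Each resulting tuple could then be applied to the media, crop and trim boxes of one imported copy of the page, as the script does with `@cropdata`.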
Using pyPdf, you can do the following:

import sys
import pyPdf

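def extract(in_file, coords, out_file):
    # coords is [x0, y0, x1, y1] in PDF points, origin at the bottom-left corner
    with open(in_file, 'rb') as infp:
        reader = pyPdf.PdfFileReader(infp)
        page = reader.getPage(0)
        writer = pyPdf.PdfFileWriter()
        # shrink the visible page area; the content itself stays vector
        page.mediaBox.lowerLeft = coords[:2]
        page.mediaBox.upperRight = coords[2:]
        # you could do the same for page.trimBox and page.cropBox
        writer.addPage(page)
        # the input file must still be open while the output is written
        with open(out_file, 'wb') as outfp:
            writer.write(outfp)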

if __name__ == '__main__':
    in_file = sys.argv[1]
    coords = [int(i) for i in sys.argv[2:6]]
    out_file = sys.argv[6]

    extract(in_file, coords, out_file)
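
pyPdf is no longer maintained, so here is a rough sketch of the same idea against its successor, pypdf. It assumes pypdf is installed (`pip install pypdf`). The helper names `normalize_box` and `parse_args` are made up for this example. Two small changes from the script above: coordinates are parsed as floats instead of ints, and the corners are reordered so the lower-left corner really is the smaller one. As with the original, the content outside the box is only hidden, not deleted from the file.

```python
import argparse


def normalize_box(coords):
    """Return (x0, y0, x1, y1) as floats with x0 < x1 and y0 < y1."""
    if len(coords) != 4:
        raise ValueError("expected four numbers: x0 y0 x1 y1")
    x0, y0, x1, y1 = (float(c) for c in coords)
    x0, x1 = sorted((x0, x1))
    y0, y1 = sorted((y0, y1))
    if x0 == x1 or y0 == y1:
        raise ValueError("the region must have a non-zero width and height")
    return x0, y0, x1, y1


def extract(in_file, coords, out_file):
    """Write page 1 of in_file to out_file, restricted to the given box."""
    # Imported here so the helpers in this sketch work without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    x0, y0, x1, y1 = normalize_box(coords)
    reader = PdfReader(in_file)
    page = reader.pages[0]
    # Set both boxes so viewers that honour the crop box show the same region.
    for box in (page.mediabox, page.cropbox):
        box.lower_left = (x0, y0)
        box.upper_right = (x1, y1)
    writer = PdfWriter()
    writer.add_page(page)
    with open(out_file, "wb") as outfp:
        writer.write(outfp)


def parse_args(argv):
    """Parse: in_file x0 y0 x1 y1 out_file."""
    parser = argparse.ArgumentParser(
        description="Crop the first page of a PDF to a rectangle."
    )
    parser.add_argument("in_file")
    parser.add_argument("coords", nargs=4, type=float)
    parser.add_argument("out_file")
    return parser.parse_args(argv)
```

To use it as a script, one could call `args = parse_args(sys.argv[1:])` followed by `extract(args.in_file, args.coords, args.out_file)`, for example with the arguments `file.pdf 0 0 100 100 out.pdf`.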