Javascript: Converting PDF to HTML files using node.js and pdf.js


javascript html node.js


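I want to convert a PDF to HTML pages using pdf.js. pdf.js can do this in the browser, but is it possible to obtain the browser-rendered HTML on the backend, so that a PDF with n pages is converted into n HTML files? I am using node.js as the backend. I have tried pdf2html and other similar npm modules; they do not work very well and have problems with some PDFs. Thanks for any suggestions.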

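Maybe I have found something similar. I am working with a local PDF file and the browser. I made a few small changes to the ready-made viewer.js of PDF.js, which should make it possible to do the processing with both Node.js and the browser.

This script embeds the PDF file given as an argument into the viewer.js webpack bundle and launches the browser: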

const fs = require('fs');
const path = require('path');
const chp = require('child_process');
// datauri is loaded from the global npm folder (Windows: %APPDATA%\npm\node_modules)
const datauri = require(path.join(process.env.APPDATA, 'npm/node_modules', 'datauri'));
const pdf = process.argv[2]; // PDF file given on the command line
datauri(pdf, (err, content, meta) => {
    if (err) {
        throw err;
    }
    // Note: viewer.js is patched in place
    const viewerJSpath = path.join(__dirname, './viewer.js');
    let wp = fs.readFileSync(viewerJSpath, 'utf-8');
    // Locate the name of the default PDF referenced by the viewer
    const pdfName = 'compressed.tracemonkey-pldi-09.pdf';
    const srcPos = [wp.indexOf(pdfName)];
    srcPos.push(srcPos[0] + pdfName.length);
    // Locate the closing bracket of the HOSTED_VIEWER_ORIGINS array
    let HOSTED_VIEWER_ORIGINS = wp.indexOf('HOSTED_VIEWER_ORIGINS');
    HOSTED_VIEWER_ORIGINS = wp.indexOf(']', HOSTED_VIEWER_ORIGINS);
    // Replace the default PDF name with the data URI and allow the file:// origin
    wp = wp.slice(0, srcPos[0]) + content +
        wp.slice(srcPos[1], HOSTED_VIEWER_ORIGINS) + ', "file://"' +
        wp.slice(HOSTED_VIEWER_ORIGINS);
    fs.writeFileSync(viewerJSpath, wp, 'utf-8');
    // Open viewer.html with its associated application (the browser)
    const c = path.join(__dirname, 'viewer.html');
    chp.execSync(c);
});
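
The string splicing in the script can also be isolated into a pure function, which makes it possible to check that both markers were found before viewer.js is overwritten. This is only a minimal sketch: `patchViewerSource` is a hypothetical helper name, and it assumes, like the script, that the default PDF name appears before the `HOSTED_VIEWER_ORIGINS` array in the bundle.

```javascript
// Hypothetical helper (not part of PDF.js): returns a patched copy of the
// viewer.js source text instead of editing the file directly.
function patchViewerSource(source, defaultPdfName, dataUri) {
  const nameStart = source.indexOf(defaultPdfName);
  if (nameStart < 0) {
    throw new Error('default PDF name not found in viewer source');
  }
  const nameEnd = nameStart + defaultPdfName.length;

  // Find the closing bracket of the HOSTED_VIEWER_ORIGINS array literal.
  const originsStart = source.indexOf('HOSTED_VIEWER_ORIGINS');
  const originsEnd = originsStart < 0 ? -1 : source.indexOf(']', originsStart);
  if (originsEnd < 0) {
    throw new Error('HOSTED_VIEWER_ORIGINS array not found in viewer source');
  }
  // Assumption carried over from the script: the file name comes first.
  if (originsEnd < nameEnd) {
    throw new Error('unexpected order of markers in viewer source');
  }

  // Swap the file name for the data URI and append "file://" to the origins.
  return source.slice(0, nameStart) + dataUri +
    source.slice(nameEnd, originsEnd) + ', "file://"' +
    source.slice(originsEnd);
}
```

Throwing early avoids writing a corrupted viewer.js when the script is run a second time and the default file name is no longer present.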

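Then I tried adding the original width as an additional style parameter in the appendText method of renderTextLayer, and adding the elements sorted by position in the render method of TextLayerBuilder, next to

this.textLayerDiv.appendChild(textLayerFrag)
It seems that only the web and build folders are needed (apart from npm i -g datauri for the example).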
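
Sorting the elements by position could look like the sketch below. It assumes each element carries numeric `x` and `y` pixel offsets and orders them top to bottom, then left to right; `sortByPosition` is an illustrative name, not a PDF.js function.

```javascript
// Returns a new array ordered top-to-bottom, then left-to-right.
// Assumes every item has numeric `x` and `y` properties (in pixels).
function sortByPosition(items) {
  // Copy first so the caller's array is left untouched.
  return [...items].sort((a, b) => (a.y - b.y) || (a.x - b.x));
}

// Example with made-up items:
const ordered = sortByPosition([
  { x: 327.478, y: 122.793, text: 'Languages' },
  { x: 99.9871, y: 98.0496, text: 'Trace-based' },
]);
console.log(ordered.map((item) => item.text).join(' | '));
```

A strict comparison on `y` treats items that differ by a fraction of a pixel as separate lines, so a multi-column page may need a coarser grouping than this.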

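With Puppeteer and a slightly modified PDF.js, direct conversion is possible (it works in both headful and headless mode, but the element sizes differ slightly).
Fixes needed for Puppeteer/Chromium: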
const message = exception?.message; // => exception.message
page: this.pageLabel ?? this.id // => this.pageLabel || this.id
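
Note that these two replacements are not exact equivalents: `||` falls back on any falsy value (such as `''` or `0`), whereas `??` falls back only on `null` or `undefined`, and plain member access throws where `?.` would yield `undefined`. A sketch of closer stand-ins for engines that lack both operators; the helper names are made up for illustration:

```javascript
// Stand-in for `value ?? fallback`: falls back only on null or undefined.
function nullish(value, fallback) {
  return value !== null && value !== undefined ? value : fallback;
}

// Stand-in for `exception?.message`: never throws on null or undefined.
function messageOf(exception) {
  return exception !== null && exception !== undefined
    ? exception.message
    : undefined;
}

// An empty page label survives with nullish(), but not with ||.
console.log(JSON.stringify(nullish('', 7)), JSON.stringify('' || 7));
```

Whether the difference matters here depends on whether an empty page label can occur in the PDFs being converted.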

viewers.js => viewerSrc.js, main additions:
function webViewerPageRendered({
...
  if (pageNumber < PDFViewerApplication.pagesCount) {
    arguments[0].source.eventBus.dispatch("pagenumberchanged", {
      value: pageNumber + 1
    }); // generate all remaining pages
  }
}

class BaseViewer {
  constructor(options) {
    this.pageNo = []; // rendered pages array
...
  _setCurrentPageNumber(val, resetCurrentPageView = false) {
...
    if (this.pageNo.indexOf(val) < 0) {
      this.pageNo.push(val);
    }
    if (this.pagesCount - 1 <= this.pageNo.length) {
      window.reader(elLists); // send the result back to Node.js
    }
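
The `window.reader(elLists)` call presumably relies on a function that the Node.js side makes available inside the page. With Puppeteer, `page.exposeFunction` is the usual way to do that. The sketch below is an assumption about how the Node.js side could look, not the author's actual script; `collectTextLayer` is a hypothetical name, and the Puppeteer module is passed in as a parameter so the function does not depend on how it is installed.

```javascript
// Sketch: open the patched viewer and wait until it calls window.reader.
// `puppeteer` is the module object, e.g. require('puppeteer').
async function collectTextLayer(puppeteer, viewerUrl) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Promise that is settled once the page reports its result.
    let deliver;
    const received = new Promise((resolve) => { deliver = resolve; });

    // Makes window.reader available in the page; its argument is
    // serialized and handed to this Node.js callback.
    await page.exposeFunction('reader', (elLists) => deliver(elLists));

    await page.goto(viewerUrl);
    return await received;
  } finally {
    // Close the browser whether or not the viewer reported a result.
    await browser.close();
  }
}
```

A call might look like `collectTextLayer(require('puppeteer'), 'file:///path/to/web/viewer.html')`, where the path is a placeholder. The sketch has no timeout, so it waits indefinitely if the viewer never calls `window.reader`.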

Your solution is here ->

This is not free! :(

pdf.js converts a PDF to images (canvas, PNG, etc.). It does not convert a PDF to HTML.

Online demo here: PDF download changed to HTM download (text layer)

{
    "1": {
        "0": {
            "x": 99.9871,
            "y": 98.0496,
            "w": 557.695,
            "h": 22,
            "text": "Trace-based Just-in-Time Type Specialization for Dynamic",
            "ff": "sans-serif",
            "fs": "22.2695px",
            "cssText": "left: 99.9871px; top: 98.0496px; width: 557.695px; font-size: 22.2695px; font-family: sans-serif; transform: scaleX(0.970163);"
        },
        "1": {
            "x": 327.478,
            "y": 122.793,
            "w": 102.707,
            "h": 22,
            "text": "Languages",
            "ff": "sans-serif",
            "fs": "22.2695px",
            "cssText": "left: 327.478px; top: 122.793px; width: 102.707px; font-size: 22.2695px; font-family: sans-serif; transform: scaleX(0.932262);"
        },
...
    "2": {
        "0": {
            "x": 393.677,
            "y": 90.3408,
            "w": 192.909,
            "h": 11,
            "text": "1 for (var i = 2; i < 100; ++i) {",
            "ff": "monospace",
            "fs": "11.1347px",
            "cssText": "left: 393.677px; top: 90.3408px; width: 192.909px; font-size: 11.1347px; font-family: monospace; transform: scaleX(0.875232);"
        },
        "1": {
            "x": 67.0588,
            "y": 91.7599,
            "w": 173.346,
            "h": 11,
            "text": "Hence, recording and compiling a trace",
            "ff": "sans-serif",
            "fs": "11.1347px",
            "cssText": "left: 67.0588px; top: 91.7599px; width: 173.346px; font-size: 11.1347px; font-family: sans-serif; transform: scaleX(0.895175);"
        },
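
Since this result maps page numbers to text items, producing one HTML file per page comes down to turning each item into an absolutely positioned element. A rough sketch, assuming every item has the `text` and `cssText` fields shown in the listing; the function names and the extra inline styles are illustrative choices, not taken from PDF.js.

```javascript
// Escape characters that are special in HTML text and attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build one HTML document from the items of a single page.
function pageToHtml(pageItems) {
  const spans = Object.keys(pageItems)
    .sort((a, b) => Number(a) - Number(b)) // keep the item order
    .map((key) => {
      const item = pageItems[key];
      const style = 'position: absolute; white-space: pre; ' +
        'transform-origin: 0 0; ' + escapeHtml(item.cssText);
      return `<span style="${style}">${escapeHtml(item.text)}</span>`;
    });
  return '<!DOCTYPE html>\n<html><body style="position: relative">\n' +
    spans.join('\n') + '\n</body></html>\n';
}

// Map each page number to its HTML string; writing files is up to the caller.
function toHtmlPages(elLists) {
  const pages = {};
  for (const pageNo of Object.keys(elLists)) {
    pages[pageNo] = pageToHtml(elLists[pageNo]);
  }
  return pages;
}
```

Each entry of the returned object can then be written out with `fs.writeFileSync`, for example to a file named after the page number, which gives n HTML files for n pages.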