[Bug]: images not rendered to buffer in Node.js #19102

Closed
fenicento opened this issue Nov 25, 2024 · 1 comment

Comments

@fenicento

Attach (recommended) or Link to PDF file

image-based-pdf-sample.pdf

Web browser and its version

Node.js (Electron 31.7.3)

Operating system and its version

Windows 11 x64, macOS arm64

PDF.js version

4.9.0

Is the bug present in the latest PDF.js version?

Yes

Is a browser extension

No

Steps to reproduce the problem

In Node.js, try to save the pages of a PDF like the one provided as PNGs.

What is the expected behavior?

The saved PNGs should contain all the elements of the corresponding page. Instead, images are missing. If the page is a single large image, the PNG is blank.

What went wrong?

The saved PNGs do not contain the images. No errors or warnings are thrown.
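
Since nothing is thrown, one way to flag the problem programmatically is to inspect the rendered pixels before encoding. The following is only a minimal sketch: `isBlankRgba` is a hypothetical helper (not part of PDF.js or sharp), and it assumes raw RGBA pixel data such as the `data` array returned by a 2D context's `getImageData()`, with a fully opaque white page background.

```javascript
// Hypothetical helper: returns true when every pixel equals `background`
// within `tolerance` on each RGBA channel, i.e. nothing visible was drawn.
function isBlankRgba(pixels, background = [255, 255, 255, 255], tolerance = 0) {
  if (pixels.length === 0 || pixels.length % 4 !== 0) {
    throw new RangeError('Expected non-empty RGBA data (4 bytes per pixel)');
  }
  for (let i = 0; i < pixels.length; i += 4) {
    for (let channel = 0; channel < 4; channel++) {
      if (Math.abs(pixels[i + channel] - background[channel]) > tolerance) {
        return false; // found a pixel that differs from the background
      }
    }
  }
  return true;
}

// Synthetic 2x2 image: all white, then with one black pixel.
const whitePage = new Uint8ClampedArray(16).fill(255);
const pageWithDot = Uint8ClampedArray.from(whitePage);
pageWithDot.set([0, 0, 0, 255], 4);
console.log(isBlankRgba(whitePage), isBlankRgba(pageWithDot));
```

In the code below, the pixel data could presumably be read with `canvas.context.getImageData(0, 0, viewport.width, viewport.height).data` after rendering, assuming the canvas backend implements `getImageData`.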

Link to a viewer

No response

Additional context

Here is my code:

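// Imports assumed for this snippet (not shown in the original post).
// `pdfjsLib` is the PDF.js module and `cleanupConversion` is a project helper;
// neither import path is given in the post, so they are not reproduced here.
import fs from 'node:fs';
import path from 'node:path';
import sharp from 'sharp';

/**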
 * Converts a PDF file to a series of images.
 * @param {Object} file - The file object containing input path and name
 * @param {string} destinationFolder - The folder where the converted images will be saved
 * @param {string} [outputFormat='avif'] - The desired output format (e.g., 'avif', 'webp')
 * @param {number} [resizeWidth] - Optional width to resize the images (only if original is larger)
 * @param {boolean} shouldRetainSubdirs - Whether to retain the subdirectory structure
 * @param {Window} mainWindow - The main window object for progress updates
 * @param {AbortSignal} signal - Signal for aborting the conversion
 * @returns {Promise<Object>} - A promise that resolves to an object with the output path
 */
export const convertPdfToImages = async (file, destinationFolder, outputFormat = 'avif', resizeWidth, shouldRetainSubdirs, mainWindow, signal) => {
    // Create output directory if it doesn't exist
    if (!fs.existsSync(destinationFolder)) {
        fs.mkdirSync(destinationFolder, { recursive: true });
    }

    const pdfFilename = path.parse(file.name).name;
    const outputDir = path.join(destinationFolder, shouldRetainSubdirs ? file.relativePath : '', pdfFilename);
    
    if (!fs.existsSync(outputDir)) {
        fs.mkdirSync(outputDir, { recursive: true });
    }

    const createdFiles = []; // Track files created during conversion

    try {
        // Load the PDF document
        const data = new Uint8Array(fs.readFileSync(file.inputPath));
        // Note: `background` is a page.render() parameter, not a getDocument() option.
        const loadingTask = pdfjsLib.getDocument({
            data,
            useSystemFonts: true,
            maxImageSize: -1,
            enableXfa: true
        });
        const pdfDocument = await loadingTask.promise;
        
        const numPages = pdfDocument.numPages;
        console.log(`PDF loaded. Number of pages: ${numPages}`);
        let progress = 0;

        // Process each page
        for (let pageNumber = 1; pageNumber <= numPages; pageNumber++) {
            if (signal?.aborted) {
                throw new Error('Conversion stopped by user');
            }

            const page = await pdfDocument.getPage(pageNumber);
            const annotations = await page.getAnnotations();
            console.log('annotations', annotations);
            const viewport = page.getViewport({ scale: 3.0 }); // Higher scale for better quality

            // Create canvas and context
            const canvasFactory = pdfDocument.canvasFactory;
            const canvas = await canvasFactory.create(viewport.width, viewport.height);

            // Render PDF page to canvas
            const renderContext = {
                canvasContext: canvas.context,
                viewport,
                background: 'white' // render parameter; accepts any canvas fillStyle value
            };

            const renderTask = page.render(renderContext);
            await renderTask.promise;

            // Convert canvas to buffer
            const imageBuffer = await canvas.canvas.encode('png');
            
            // Process with Sharp for resizing and format conversion
            let sharpInstance = sharp(imageBuffer);
            
            if (resizeWidth) {
                sharpInstance = sharpInstance.resize(resizeWidth, null, {
                    withoutEnlargement: true
                });
            }

            const outputPath = path.join(outputDir, `page-${String(pageNumber).padStart(3, '0')}.${outputFormat}`);
            createdFiles.push(outputPath); // Track this file
            
            // Convert and save with abort handling
            let pipeline = sharpInstance.toFormat(outputFormat);
            const abortPromise = new Promise((_, reject) => {
                signal?.addEventListener('abort', () => {
                    pipeline.destroy();
                    pipeline = null;
                    reject(new Error('Conversion stopped by user'));
                }, { once: true });
            });
            
            await Promise.race([
                pipeline.toFile(outputPath),
                abortPromise
            ]);

            page.cleanup();
            progress = (pageNumber / numPages * 100).toFixed(1);
            mainWindow.webContents.send('onProgress', {
                type: 'progress',
                data: {
                    file: file,
                    progress: parseFloat(progress)
                }
            });
        }

        console.log('Conversion completed successfully!');
        return {
            outputPath: destinationFolder
        };

    } catch (error) {
        // Give sharp a chance to finish up before cleaning up
        setTimeout(() => {
            cleanupConversion(outputDir, createdFiles);
        }, 100);
        console.error('Error during conversion:', error);
        throw error;
    }
}
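
A side note on the abort handling in the snippet above: a new `abort` listener is attached for every page and is never removed when a page finishes normally. The following is a minimal sketch of one way to structure this, assuming a hypothetical helper named `raceWithAbort` (it is not part of the original code, PDF.js, or sharp). It rejects as soon as the signal aborts and removes its listener once the wrapped promise settles.

```javascript
// Hypothetical helper: settles with `promise`, or rejects early when `signal` aborts.
// `onAbort` is an optional cleanup callback (e.g. destroying a stream).
function raceWithAbort(promise, signal, onAbort = () => {}) {
  if (!signal) return promise;
  if (signal.aborted) {
    onAbort();
    Promise.resolve(promise).catch(() => {}); // avoid an unhandled rejection later
    return Promise.reject(new Error('Conversion stopped by user'));
  }
  return new Promise((resolve, reject) => {
    const handleAbort = () => {
      onAbort();
      reject(new Error('Conversion stopped by user'));
    };
    signal.addEventListener('abort', handleAbort, { once: true });
    // Forward the result, then drop the listener so it does not accumulate per page.
    promise.then(resolve, reject).finally(() => {
      signal.removeEventListener('abort', handleAbort);
    });
  });
}

// Demo with a promise that never settles on its own.
const controller = new AbortController();
raceWithAbort(new Promise(() => {}), controller.signal)
  .catch((err) => console.log(err.message));
controller.abort();
```

In the page loop this could be called as `await raceWithAbort(pipeline.toFile(outputPath), signal, () => pipeline.destroy());`, which keeps the same error message as the original code.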

And my package.json:

{
  "name": "db-image-optimizer-electron",
  "version": "1.0.0",
  "description": "An Electron application with Svelte",
  "main": "./out/main/index.js",
  "type": "module",
  "author": "example.com",
  "homepage": "https://electron-vite.org",
  "scripts": {
    "format": "prettier --plugin prettier-plugin-svelte --write .",
    "lint": "eslint . --ext .js,.jsx,.cjs,.mjs,.ts,.tsx,.cts,.mts --fix",
    "start": "electron-vite preview",
    "dev": "electron-vite dev",
    "build": "electron-vite build",
    "postinstall": "electron-builder install-app-deps",
    "build:unpack": "npm run build && electron-builder --dir",
    "build:win": "npm run build && electron-builder --win",
    "build:mac": "npm run build && electron-builder --mac",
    "build:linux": "npm run build && electron-builder --linux"
  },
  "dependencies": {
    "@electric-sql/pglite": "^0.2.13",
    "@electron-toolkit/preload": "^3.0.1",
    "@electron-toolkit/utils": "^3.0.0",
    "bootstrap": "^5.3.3",
    "bootstrap-icons": "^1.11.3",
    "ffmpeg-static": "^5.2.0",
    "knex-pglite": "^0.9.7",
    "lodash": "^4.17.21",
    "pdf.js": "github:mozilla/pdf.js",
    "sass": "^1.80.4",
    "sharp": "^0.33.5",
    "svelte-spa-router": "^4.0.1",
    "vite-plugin-node-polyfills": "^0.22.0"
  },
  "devDependencies": {
    "@electron-toolkit/eslint-config": "^1.0.2",
    "@electron-toolkit/eslint-config-prettier": "^2.0.0",
    "@electron/rebuild": "^3.7.0",
    "@sveltejs/vite-plugin-svelte": "^3.1.1",
    "electron": "^31.7.3",
    "electron-builder": "^24.13.3",
    "electron-vite": "^2.3.0",
    "eslint": "^8.57.0",
    "eslint-plugin-svelte": "^2.41.0",
    "prettier": "^3.3.2",
    "prettier-plugin-svelte": "^3.2.5",
    "svelte": "^5.1.2",
    "vite": "^5.3.1"
  }
}

Could this be related to the recent canvas library change?

@Snuffleupagus
Collaborator

Web browser and its version

Node.js (Electron 31.7.3)

Please note that we've never actually supported Electron, or any other framework.
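
To confirm which runtime a reproduction is actually running in, a small check along these lines may help. This is only a sketch: `describeRuntime` is a hypothetical helper name, and it relies on Electron exposing `process.versions.electron` and `process.type`, which plain Node.js does not set.

```javascript
// Hypothetical helper: reports whether the code runs in plain Node.js or inside Electron.
// `proc` defaults to the global `process` and can be replaced with a stub for testing.
function describeRuntime(proc = process) {
  const { node, electron } = proc.versions;
  if (electron) {
    // In Electron, `process.type` distinguishes the main ("browser") and renderer processes.
    return `Electron ${electron} (${proc.type ?? 'unknown'} process, Node.js ${node})`;
  }
  return `Node.js ${node}`;
}

console.log(describeRuntime());
```

Including this line of output in a report makes it clear whether the result came from Node.js directly or from an Electron process.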

PDF.js version

4.9.0

Sorry, but that exact version number has never existed.
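
Note that the package.json above installs the library via `"pdf.js": "github:mozilla/pdf.js"`, i.e. from the repository rather than from a numbered release. A sketch for reporting what was actually loaded, assuming the loaded module exposes `version` and `build` exports as the PDF.js library does; `describePdfjs` is a hypothetical helper name.

```javascript
// Hypothetical helper: formats the version information of a loaded PDF.js module.
// Falls back to "unknown" if the module object lacks the expected exports.
function describePdfjs(pdfjsLib) {
  const { version = 'unknown', build = 'unknown' } = pdfjsLib ?? {};
  return `PDF.js ${version} (build ${build})`;
}

// Usage, with `pdfjsLib` being the module imported in the application:
//   console.log(describePdfjs(pdfjsLib));
```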


Works for me (WFM) when testing your PDF against the master branch of the library, together with https://github.com/mozilla/pdf.js/tree/master/examples/node/pdf2png, locally on Windows (i.e. running it directly in Node.js without involving Electron).

The following PNG was generated, using the default scale of 1:

[Image: output]

@Snuffleupagus closed this as not planned (won't fix, can't repro, duplicate, stale) on Nov 25, 2024