Skip to content

Latest commit

 

History

History
192 lines (144 loc) · 7.08 KB

README.md

File metadata and controls

192 lines (144 loc) · 7.08 KB

Recursive PDF Downloader

Table of Contents

Intro

This is an app for the Nextcloud cloud software. It adds a new menu entry to the actions menu of each folder, archive, or individual file in the files view which lets you download, respectively, entire directories trees, all files in archives, or other individual files, converted and assembled as a single PDF file. Additionally, it adds a tab to the details view where version actions can be performed.

For the PDF generation, the following steps are performed:

  • walk through the given folder
  • convert all found files to PDF
    • optionally transparently traverse archive files (zip etc.)
    • handle some special cases
    • try to convert the remaining files with unoconv or an admin-provided fallback-script
    • generate a PDF placeholder error page for each failed conversion
  • then combine all found or generated PDF files in one document using pdftk
  • add bookmarks to mark the start of each folder and each file
    • existing bookmarks are "shifted down" accordingly
    • the resulting bookmark structure resembles the folder structure
  • optionally place a "Folder PAGE/MAX_PAGES" label at the top of each page
  • finally, present the generated PDF as a download or save it to the cloud file system.

The app offers the choice between online and background PDF generation. "Background" means that a job is scheduled, and then runs independently of the web browser frontend. The user will be notified after the job has been completed.

Compatibility

The app currently requires PHP >= 8.0. It should be usable with Nextcloud v23 and probably also with v24.

Working Conversions

Builtin Converters

  • PDF files ;) -- of course, just pass-through
  • office files via LibreOffice
  • EML (RFC822) files, i.e. emails you saved to disk, via mhonarc, wkthmltopdf
  • HTML files via wkhtmltopdf
  • TIFF files via tiff2pdf
  • Postscript files via ps2pdf
  • everything else is passed to unoconv
  • if unoconv fails, a PDF placeholder error page is generated

Custom Converters

Administrators may specify a shell script or program for

  • default conversion: try this script before any other converters, if it fails continue with the builtin converters

  • fallback conversion: if all other converters fail, try the given script as a fallback, if that fails also generate an error page.

    If no fallback converter is configured then unoconv is used as the fallback.

On-the-fly Extraction of Archive Files

If enabled by an admin users can choose to enable on-the-fly extraction of archive files.

Security

  • To somehow reduce the danger of zip bombs, there is a hard-coded upper limit of the decompressed archive size
  • administrators can lower this limit to reduce resource usage on the server, or if they feel that the built-in limit of 2^30 bytes is too high.
  • users may decrease this limit further on a per-user basis
  • administrators may be disabled by administrators altogether
  • if enable users may decide by themselves whether to enable this feature or not

Implementation

This package relies on wapmorgan/unified-archive as the archive handling backend. Please see over there for a list of supported archive formats and how to support further archive formats.

User Preferences

Page Label and File-Name Templates

The app allows configuring page labels and automatically generated download and destination file names based on a user-configured template. The details can be found in Braced Text Templates.

Overlay Font Selection

  • the fonts can be customized from the list of fonts shopped with tcpdf
  • the backend generates font samples for the chosen fonts and also provides a preview of the configured page labels with the chosen font.

Include and Exclude Patterns

Files can be included or excluded by regular expressions and a setting controls whether one or the other regular expression has precedence in case both patterns match. Unfortunately, those patterns cannot (yet) be controlled from the "details" panel.

Archive Files

If enabled by the administrators, users can optionally disable the on-the-fly handling of archive files and also restrict the archive size limit imposed by the admins further.

Conversion of Individual Files

Optionally individual files (as opposed to directory trees and archives) can directly be converted to PDF. The default is to enable this feature. The drawback is that this adds an actions menu entry to each filesystem node, even to PDF files themselves.

Performance

  • Unfortunately, the app is not the fastest horse one could think of. In particular, the unoconv (LibreOffice) converter tends to be somewhat slow. Conversion time increases linearly with the number of files to be converted, of course.
  • It might be necessary to tweak your web server to allow for larger execution times (several minutes) if you do not want to make use of the background PDF generation.

Screenshots

Preferences

  • admin file_list
  • personal file_list

Files-List

  • directory file_list
  • archive file_list

Details-View

file_list

Other Nextcloud PDF Converters

At least two other apps are also either dedicated to or, respectively, allow for PDF conversion:

  • nextcloud/workflow_pdf_converter
    • this app is dedicated to automated PDF conversion based on workflow rules
    • at the time of this writing, conversion is done with LibreOffice
  • newroco/emlviewer
    • as the name states this is a viewer module for .eml files (emails)
    • the EML view also provides a PDF download button
    • at the time of this writing, PDF conversion is done with MPDF

Todo, some problems I am aware of

  • please feel free to submit issues!
  • ZIP-bomb detection might need improvement
  • There is no test suite. This is really an issue.