stefankopf
Confirmed Champ

A while ago, Alfresco decided to replace the Ghostscript engine in our products. It had been used as a rasteriser to transform PDF files to PNG images within Alfresco Content Services (ACS). The main reason was Ghostscript’s change to an AGPL license, which raised concerns among our customers and limited how we could distribute ACS.

Alfresco Engineering was tasked with evaluating different options for PDF rendition under a permissive open source license. Unfortunately, we found almost no independent comparisons of performance and fidelity between the available engines, and certainly no study thorough enough to base such a decision on. Since the new engine will be the default in our next version of ACS, it could cause severe problems for our customers if it fell short on either performance or fidelity.

In the end, we decided to do this study ourselves. With this blog post, we want to share our results with you.

Market Overview

After research on Wikipedia and Google, our team came up with this list of PDF rendering engines:

| Engine | License | Notes |
|---|---|---|
| Ghostscript | AGPLv3 or Commercial | Full PostScript interpreter. Can also handle PDF files. |
| MuPDF | AGPLv3 or Commercial | PDF, XPS and EPUB rendering engine based on the modern high performance Fitz graphics engine |
| Adobe PDF Library SDK | Commercial | Original Adobe PDF engine. |
| Foxit SDK | Commercial | Engine behind the Foxit PDF reader products. Fork released under BSD license as Pdfium by Google. |
| Pdfium | BSD style | Engine behind PDF Plug-In in Chrome. Fork of the Foxit SDK. |
| Poppler | GPLv2 or GPLv3 | Fork of XPdf |
| Xpdf | GPLv2 or Commercial | PDF viewer for X-Windows & PDF Rasterizer for all platforms (pdftopng). |
| GnuPDF | GPLv3 | |
| PDFBox 2.0 | Apache 2.0 | |
| Sejda | AGPLv3 or Commercial | |
| IcePDF | Apache 2.0 and Commercial Pro version | |
| Aspose PDF | Commercial | |

(Note: There are multiple PDF Reader products out there, but this is the consolidated list of PDF rendering engines behind these products)

The Candidates

Given our constraints on license terms and other factors, we decided to start our deep and thorough investigation with this list of candidates. We included some leading proprietary libraries for reference.

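| Engine | Type | Version |
|---|---|---|
| Ghostscript | Native | 9.21 |
| MuPDF | Native | 1.10a |
| Xpdf | Native | 3.04 |
| Pdfium | Native | 2017-04-10 |
| Aspose | Java | 17.2.0 |
| ICEPdf | Java | 6.2.0 |
| Sejda | Java | 3.0.13 |
| PDFBox | Java | 2.0.5 |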

These were the latest released versions at the time of our investigation.

Performance

Ghostscript was used up to ACS 5.1 to render all PDF files to PNG images for things like thumbnails. Most other file formats, such as MS Office files, are first converted to PDF for previewing, and the PDF is then converted to PNG. In our analysis, we were interested in the average overall performance.

Our team randomly picked 3,071 PDF documents (17,226 pages) from our internal Alfresco Repository to get a sample set representing a typical ACS repository. We are aware that there are ACS installations out there that mainly contain documents of one specific kind, but we are confident that our sample set represents the majority of ACS repositories.
To compare the performance, we rendered all documents to PNG files at 100 dpi. Each engine was configured to produce results at comparable fidelity (e.g. by activating anti-aliased text and graphics), and we guaranteed the same resources to each engine. For all Java-based engines, we kept the JVM running and invoked the rendition for each document in the same JVM process, so that HotSpot could optimise the generated native code as far as possible.
This led to the following total processing times:

Rendition times of different PDF engines

(total rendition time; smaller is better)

We played with different dpi settings to see how the results would change. We found that the relative difference between engines is affected by the resolution (dpi), but the order in which the candidates ranked stayed the same across different dpi settings (except for close candidates).
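
To make the effect of the dpi setting concrete, here is a minimal Python sketch of how a page size in PDF points maps to output pixels. It assumes the default PDF user space unit of 1/72 inch and simple rounding. The function name `page_pixels` is ours, and the individual engines may round or clip differently.

```python
POINTS_PER_INCH = 72  # default PDF user space unit is 1/72 inch


def page_pixels(width_pt, height_pt, dpi):
    """Return (width, height) in pixels for a page rendered at the given dpi.

    Assumption: plain rounding to the nearest pixel; real engines may differ.
    """
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# An A4 page is roughly 595 x 842 points.
for dpi in (72, 100, 150, 300):
    w, h = page_pixels(595, 842, dpi)
    print(f"{dpi:>3} dpi: {w} x {h} px ({w * h / 1e6:.2f} megapixels)")
```

The number of pixels to fill grows with the square of the dpi value, which is one plausible reason why the relative differences between engines shift with resolution.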

Our key findings are:

  • Native engines are always faster than Java-based engines
  • MuPDF is clearly the fastest engine
  • Pdfium comes in second, but is significantly slower than MuPDF

Features and Fidelity

Because our new transformation engine will be used for server-based rendition of PDF documents to PNG files, all interactive features such as form filling, signature validation, video or 3D were out of scope for our investigation.

Based on the latest PDF specification, we compiled a list of features that are provided by the PDF drawing model. For each feature, we picked a sample document for testing or created such a document ourselves. The rendition of these sample documents was then visually compared and rated. We used the latest Adobe Acrobat Reader as our reference viewer.

Text rendition and font support

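| Text rendition & font support | Ghostscript | MuPDF | Xpdf | Pdfium | Aspose | ICEPdf | Sejda | PDFBox |
|---|---|---|---|---|---|---|---|---|
| Type1 | 4 | 5 | 4 | 5 | 1 | 1 | 4 | 4 |
| TrueType | 4 | 5 | 4 | 5 | 4 | 0 | 5 | 5 |
| Type1 CID | yes | yes | yes | yes | yes | no | yes | yes |
| TrueType CID | yes | yes | yes | yes | yes | no | yes | yes |
| Type3 | 3 | 5 | 6 | 6 | 1 | 0 | 1 | 1 |
| AVG | 3.67 | 5 | 4.67 | 5.33 | 2 | 0 | 3.33 | 3.33 |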

We awarded 0 to 5 points for each rendition compared to the Acrobat reference. In two situations, we awarded an extra point for visually better results than Acrobat.
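
As a rough cross-check, the AVG row can be recomputed as the mean of the three numeric rows (Type1, TrueType and Type3). This is a sketch under the assumption that the yes/no CID rows do not enter the average; the variable and function names are ours.

```python
from statistics import mean

ENGINES = ["Ghostscript", "MuPDF", "Xpdf", "Pdfium",
           "Aspose", "ICEPdf", "Sejda", "PDFBox"]

# Numeric scores copied from the table above, one list per font type.
SCORES = {
    "Type1":    [4, 5, 4, 5, 1, 1, 4, 4],
    "TrueType": [4, 5, 4, 5, 4, 0, 5, 5],
    "Type3":    [3, 5, 6, 6, 1, 0, 1, 1],
}


def average_scores(scores, engines):
    """Mean score per engine over all numeric rows, rounded to 2 decimals."""
    return {
        engine: round(mean(row[i] for row in scores.values()), 2)
        for i, engine in enumerate(engines)
    }


averages = average_scores(SCORES, ENGINES)
for engine, avg in averages.items():
    print(f"{engine:<12} {avg}")
```

This reproduces the published averages for seven of the eight engines. For ICEPdf, the mean of the listed scores (1, 0, 0) is 0.33, while the table shows 0.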


Images

PDF supports 6 different “filters” (i.e. compression formats) that can be used to store raster graphic data. The decoded raster graphic then needs to be mapped to the pixels of the rendered output image. This mapping has a huge impact on the final visual result.

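| Images | Ghostscript | MuPDF | Xpdf | Pdfium | Aspose | ICEPdf | Sejda | PDFBox |
|---|---|---|---|---|---|---|---|---|
| Anti Aliasing | no | yes | yes | yes | no | partial | no | no |
| CCITTFaxDecode | yes | yes | yes | yes | yes | yes | yes | yes |
| DCTDecode | yes | yes | yes | yes | yes | yes | yes | yes |
| LZWDecode | yes | yes | yes | yes | yes | yes | yes | yes |
| FlateDecode | yes | yes | yes | yes | yes | yes | yes | yes |
| JPXDecode | yes | yes | yes | yes | no | partial | no | no |
| JBIG2Decode | yes | yes | yes | yes | yes | yes | partial | partial |
| SUM Image | 3 | 5 | 5 | 5 | 2 | 2 | 1 | 1 |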

Every engine starts with 5 points. We removed 1 point for each missing filter support and we removed 2 points for non-working anti aliasing.
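
The scoring rule can be sketched in a few lines of Python. The sketch assumes that a "partial" entry is penalised like a "no"; the text above does not say this explicitly, but with that assumption the rule reproduces the SUM Image row. The function name `image_score` and the data layout are ours.

```python
FILTERS = ["CCITTFaxDecode", "DCTDecode", "LZWDecode",
           "FlateDecode", "JPXDecode", "JBIG2Decode"]


def image_score(anti_aliasing, filter_support):
    """Start at 5, subtract 2 for non-working anti-aliasing and 1 per
    filter that is not fully supported.

    Assumption: "partial" is treated the same as "no".
    Filters missing from filter_support are taken as fully supported.
    """
    score = 5
    if anti_aliasing != "yes":
        score -= 2
    score -= sum(1 for name in FILTERS
                 if filter_support.get(name, "yes") != "yes")
    return score


print(image_score("yes", {}))                                   # MuPDF
print(image_score("no", {}))                                    # Ghostscript
print(image_score("partial", {"JPXDecode": "partial"}))         # ICEPdf
print(image_score("no", {"JPXDecode": "no",
                         "JBIG2Decode": "partial"}))            # Sejda, PDFBox
```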

Drawing model

We noticed that the atomic drawing operations (MoveTo, LineTo, CurveTo) and shading models are supported almost equally in all candidates. Visually different results on complex drawings are mainly caused by the composition of these atomic building blocks, and not by the basic operations themselves.

This is why we decided to focus on the composition of drawing operations for our comparison.

Compositing and Blend Modes

PDF supports 16 different blend modes. These can either be applied to single objects or to multiple objects in a transparency group. Each blend mode affects the image channels individually and thus produces different results in different colour spaces (RGB, CMYK).
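
To illustrate what "per channel" means here, the following Python sketch applies three of the separable blend functions from the PDF specification (Multiply, Screen and Difference) to colour components in the range 0.0 to 1.0. It is a simplified illustration only: opacity, transparency groups, the non-separable modes and the special handling of subtractive colour spaces such as CMYK are left out, and the helper names are ours.

```python
# Separable blend functions: cb is the backdrop component, cs the source
# component, both in the range 0.0 .. 1.0.
BLEND_FUNCTIONS = {
    "Multiply":   lambda cb, cs: cb * cs,
    "Screen":     lambda cb, cs: cb + cs - cb * cs,
    "Difference": lambda cb, cs: abs(cb - cs),
}


def blend(mode, backdrop, source):
    """Apply a separable blend mode to each colour channel independently."""
    fn = BLEND_FUNCTIONS[mode]
    return tuple(fn(cb, cs) for cb, cs in zip(backdrop, source))


backdrop = (1.0, 0.5, 0.0)   # an orange tone in RGB
source = (0.5, 0.5, 0.5)     # mid grey

for mode in BLEND_FUNCTIONS:
    print(mode, blend(mode, backdrop, source))
```

Because the same function runs on each component separately, the numeric result depends on which components the colour space provides, which is why the tests were run in both RGB and CMYK.
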
We used a set of test and reference PDF files (for example, one for RGB and one for CMYK) and counted the errors made by each engine. We then used the relative number of errors to assign points, with 5 points awarded for the fewest errors averaged over all test files and 0 points for a large number of errors. Here are the final results:

| Blend Modes | Ghostscript | MuPDF | Xpdf | Pdfium | Aspose | ICEPdf | Sejda | PDFBox |
|---|---|---|---|---|---|---|---|---|
| AVG | 3.33 | 3.33 | 3 | 2 | 1.67 | 1 | 2 | 2 |

Fidelity results

Combining all of the above gives the following results for features and fidelity:

(rendition fidelity; larger is better)

Conclusion

MuPDF came out of this investigation as the clear winner, with Pdfium in second place. It also became apparent that there is a big gap between native PDF renderers and the group of Java-based PDF renderers, in performance as well as in features and fidelity.

We ended up selecting Pdfium as the PDF rasterization engine for these reasons:

  • The BSD-style license of Pdfium gives us and our customers the most flexibility
  • Since this engine drives the PDF display in Chrome, we expect very good, continued support from Google for this library, especially when it comes to finding and fixing vulnerabilities
  • It shows very good overall performance, although it is not the fastest engine in the test
  • It shows very good rendition fidelity

The Alfresco PDF Renderer

Based on the Pdfium library, we started a new project: the alfresco-pdf-renderer. This native command line program is inspired by the test application used within the Pdfium builds. Unlike that test application, it does not support JavaScript, and it offers additional parameters to specify the size of the output image.