A while ago, Alfresco decided to replace the Ghostscript engine in our products. It had been used as a rasteriser to transform PDF files to PNG images within Alfresco Content Services (ACS). The main reason was Ghostscript’s change to an AGPL license, which raised concerns among our customers and limited how we could distribute ACS.
Alfresco Engineering was tasked with evaluating different options for PDF rendition under a permissive open source license. Unfortunately, we found almost no independent comparisons of performance and fidelity between the available engines, and certainly no study thorough enough to base such a decision on. Since the new engine will be the default in our next version of ACS, it could cause severe problems for our customers if it fell short on either performance or fidelity.
In the end, we decided to do this study ourselves. With this blog post, we want to share our results with you.
After research on Wikipedia and Google, our team came up with this list of PDF rendering engines:
Engine | License | Notes |
---|---|---|
Ghostscript | AGPLv3 or Commercial | Full PostScript interpreter. Can also handle PDF files. |
MuPDF | AGPLv3 or Commercial | PDF, XPS and EPUB rendering engine based on the modern high performance Fitz graphics engine |
Adobe PDF Library SDK | Commercial | Original Adobe PDF engine. |
Foxit SDK | Commercial | Engine behind the Foxit PDF reader products. Fork released under BSD license as Pdfium by Google. |
Pdfium | BSD style | Engine behind PDF Plug-In in Chrome. Fork of the Foxit SDK. |
Poppler | GPLv2 or GPLv3 | Fork of Xpdf |
Xpdf | GPLv2 or Commercial | PDF viewer for X-Windows & PDF Rasterizer for all platforms (pdftopng). |
GnuPDF | GPLv3 | |
PDFBox 2.0 | Apache 2.0 | |
Sejda | AGPLv3 or Commercial | |
IcePDF | Apache 2.0 and Commercial Pro version | |
Aspose PDF | Commercial | |
(Note: There are many PDF reader products available, but this is the consolidated list of the PDF rendering engines behind them.)
Given our constraints on license terms and other factors, we decided to start our deep and thorough investigation with this list of candidates. We included some leading proprietary libraries for reference.
Engine | Type | Version |
---|---|---|
Ghostscript | Native | 9.21 |
MuPDF | Native | 1.10a |
Xpdf | Native | 3.04 |
Pdfium | Native | 2017-04-10 |
Aspose | Java | 17.2.0 |
ICEPdf | Java | 6.2.0 |
Sejda | Java | 3.0.13 |
PDFBox | Java | 2.0.5 |
The versions listed were the latest releases at the time of our investigation.
Ghostscript was used up to ACS 5.1 to render all PDF files to PNG images for things like thumbnails. Most other file formats, like MS Office files, are first converted to PDF for previewing, and the PDF is then converted to PNG. In our analysis, we were interested in the average overall performance.
Our team randomly picked 3,071 PDF documents (17,226 pages) from our internal Alfresco Repository to get a sample set representing a typical ACS repository. We are aware that some ACS installations mainly contain documents of one specific kind, but we are confident that our sample set represents the majority of ACS repositories.
To compare the performance, we rendered all documents to PNG files at 100 dpi. Each engine was configured to produce results at comparable fidelity (e.g. by activating anti-aliased text and graphics), and we guaranteed the same resources to each engine. For all Java-based engines, we kept the JVM running and invoked the rendition for each document in the same JVM process, so that HotSpot could optimise the generated native code as far as possible.
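A measurement of this kind can be sketched for a native command line engine as follows. This is only an illustration, not the harness we used: `time_engine` and `ghostscript_command` are hypothetical helper names, and the Ghostscript options shown (`png16m` device, `-r100`, `-dTextAlphaBits=4`, `-dGraphicsAlphaBits=4`) are one plausible way to request anti-aliased PNG output at 100 dpi. The Java engines were timed inside a running JVM instead, which this sketch does not cover.

```python
import subprocess
import time
from pathlib import Path


def ghostscript_command(pdf, out_dir, dpi=100):
    """Build a Ghostscript call that renders every page of `pdf` to PNG."""
    return [
        "gs", "-q", "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-sDEVICE=png16m",
        f"-r{dpi}",
        "-dTextAlphaBits=4",      # anti-aliased text
        "-dGraphicsAlphaBits=4",  # anti-aliased graphics
        f"-sOutputFile={Path(out_dir) / (Path(pdf).stem + '-%03d.png')}",
        str(pdf),
    ]


def time_engine(pdfs, build_command, out_dir, run=subprocess.run):
    """Return (total_seconds, failed_files) for one engine over all PDFs.

    `build_command` turns (pdf, out_dir) into an argument list, so the same
    loop can be reused for other engines. `run` is injectable for testing.
    """
    total = 0.0
    failed = []
    for pdf in pdfs:
        start = time.perf_counter()
        result = run(build_command(pdf, out_dir), capture_output=True)
        total += time.perf_counter() - start
        if result.returncode != 0:
            failed.append(pdf)
    return total, failed
```

Calling `time_engine(sorted(Path("samples").glob("*.pdf")), ghostscript_command, Path("out"))` would then give the total wall-clock time for one engine over a hypothetical `samples` directory. Recording the failures separately keeps a crashing engine from looking fast.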
This led to the following total process times:
(total rendition time; smaller is better)
We experimented with different dpi settings to see how the results would change. The relative difference between engines is affected by the resolution (dpi), but the ranking of the candidates stayed the same across dpi settings, except where candidates were already close to each other.
Our key findings are:
Because our new transformation engine will be used for a server based rendition of PDF documents to PNG files, all interactive features like form filling, signature validation, video or 3D were out of scope for our investigation.
Based on the latest PDF specification, we compiled a list of features that are provided by the PDF drawing model. For each feature, we picked a sample document for testing or created such a document ourselves. The rendition of these sample documents was then visually compared and rated. We used the latest Adobe Acrobat Reader as our reference viewer.
Text rendition & font support | Ghostscript | MuPDF | Xpdf | Pdfium | Aspose | ICEPdf | Sejda | PDFBox |
---|---|---|---|---|---|---|---|---|
Type1 | 4 | 5 | 4 | 5 | 1 | 1 | 4 | 4 |
TrueType | 4 | 5 | 4 | 5 | 4 | 0 | 5 | 5 |
Type1 CID | yes | yes | yes | yes | yes | no | yes | yes |
TrueType CID | yes | yes | yes | yes | yes | no | yes | yes |
Type3 | 3 | 5 | 6 | 6 | 1 | 0 | 1 | 1 |
AVG | 3.67 | 5 | 4.67 | 5.33 | 2 | 0 | 3.33 | 3.33 |
We awarded 0 to 5 points for each rendition compared to the Acrobat reference. In two situations, we awarded an extra point for visually better results than Acrobat.
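The AVG row appears to be the mean of the three point-rated rows (Type1, TrueType, Type3), with the yes/no rows left out. A minimal sketch of that calculation, using the values for three of the engines from the table above (the function name `average_score` is ours, for illustration only):

```python
from statistics import mean

# Point-rated rows copied from the table for three of the engines.
# A 6 means 5 points plus the bonus point for beating the Acrobat reference.
scores = {
    "Ghostscript": {"Type1": 4, "TrueType": 4, "Type3": 3},
    "MuPDF": {"Type1": 5, "TrueType": 5, "Type3": 5},
    "Pdfium": {"Type1": 5, "TrueType": 5, "Type3": 6},
}


def average_score(rows):
    """Mean of the point-rated rows, rounded to two decimals."""
    return round(mean(rows.values()), 2)


averages = {engine: average_score(rows) for engine, rows in scores.items()}
print(averages)
```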
PDF supports 6 different “filters” (i.e. compression formats) that can be used to store raster graphic data. The decoded raster graphic then needs to be mapped to the pixels of the rendered output graphic. This process has a huge impact on the final visual result.
Images | Ghostscript | MuPDF | Xpdf | Pdfium | Aspose | ICEPdf | Sejda | PDFBox |
---|---|---|---|---|---|---|---|---|
Anti Aliasing | no | yes | yes | yes | no | partial | no | no |
CCITTFaxDecode | yes | yes | yes | yes | yes | yes | yes | yes |
DCTDecode | yes | yes | yes | yes | yes | yes | yes | yes |
LZWDecode | yes | yes | yes | yes | yes | yes | yes | yes |
FlateDecode | yes | yes | yes | yes | yes | yes | yes | yes |
JPXDecode | yes | yes | yes | yes | no | partial | no | no |
JBIG2Decode | yes | yes | yes | yes | yes | yes | partial | partial |
SUM Image | 3 | 5 | 5 | 5 | 2 | 2 | 1 | 1 |
Every engine starts with 5 points. We removed 1 point for each unsupported filter and 2 points for non-working anti-aliasing.
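The deduction rule can be written down in a few lines. The sketch below assumes that "partial" is treated the same as "no", which is what the SUM row suggests; `image_score` and the data layout are illustrative names, not part of any tool we used.

```python
FILTERS = [
    "CCITTFaxDecode", "DCTDecode", "LZWDecode",
    "FlateDecode", "JPXDecode", "JBIG2Decode",
]


def image_score(anti_aliasing, filters):
    """Start at 5, subtract 2 for missing anti-aliasing and 1 per missing filter.

    `filters` only needs to list the filters that are not fully supported;
    anything absent is taken as "yes". Assumption: "partial" counts as missing.
    """
    score = 5
    if anti_aliasing != "yes":
        score -= 2
    score -= sum(1 for name in FILTERS if filters.get(name, "yes") != "yes")
    return score


# (anti-aliasing, filters that are not fully supported), taken from the table.
engines = {
    "Ghostscript": ("no", {}),
    "MuPDF": ("yes", {}),
    "Xpdf": ("yes", {}),
    "Pdfium": ("yes", {}),
    "Aspose": ("no", {"JPXDecode": "no"}),
    "ICEPdf": ("partial", {"JPXDecode": "partial"}),
    "Sejda": ("no", {"JPXDecode": "no", "JBIG2Decode": "partial"}),
    "PDFBox": ("no", {"JPXDecode": "no", "JBIG2Decode": "partial"}),
}

image_scores = {name: image_score(aa, flt) for name, (aa, flt) in engines.items()}
print(image_scores)
```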
We noticed that the atomic drawing operations (MoveTo, LineTo, CurveTo) and shading models are supported almost equally in all candidates. Visually different results on complex drawings are mainly caused by the composition of these atomic building blocks, and not by the basic operations themselves.
This is why we decided to focus on the composition of drawing operations for our comparison.
PDF supports 16 different blend modes. These can either be applied to single objects or to multiple objects in a transparency group. Each blend mode affects the image channels individually and thus produces different results in different colour spaces (RGB, CMYK).
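To illustrate why the colour space matters, here is a small sketch using the Screen blend mode. It assumes the rule from the PDF specification that component values in subtractive colour spaces such as CMYK are complemented before the blend function is applied and complemented again afterwards, and it uses a naive, uncalibrated CMYK to RGB conversion purely for comparison. The colours and helper names are made up for this example.

```python
def screen(backdrop, source):
    """Screen blend function for one additive channel in the range 0..1."""
    return backdrop + source - backdrop * source


def blend_rgb(backdrop, source, blend=screen):
    """Apply the blend function to each RGB channel directly."""
    return tuple(blend(b, s) for b, s in zip(backdrop, source))


def blend_cmyk(backdrop, source, blend=screen):
    """Subtractive space: complement, blend, then complement back, per channel."""
    return tuple(1 - blend(1 - b, 1 - s) for b, s in zip(backdrop, source))


def cmyk_to_rgb(c, m, y, k):
    """Naive device conversion without colour management (an assumption)."""
    return ((1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k))


# The same two colours, orange and mid grey, expressed in both colour spaces.
orange_rgb, grey_rgb = (1.0, 0.5, 0.0), (0.5, 0.5, 0.5)
orange_cmyk, grey_cmyk = (0.0, 0.5, 1.0, 0.0), (0.0, 0.0, 0.0, 0.5)

rgb_result = blend_rgb(orange_rgb, grey_rgb)
cmyk_result = cmyk_to_rgb(*blend_cmyk(orange_cmyk, grey_cmyk))
print(rgb_result, cmyk_result)
```

Under these assumptions the RGB blend gives a light orange, while the CMYK blend clears all four channels and ends up white. An engine that blends in the wrong colour space will therefore produce visibly different output.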
We used a set of test and reference PDF files for RGB and CMYK and counted the errors made by each engine. We then used the relative number of errors to assign points, with 5 points awarded for the fewest errors averaged over all test files and 0 points representing a large number of errors. Here are the final results:
Blend Modes | Ghostscript | MuPDF | Xpdf | Pdfium | Aspose | ICEPdf | Sejda | PDFBox |
---|---|---|---|---|---|---|---|---|
AVG | 3.33 | 3.33 | 3 | 2 | 1.67 | 1 | 2 | 2 |
Combining all of the above gives the following results for features and fidelity:
(rendition fidelity; larger is better)
MuPDF came out of this investigation as the clear winner, followed by Pdfium in second place. It also became apparent that there is a big gap between the native PDF renderers and the group of Java-based PDF renderers, in performance as well as in features and fidelity.
We ended up selecting Pdfium as the PDF rasterization engine for these reasons:
Based on the Pdfium library, we started a new project: the alfresco-pdf-renderer. This native command line program is inspired by the test application used within the Pdfium builds, but it does not provide support for JavaScript and it offers additional parameters to specify the size of the output image.