Create compressed PDF from PDF

loftux
Star Contributor
I need to make compressed versions of PDFs available. The original PDFs are scanned documents with very high resolution (they need to be, since they are historical documents that are not always of the best quality to start with) and therefore have very large file sizes. I now need compressed versions of those files for preview and download, so that pages load faster.

My thinking is that I can use a new thumbnail definition for this.
For the compression I could use Ghostscript as described at http://www.alfredklomp.com/programming/shrinkpdf/ and wrap it in a transformer similar to transformer.worker.Pdf2swf or transformer.worker.ImageMagick, found in the thirdparty subsystem.
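A rough sketch of the command such a transformer might run is below. The class name `GhostscriptShrink` is hypothetical and the option set is a common Ghostscript `pdfwrite` invocation, not necessarily identical to the shrinkpdf script; wiring it into Alfresco's transformer framework is not shown.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Alfresco): builds and runs a Ghostscript
// command that rewrites a PDF with downsampled, recompressed images.
final class GhostscriptShrink {

    // Standard -dPDFSETTINGS presets understood by Ghostscript's pdfwrite device.
    private static final Set<String> PRESETS =
            Set.of("/screen", "/ebook", "/printer", "/prepress", "/default");

    static List<String> buildCommand(String gsExecutable, String source, String target, String preset) {
        if (!PRESETS.contains(preset)) {
            throw new IllegalArgumentException("Unknown -dPDFSETTINGS preset: " + preset);
        }
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);
        cmd.add("-sDEVICE=pdfwrite");        // the output is again a PDF
        cmd.add("-dCompatibilityLevel=1.4");
        cmd.add("-dPDFSETTINGS=" + preset);  // /screen is smallest, /ebook keeps more detail
        cmd.add("-dNOPAUSE");
        cmd.add("-dQUIET");
        cmd.add("-dBATCH");                  // exit when done instead of waiting for input
        cmd.add("-sOutputFile=" + target);
        cmd.add(source);
        return cmd;
    }

    // Runs the command and fails if Ghostscript reports an error exit code.
    static void run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("Ghostscript exited with code " + exit);
        }
    }
}
```

For scanned historical documents, `/ebook` may be a safer starting point than `/screen`, since it downsamples images less aggressively.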

But how do I then tell the thumbnail definition bean to use that transformer?
In the thumbnail definition I can set the target mimetype, but here it is the same as the source (application/pdf).

loftux
Star Contributor
I found that ImageMagick can compress a PDF with:
convert -compress jpeg -density 100 input.pdf output.pdf
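The sketch below assembles the same command and shows why `-density` drives the output size. The class and method names are hypothetical, not an ImageMagick or Alfresco API. Note that ImageMagick itself relies on Ghostscript to read PDFs.

```java
import java.util.List;

// Hypothetical helper (not an Alfresco or ImageMagick API): assembles the
// ImageMagick command from the post and estimates page pixel sizes.
final class ImageMagickPdfCompress {

    static List<String> buildCommand(String source, String target, int densityDpi) {
        if (densityDpi <= 0) {
            throw new IllegalArgumentException("density must be positive: " + densityDpi);
        }
        // -density comes before the input file so that it sets the resolution
        // at which the PDF pages are rasterized when they are read;
        // -compress jpeg stores each rasterized page as a JPEG in the output PDF.
        return List.of("convert",
                "-compress", "jpeg",
                "-density", Integer.toString(densityDpi),
                source,
                target);
    }

    // Number of pixels along a page edge of the given length (mm) at the given dpi.
    static long pixelsForMm(double mm, int dpi) {
        return Math.round(mm / 25.4 * dpi);
    }
}
```

At 100 dpi an A4 page (210 × 297 mm) comes out at about 827 × 1169 pixels. If the originals were scanned at, say, 600 dpi (an assumption; the post does not give the figure), a page is about 4961 × 7016 pixels, roughly 36 times as many pixels before JPEG compression is even considered.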

My idea was then to use this in a new thumbnail definition:


   <bean id="loftux.thumbnail.pdfcompressed" class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
      <property name="targetObject" ref="thumbnailRegistry" />
      <property name="targetMethod" value="addThumbnailDefinition" />
      <property name="arguments">
         <list>
            <bean class="org.alfresco.repo.thumbnail.ThumbnailDefinition">
               <property name="name" value="pdfcompressed" />
               <property name="mimetype" value="application/pdf"/>
               <property name="transformationOptions">
                  <bean parent="defaultImageTransformationOptions">
                     <property name="commandOptions">
                        <value>-compress jpeg -density 100 ${source} ${target}</value>
                     </property>
                  </bean>
               </property>
               <property name="runAs" value="System"/>
               <property name="failureHandlingOptions" ref="standardFailureOptions"/>
            </bean>
         </list>
      </property>
   </bean>


Unfortunately this does not work. I get the error:
The content node was not specified so the content cannot be streamed to the client…
So I guess something is wrong with my bean definition.

loftux
Star Contributor
To clarify why I want to use thumbnails:
The files are scanned historical documents, and to preserve detail they are scanned at very high resolution. That makes the PDF files anywhere from 10 MB and up.
We use the Share Extras Media Viewer and its PdfJS viewer. For preview, a compressed and therefore smaller version is preferable and sufficient (high-resolution detail is not needed for preview).

The thumbnail service and REST API make it very convenient to retrieve a rendition, and you always get a rendition of the latest version. This is in fact what the Media Viewer does for PdfJS: it defines a new thumbnail type named pdf, and any document that is not already a PDF is retrieved through it (if a transformation is possible).
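Retrieving a named rendition through the REST API comes down to requesting a URL like the one built below. The path pattern `/api/node/{store_type}/{store_id}/{id}/content/thumbnails/{thumbnailname}` and the `c=force` parameter are assumptions based on the Alfresco thumbnail web script as I understand it; check the web script index of your version. `ThumbnailUrl` is a hypothetical helper.

```java
// Hypothetical helper: builds the repository web script URL for a named
// thumbnail of a node, assuming the thumbnail GET web script pattern
// /api/node/{store_type}/{store_id}/{id}/content/thumbnails/{thumbnailname}
final class ThumbnailUrl {

    static String build(String serviceBase, String nodeRef, String thumbnailName, boolean forceCreate) {
        // A nodeRef has the form "<store_type>://<store_id>/<id>",
        // e.g. "workspace://SpacesStore/<uuid>".
        int sep = nodeRef.indexOf("://");
        int slash = sep < 0 ? -1 : nodeRef.indexOf('/', sep + 3);
        if (sep <= 0 || slash < 0) {
            throw new IllegalArgumentException("Not a nodeRef: " + nodeRef);
        }
        String storeType = nodeRef.substring(0, sep);
        String storeId = nodeRef.substring(sep + 3, slash);
        String id = nodeRef.substring(slash + 1);
        String url = serviceBase + "/api/node/" + storeType + "/" + storeId + "/" + id
                + "/content/thumbnails/" + thumbnailName;
        // Assumed parameter: c=force asks for the thumbnail to be created
        // synchronously if it does not exist yet.
        return forceCreate ? url + "?c=force" : url;
    }
}
```

A viewer that should show the compressed copy would request the `pdfcompressed` name instead of `pdf`, e.g. `ThumbnailUrl.build("http://localhost:8080/alfresco/service", "workspace://SpacesStore/1234-abcd", "pdfcompressed", true)`.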

So why not use the rendition service?
Yes, maybe that is better in this case. But I have found very little documentation about the rendition service and how it relates to transformations. From what I can decipher of the API, it still looks like you can only use it to transform from one mimetype to another. Any hints are welcome. What I am looking for is a way to keep the original PDF and serve a compressed version for quick viewing and browsing.
Also, I'm not very skilled in Java and hoped this could be achieved with configuration and JavaScript, but that no longer looks like the way forward.

rasm
Champ in-the-making
- Have you changed the call from PDF.js to request the 'pdfcompressed' rendition?
- How are you testing?

- I think Alfresco does not do PDF->PDF transformations. When it matches its transformers to reach the target mimetype 'application/pdf', it stops at the source in your case, because the source already has the correct mimetype.
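The behaviour described above can be illustrated with a toy model. This is NOT Alfresco's transformer registry, just a simplified, hypothetical lookup keyed on source and target mimetype: when they are equal, the lookup short-circuits, so a registered PDF->PDF transformer is never consulted.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (NOT Alfresco code) of mimetype-based transformer lookup.
final class ToyTransformerRegistry {

    private final Map<String, String> transformers = new HashMap<>();

    void register(String sourceMimetype, String targetMimetype, String transformerName) {
        transformers.put(key(sourceMimetype, targetMimetype), transformerName);
    }

    // Returns the transformer name to use, or null when no transformation is
    // needed or none is registered for this pair.
    String select(String sourceMimetype, String targetMimetype) {
        if (sourceMimetype.equals(targetMimetype)) {
            // Content is already in the requested mimetype: nothing to do,
            // so a same-mimetype "compress" transformer is never picked.
            return null;
        }
        return transformers.get(key(sourceMimetype, targetMimetype));
    }

    private static String key(String source, String target) {
        return source + "->" + target;
    }
}
```

In this model, registering `application/pdf -> application/pdf` has no effect, which matches what a same-mimetype thumbnail definition would run into if the real lookup behaves this way.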