Transforming or thumbnailing a scanned PDF results in a white page

louise
Champ in-the-making
I've found a thumbnail generation problem in Alfresco 3.1 Labs and 3.2 Community when working with scanned documents: the generated thumbnail is a blank white page.

ImageMagick and Ghostscript can both create thumbnails from this PDF document, but Alfresco 3.x doesn't use ImageMagick for this transformation; see PdfToImageContentTransformer.java.

Example scan: Scan-TEST_error_to_transform.pdf (scanned with Minolta Di351)
1 REPLY

schaffy
Champ in-the-making
Responding to say that you are not alone.

The primary cause appears to be that ImageMagick is not used for PDF thumbnail production.  Well… moving on then…

The secondary cause is a change in Adobe's PDF specification starting with version 1.5: cross-reference information may now be stored in a cross-reference stream (a binary, usually compressed stream object) instead of the plain-text cross-reference table.

Referring to http://www.adobe.com/devnet/pdf/pdfs/PDFReference15_v6.pdf, section 3.4.7:

3.4.7 Cross-Reference Streams
Beginning with PDF 1.5, cross-reference information may be stored in a cross-reference
stream, instead of a cross-reference table. Cross-reference streams provide
the following advantages:
• A more compact representation of cross-reference information.
• The ability to access compressed objects that are stored in object streams (see
Section 3.4.6, “Object Streams”), and to allow new cross-reference entry types
to be added in the future.

… a little later on…

Note that the value following the startxref keyword is now the offset of the cross-reference
stream rather than an xref keyword. For files that use cross-reference
streams entirely (that is, PDF 1.5 files that are not hybrid-reference files; see
“Compatibility with PDF 1.4” on page 85), the keywords xref and trailer are no
longer used. Therefore, with the exception of the “startxref address %%EOF” segment
and comments, a PDF 1.5 file is entirely a sequence of objects.

So, as you can imagine, the problem occurs when the pdf-renderer library tries to read a cross-reference stream as if it were a classic plain-text cross-reference table (an overgeneralization, but hopefully the point is clear).
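
To check whether a given scan is affected, something like the following could be used. It is only a rough sketch, not part of Alfresco or pdf-renderer: the class name XrefProbe and its methods are made up for illustration. It finds the last startxref keyword, reads the offset after it, and looks at what sits at that offset: the xref keyword (classic table) or an "N G obj" header (cross-reference stream). Hybrid-reference files would still be reported as TABLE, because their startxref points at a classic table.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Alfresco or pdf-renderer.
public class XrefProbe {
    public enum Kind { TABLE, STREAM, UNKNOWN }

    /** Classifies the cross-reference section that the last startxref points to. */
    public static Kind classify(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int marker = text.lastIndexOf("startxref");
        if (marker < 0) {
            return Kind.UNKNOWN;
        }

        // Skip whitespace, then collect the decimal byte offset that follows the keyword.
        int i = marker + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int digitsStart = i;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            i++;
        }
        if (i == digitsStart) {
            return Kind.UNKNOWN;
        }

        long offset;
        try {
            offset = Long.parseLong(text.substring(digitsStart, i));
        } catch (NumberFormatException e) {
            return Kind.UNKNOWN;
        }
        if (offset >= text.length()) {
            return Kind.UNKNOWN;
        }

        // Inspect the first few bytes at the offset.
        int from = (int) offset;
        String head = text.substring(from, Math.min(text.length(), from + 32));
        if (head.startsWith("xref")) {
            return Kind.TABLE;   // PDF 1.4 style table
        }
        if (head.matches("(?s)\\d+\\s+\\d+\\s+obj.*")) {
            return Kind.STREAM;  // PDF 1.5 cross-reference stream object
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(classify(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

Running it against the scanned file (java XrefProbe Scan-TEST_error_to_transform.pdf) should show whether the scanner wrote a cross-reference stream. It reads the whole file into memory, which is fine for a quick check but not for very large documents.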

Two possible solutions are:
- Modify the generally abandoned pdf-renderer library (https://pdf-renderer.dev.java.net/) so that it handles PDF versions greater than 1.4 (src/com/sun/pdfview/PDFFile.java is the specific file to modify)
- Modify transformInternal within PdfToImageContentTransformer.java to use ImageMagick instead of the generally abandoned pdf-renderer library
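
For the second option, the core of the change would be shelling out to ImageMagick. The sketch below is standalone and is not Alfresco's transformer API: the class ImageMagickThumbnailer and its methods are hypothetical, and it assumes an ImageMagick convert executable (with Ghostscript installed for PDF support) is reachable under the name passed in. It renders only the first page, using ImageMagick's [0] page selector and the -thumbnail option.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; wiring it into transformInternal is left out.
public class ImageMagickThumbnailer {

    /** Builds the convert command line for the first page of the PDF. */
    public static List<String> buildCommand(String convertExe, File pdf, File target, int maxSize) {
        return Arrays.asList(
                convertExe,
                pdf.getAbsolutePath() + "[0]",          // [0] selects the first page
                "-thumbnail", maxSize + "x" + maxSize,  // fit within maxSize x maxSize
                target.getAbsolutePath());              // output format follows the file extension
    }

    /** Runs convert and fails loudly if it returns a non-zero exit code. */
    public static void render(String convertExe, File pdf, File target, int maxSize)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(convertExe, pdf, target, maxSize))
                .inheritIO()  // forward convert's output so the pipes cannot fill up and block
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("convert exited with code " + exitCode + " for " + pdf);
        }
    }

    public static void main(String[] args) throws Exception {
        render("convert", new File(args[0]), new File(args[1]), 100);
    }
}
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.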

Probably time to end this post now and go submit a ticket  :wink:

For reference:
Mac OS 10.5.8
Alfresco Community 3.2
ImageMagick  6.5.5-10 (2009/09/14)
pdf2swf - swftools 0.9.0
OpenOffice 3.0.0 [300m6(Build:9352)]