upgrade from 5.0.d to 5.1.e: pdfbox error

vincent-kali
Star Contributor
Hi,
After an Alfresco upgrade from 5.0.d to 5.1.e, everything seems to work fine, except
that a pdfbox error prevents my PDF files from being indexed:


ERROR [pdfbox.filter.FlateFilter] [http-bio-8443-exec-2] FlateFilter: stop reading corrupt stream due to a DataFormatException


Has anybody faced something similar?
(I've seen that pdfbox logs this kind of message when it runs out of memory, but server memory is not overloaded in my case.)
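
For context on the log line above: `DataFormatException` is the standard `java.util.zip` exception thrown when a zlib/deflate stream cannot be decoded, and deflate is the compression behind PDF FlateDecode streams. The sketch below uses only the JDK, not PDFBox, so it shows the class of failure rather than what PDFBox does internally. The class and method names (`FlateCheck`, `deflate`, `inflatesCleanly`) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Alfresco or PDFBox.
public class FlateCheck {

    /** Compresses bytes with zlib/deflate, the encoding used by FlateDecode. */
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Returns true if the data inflates to the end, false if it is corrupt or truncated. */
    public static boolean inflatesCleanly(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No output and more input (or a dictionary) wanted: the stream is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    return false;
                }
            }
            return true;
        } catch (DataFormatException e) {
            // Same exception type as the one named in the FlateFilter log message.
            return false;
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] good = deflate("sample stream content".getBytes(StandardCharsets.UTF_8));
        byte[] bad = good.clone();
        bad[0] ^= 0xFF; // damage the zlib header byte
        System.out.println("intact stream inflates:  " + inflatesCleanly(good));
        System.out.println("damaged stream inflates: " + inflatesCleanly(bad));
    }
}
```

In this sketch the damaged copy is rejected because its zlib header is invalid, which is a data problem and not a memory problem. Whether the PDFs in this thread really contain damaged streams, or the message is a side effect of something else, would still need to be checked against the actual files.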

Thanks for your advice,
Vincent
3 REPLIES

talleyrand
Champ in-the-making
I installed the latest version today and tried to migrate from 5.0.d. I get a lot of these messages, and I can't preview many documents either…

iblanco
Confirmed Champ

I'm hunting an indexing issue that is causing OutOfMemory errors, and I'm starting to suspect that the culprit is PDFBox. My installation is Alfresco 5.1.g. It uses PDFBox 1.8.10, and I found an issue in Tika that suggests this version might not be a very good one:

 [TIKA-1737] PDFBox 1.8.10 is still a basket case - ASF JIRA.

I took a memory dump and I'm trying to analyze it with Eclipse MAT. The "Leak Suspects" report suggests that 75% of the heap is filled with PDFBox COSObjects that are being retained by a classloader.

I'm not sure how to interpret this, but PDFBox seems to be in the middle of it.

mauro1855
Champ in-the-making

Yes, I also have the same issue.

I had this problem in 5.0 and I still have it in 5.1.e.

However, I don't think there is an Alfresco-patched pdfbox version more recent than 1.8.10, meaning we have to keep using this version, right?

I'll try to dig a bit more.