upgrade from 5.0.d to 5.1.e: pdfbox error
03-07-2016 05:37 AM
After an Alfresco upgrade from 5.0.d to 5.1.e, everything seems to work fine, except
that a pdfbox error prevents my PDF files from being indexed:
ERROR [pdfbox.filter.FlateFilter] [http-bio-8443-exec-2] FlateFilter: stop reading corrupt stream due to a DataFormatException
Has anybody faced something similar?
(I've seen that pdfbox fires this kind of message in case of 'out of memory', but server memory is not overloaded in my case)
Thanks for your advice,
Vincent
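As a side note on what the log line means: the message names a DataFormatException, which is what java.util.zip.Inflater throws when deflate-compressed data is malformed. The sketch below is only an illustration using the JDK, not PDFBox code; the class name FlateCheck and its two helper methods are made up for this example. It compresses a small buffer, damages the first byte, and shows that decompression then fails with that exception, independently of how much memory is available.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, for illustration only (not part of PDFBox or Alfresco).
class FlateCheck {

    /** Compresses the input with zlib/deflate using the JDK's Deflater. */
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Returns true if the data inflates to the end, false if it is malformed or truncated. */
    public static boolean inflatesCleanly(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No output and nothing left to read: the stream ended too early.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    return false;
                }
            }
            return true;
        } catch (DataFormatException e) {
            // Same exception type that the FlateFilter log message refers to.
            return false;
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] good = deflate("some PDF stream content".getBytes(StandardCharsets.UTF_8));
        byte[] bad = good.clone();
        bad[0] = 0; // break the zlib header so the stream is no longer valid

        System.out.println("intact stream inflates:  " + inflatesCleanly(good));
        System.out.println("damaged stream inflates: " + inflatesCleanly(bad));
    }
}
```

If the error only shows up for particular documents, that would point towards those files (or the way this PDFBox version reads them) rather than towards server memory, but that is a guess that would need checking against the actual PDFs.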
Labels: Archive

05-18-2016 09:59 AM
06-08-2017 01:04 PM
I'm hunting for an indexing issue that is causing OutOfMemory errors, and I'm starting to suspect that the culprit is PDFBox. My installation is Alfresco 5.1.g. It uses PDFBox 1.8.10, and I found an issue in Tika that suggests that this version might not be a very good one:
[TIKA-1737] PDFBox 1.8.10 is still a basket case - ASF JIRA.
I made a memory dump and I'm trying to analyze it with Eclipse MAT. The "Leak Suspects" report suggests that 75% of the heap is full of PDFBox COSObjects that are being retained by a classloader.
I'm not sure how to interpret this, but PDFBox seems to be involved.
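For anyone who wants to capture a similar dump for Eclipse MAT, here is a minimal sketch using the JDK's HotSpotDiagnosticMXBean. It assumes a HotSpot-based JVM, and the class name HeapDumper is made up for this example. Note that it dumps the heap of the JVM it runs in, so to be useful for Alfresco it would have to execute inside the Alfresco/Tomcat process; run standalone as shown, it only demonstrates the call.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only; assumes a HotSpot JVM.
class HeapDumper {

    /**
     * Writes a heap dump of the current JVM to the given file.
     * The file must not exist yet; recent JDKs also require a ".hprof" suffix.
     * With liveOnly = true, only objects that are still reachable are dumped.
     */
    public static void dump(String hprofPath, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(hprofPath, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // Default file name is an arbitrary choice for this example.
        String path = args.length > 0
                ? args[0]
                : "heap-" + System.currentTimeMillis() + ".hprof";
        dump(path, true);
        System.out.println("Heap dump written to " + path);
    }
}
```

The resulting .hprof file can be opened in Eclipse MAT. An alternative that needs no code is starting the JVM with -XX:+HeapDumpOnOutOfMemoryError, so that a dump is written at the moment the OutOfMemoryError occurs.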
06-14-2017 04:05 AM
Yes, I have the same issue.
I had this problem in 5.0 and I still have it in 5.1.e.
However, I don't think there is an Alfresco-patched PDFBox version more recent than 1.8.10, meaning we have to keep using this version, right?
I'll try to dig a bit more.
