Alf 32r2 - Pdfbox - Stop reading corrupt stream

dranakan
Champ on-the-rise
Hello,

I get an error message when uploading some PDFs to Alfresco (3.2r2, with MySQL):


ERROR [pdfbox.filter.FlateFilter] Stop reading corrupt stream
….

Looking in the PDFBox source, I found:


                    }
                    catch (OutOfMemoryError exception)
                    {
                        // if the stream is corrupt an OutOfMemoryError may occur
                        log.error("Stop reading corrupt stream");
                    }
                    catch (ZipException exception)
                    {


This appears right after installation (Alfresco is clean). I have tried increasing Alfresco's memory (JAVA_OPTS…) and checked with "top" that the JVM has enough memory allocated, but the message still appears.
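For anyone wanting to see what a corrupt Flate stream looks like at the decoder level, here is a minimal sketch using only the JDK's java.util.zip classes. It is not PDFBox's FlateFilter code; it only relies on the fact that FlateDecode streams in a PDF use the same zlib/deflate format. The class and method names (FlateProbe, inflatesCleanly, deflate) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of PDFBox or Alfresco.
class FlateProbe {

    /** Returns true if the zlib data decompresses to the end, false if it is damaged or truncated. */
    public static boolean inflatesCleanly(byte[] compressed) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // Output is discarded; we only care whether decoding succeeds.
            }
            return true;
        } catch (IOException e) {
            // ZipException (bad data) and EOFException (truncated data) are both IOExceptions.
            return false;
        }
    }

    /** Compresses raw bytes in zlib format so there is known-good input to test against. */
    public static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] good = deflate("hello pdf stream".getBytes(StandardCharsets.US_ASCII));
        byte[] bad = good.clone();
        bad[0] ^= 0xFF; // damage the zlib header byte
        System.out.println("good stream ok: " + inflatesCleanly(good));
        System.out.println("damaged stream ok: " + inflatesCleanly(bad));
    }
}
```

In this sketch the damaged stream is rejected with an exception straight away. The PDFBox snippet above also catches OutOfMemoryError, and its comment says this can occur when the stream is corrupt, which may be why raising the heap size does not make the message go away.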

Does anyone else have this problem?
13 REPLIES

slowlearner
Champ in-the-making
Hello, thanks for the advice, and apologies for taking so long to reply. I now have 1.2 installed, yet the problem persists. The files that don't get indexed properly aren't corrupted in the sense of not being PDF files; they seem normal enough. But I am not ruling out the input being somehow implicated… I will look out for patterns. It does appear (subject to confirmation) that my problem PDF files (i.e. the ones not properly indexed) are all from one source so far. I will keep this thread posted as I find out more.

slowlearner
Champ in-the-making
Unfortunately I am still no closer to getting this resolved. So far…
    Updated pdfbox to 1.2.1 - Check
    Increased lucene.indexer.maxFieldLength - Check
    Recovered index from scratch on startup - Check
… and yet I can index only a few of the PDF documents uploaded. The problem docs come from various sources and are, in all other respects, perfectly valid PDF documents.
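One way to look for a pattern among the problem files is a quick structural sanity check outside Alfresco. The sketch below is only a rough first filter and assumes nothing about PDFBox: it checks that a file has a "%PDF-" header near the start and a "%%EOF" marker near the end, which well-formed PDFs normally have. The class name PdfSanity and its methods are hypothetical, and a file passing this check can still contain a damaged compressed stream.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for triaging uploaded files; not part of PDFBox or Alfresco.
class PdfSanity {

    /** True if "%PDF-" appears within the first 1024 bytes. */
    public static boolean hasHeader(byte[] data) {
        int len = Math.min(data.length, 1024);
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        return head.contains("%PDF-");
    }

    /** True if "%%EOF" appears within the last 1024 bytes. */
    public static boolean hasEofMarker(byte[] data) {
        int from = Math.max(0, data.length - 1024);
        String tail = new String(data, from, data.length - from, StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    public static void main(String[] args) throws IOException {
        // Pass the paths of the problem PDFs as command-line arguments.
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            System.out.println(name + " header=" + hasHeader(data) + " eof=" + hasEofMarker(data));
        }
    }
}
```

Running it over both the files that index correctly and the ones that do not would show whether the failures share an obvious structural difference, such as a truncated ending.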

lucille_arkenst
Champ in-the-making
Has anybody come up with a solution? I read another post where the person had success by upgrading pdfbox.jar. In this thread, however, somebody has not had success with it.

If there is an Alfresco engineer out there reading this, please help and advise.

sharifu
Confirmed Champ
I am getting this in 4.0.d