Alf 32r2 - Pdfbox - Stop reading corrupt stream
12-24-2009 04:28 AM
I get an error message when uploading some PDFs to Alfresco (3.2r2, with MySQL):
…ERROR [pdfbox.filter.FlateFilter] Stop reading corrupt stream….
Looking in the PDFBox source, I found:
…
} catch (OutOfMemoryError exception) {
    // if the stream is corrupt an OutOfMemoryError may occur
    log.error("Stop reading corrupt stream");
} catch (ZipException exception) {
This happens right after installation, on a clean Alfresco. I have tried increasing Alfresco's memory (JAVA_OPTS…) and checked with "top" that the JVM has enough memory allocated, but the message still appears.
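Note that "top" reports the memory of the whole process, not the heap ceiling the JVM actually enforces. As a minimal sketch (the class name HeapLimitSketch is hypothetical, and it only tells you something if it is started with the same JAVA_OPTS as Alfresco), the limit can be printed from inside a JVM with the standard Runtime API:

```java
// Hypothetical helper, not part of Alfresco or PDFBox.
class HeapLimitSketch {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will try to use, in MiB (reflects -Xmx when set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory: upper bound of the heap; totalMemory: heap currently
        // reserved by the JVM; freeMemory: unused part of the reserved heap.
        System.out.println("max heap (MiB):       " + maxHeapMiB());
        System.out.println("committed heap (MiB): " + rt.totalMemory() / MIB);
        System.out.println("free in heap (MiB):   " + rt.freeMemory() / MIB);
    }
}
```

If the reported maximum matches the configured value and the message still shows up, the heap size is probably not the cause; the comment in the PDFBox snippet itself says a corrupt stream may trigger the OutOfMemoryError.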
Does anyone else have this problem?
01-06-2010 03:13 AM
01-06-2010 04:34 PM
For example, if I scan a doc from our copier and send it to Alfresco, all is well. The PDF is indexed and a thumbnail is created in the Share site. If I open that PDF with Adobe Acrobat, make a change and re-save it, Alfresco throws an exception when I then move that file into the Share site. No thumbnail is created. In prior versions of Alfresco (< 3.2R2), Alfresco would eventually run out of memory if too many of these incompatible PDFs were encountered. This doesn't happen now, but we still see those exceptions.
Ben
01-11-2010 05:53 AM
Yes, the problem seems to come from the PDF file itself.
Does anyone know a way to check whether a PDF is broken (and to find out what is wrong with it)?
Thanks
02-17-2010 04:10 PM
I just found that on my Alfresco setup this error occurred on 13 of some 200 random PDF documents, so may I join the club?
Seriously, I consider this a major problem, for two reasons:
- As far as I understand PDFBox, decoding of the faulty PDFs is terminated at some random point WITH NO ERROR INDICATED TO THE CALLING CONVERTER, because the exceptions in org.apache.pdfbox.filter.FlateFilter are caught and converted into that innocent log message. Imagine your CxO not finding that important business report from last year for that reason… guess who gets their ass kicked…
- And when I saw that OutOfMemoryError being caught in PDFBox, I wanted to bang my head against the wall! WHEN I HAVE AN OUTOFMEMORYERROR IN MY APPLICATION, I WANT TO KNOW ABOUT IT!! I really have to know, since the continued operation of my Alfresco is seriously in danger… arghhhh!
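To illustrate the first point, here is a rough sketch of the alternative behaviour: a decoder that lets the failure reach its caller instead of logging it and carrying on. This is not PDFBox's code; FlateDecodeSketch, deflate and inflate are hypothetical names, and it uses only java.util.zip from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration, not the PDFBox FlateFilter.
class FlateDecodeSketch {

    /** Compresses raw bytes in zlib format (only needed to build test input). */
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream z = new DeflaterOutputStream(out)) {
            z.write(raw);
        }
        return out.toByteArray();
    }

    /**
     * Decompresses zlib data. Corrupt or truncated input surfaces as an
     * IOException (ZipException / EOFException) to the caller; nothing is
     * caught here, and OutOfMemoryError is deliberately not caught either.
     */
    static byte[] inflate(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] ok = deflate("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(inflate(ok), StandardCharsets.UTF_8));
        try {
            inflate(new byte[] {1, 2, 3, 4, 5}); // not valid zlib data
        } catch (IOException e) {
            // The caller, not the decoder, decides what a failed decode means.
            System.out.println("decode failed: " + e);
        }
    }
}
```

With this shape, a converter calling inflate can mark the document as failed rather than silently indexing a truncated text.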
Well, I tried my luck with the current 1.0 snapshot from pdfbox.apache.org, but it was no better, so I propose replacing the PDFBox converter with an external command-line tool. I'll post the configuration once it is working!
Cheers
Gyro
02-19-2010 07:21 AM
03-02-2010 02:26 AM
Alfresco has been using 100% of my CPU for several days. I suspect a problem with this:
jstack (show java process)
"DefaultScheduler_Worker-3" prio=10 tid=0x08f8b400 nid=0x86b runnable [0x62b82000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:92)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
    - locked <0x71801578> (a sun.nio.ch.ChannelInputStream)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    - locked <0x718055a0> (a java.io.BufferedInputStream)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    - locked <0x718055c0> (a java.io.BufferedInputStream)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0x718055e0> (a java.io.BufferedInputStream)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at java.io.PushbackInputStream.read(PushbackInputStream.java:122)
    at org.apache.pdfbox.io.PushBackInputStream.read(PushBackInputStream.java:84)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:200)
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:870)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:141)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:213)
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:870)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:519)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:841)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:808)
    at org.alfresco.repo.content.transform.PdfBoxContentTransformer.transformInternal(PdfBoxContentTransformer.java:74)
    at org.alfresco.repo.content.transform.AbstractContentTransformer2.transform(AbstractContentTransformer2.java:167)
    at org.alfresco.repo.content.transform.AbstractContentTransformer2.transform(AbstractContentTransformer2.java:143)
    at org.alfresco.repo.search.impl.lucene.ADMLuceneIndexerImpl.indexProperty(ADMLuceneIndexerImpl.java:948)
    at org.alfresco.repo.search.impl.lucene.ADMLuceneIndexerImpl.createDocumentsImpl(ADMLuceneIndexerImpl.java:625)
    at org.alfresco.repo.search.impl.lucene.ADMLuceneIndexerImpl.createDocuments(ADMLuceneIndexerImpl.java:590)
    at org.alfresco.repo.search.impl.lucene.ADMLuceneIndexerImpl.updateFullTextSearch(ADMLuceneIndexerImpl.java:1569)
    at org.alfresco.repo.search.impl.lucene.fts.FullTextSearchIndexerImpl.index(FullTextSearchIndexerImpl.java:190)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at $Proxy70.index(Unknown Source)
    at org.alfresco.repo.search.impl.lucene.fts.FTSIndexerJob.execute(FTSIndexerJob.java:52)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)
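A jstack dump shows what each thread is doing, but not which one is burning the CPU. As a hedged sketch, the standard java.lang.management API can rank threads by CPU time and print the stack of the busiest one. BusyThreadSketch and busiestThreadReport are hypothetical names, and the code only sees threads of the JVM it runs in, so it would have to be executed inside the Alfresco process to be of any use here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic helper, not part of Alfresco.
class BusyThreadSketch {

    /**
     * Returns name, state, CPU time and up to maxFrames stack frames of the
     * live thread with the highest accumulated CPU time, or "" if the JVM
     * does not provide per-thread CPU times.
     */
    static String busiestThreadReport(int maxFrames) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return "";
        }
        long bestId = -1;
        long bestCpuNanos = -1;
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread has died
            if (cpuNanos > bestCpuNanos) {
                bestCpuNanos = cpuNanos;
                bestId = id;
            }
        }
        if (bestId < 0) {
            return "";
        }
        ThreadInfo info = threads.getThreadInfo(bestId, maxFrames);
        if (info == null) {
            return ""; // the thread ended between the two calls
        }
        StringBuilder report = new StringBuilder();
        report.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState())
              .append(" cpu=").append(bestCpuNanos / 1_000_000).append("ms\n");
        for (StackTraceElement frame : info.getStackTrace()) {
            report.append("    at ").append(frame).append('\n');
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.print(busiestThreadReport(20));
    }
}
```

If the busiest thread keeps showing the same PDFBox parser frames across several samples, that supports the suspicion that one document is keeping the indexer busy.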
Do you have the same problem?
(I have posted my general problem here: http://forums.alfresco.com/en/viewtopic.php?f=8&t=21348#p82506)
04-14-2010 11:56 AM
10-25-2010 03:47 AM
Most disturbing. Any help much appreciated.
Update:
I have also increased the lucene.indexer.maxfieldlength value to 1000000 and still get the problem. :x
10-25-2010 06:20 AM
Previously, we had 79 PDFs that were not indexed; after the upgrade and reindexing, only 10 remained unindexed! And eventually these 10 proved to be corrupt: for example, there were JPEGs saved as PDF and the like 🙂
Cheers
Gyro
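For the "JPEG saved as PDF" kind of file, a very cheap first check is to look at the first bytes: a PDF starts with the marker %PDF-, while a JPEG starts with the bytes FF D8. The sketch below does only that, using the JDK alone. PdfHeaderCheckSketch and its methods are hypothetical names, the check is deliberately strict (some viewers tolerate a few junk bytes before the header), and passing it says nothing about damage further inside the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-check, not part of Alfresco or PDFBox.
class PdfHeaderCheckSketch {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** True if the given leading bytes begin with the "%PDF-" marker. */
    static boolean startsWithPdfMagic(byte[] head) {
        if (head.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the first bytes of the file and checks them for the PDF marker. */
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        }
        return filled == head.length && startsWithPdfMagic(head);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + ": " + (looksLikePdf(Paths.get(name)) ? "PDF header found" : "no PDF header"));
        }
    }
}
```

Running it over the files that failed to index would separate the files that are not PDFs at all from those that need a closer look with a real PDF tool.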