Indexing of PDF files and EMail messages

marian
Champ in-the-making
Hi,

I have installed Alfresco Community 2.1 and have been playing with it for
a few days.

Apparently, in my installation, indexing of the bodies of PDF files and of email messages
(exported from MS Outlook as .msg files) does not work.

I can upload a msg file and the author is filled from the sender of the
message and description is filled from the subject of the message.
The message can then be found using searches for words from these
fields, but not from the body.

The mail shows up in the 'nitf' special search, so apparently the
transformation failed. There is no indication of such a failure in the log (on
the console, as I started Alfresco with the batch file).

Is indexing of the mail body possible at all? Do I need to configure
something to make it work? What can I configure to debug the failure
reported?

The same is true for PDF files: I can upload PDFs and title and author are
prepopulated. The document is not returned for searches on content. It is
also not returned from either of nitf, nicm or nint searches.

My document is very simple and small, contains only plain text and was
created from MS Word through a Ghostscript-based PDF printer. The
Word document itself is correctly indexed.

Any ideas on how to debug this issue or pointers to further reading on the
system are very much appreciated.

Ciao, MM
12 REPLIES

kevinr
Star Contributor
I've just tested some .msg files here on 2.1 and it works OK for me (the body of the mail is indexed and searchable). I assume you are saving in the Outlook email format, not HTML format?

The 'nitf' code means an exception occurred. If that is the case you should see the following in the Alfresco log:

Unable to extract text from message: …

You need at least the WARN log level enabled, but that is the default anyway. If you are not seeing that message then I'm not sure what the problem is.
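
For anyone reading along, here is a small sketch of the three marker tokens mentioned in this thread and a query string to find affected documents. The meanings of 'nint' and 'nicm' and the `TEXT:"..."` Lucene syntax are given from memory and should be treated as assumptions; only the 'nitf' meaning is confirmed above. The class and method names are made up for illustration and are not part of Alfresco.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not an Alfresco class): index marker tokens and queries for them. */
final class IndexMarkers {
    // Token -> assumed meaning; only 'nitf' is confirmed in this thread.
    static final Map<String, String> MEANINGS = new LinkedHashMap<>();
    static {
        MEANINGS.put("nint", "not indexed: no transformation to text available (assumed)");
        MEANINGS.put("nicm", "not indexed: content missing (assumed)");
        MEANINGS.put("nitf", "not indexed: transformation failed");
    }

    /** Builds a full-text query for one marker token (Lucene-style syntax is an assumption). */
    static String queryFor(String token) {
        if (!MEANINGS.containsKey(token)) {
            throw new IllegalArgumentException("Unknown marker token: " + token);
        }
        return "TEXT:\"" + token + "\"";
    }

    public static void main(String[] args) {
        // Print one query per marker so each can be pasted into a search.
        for (Map.Entry<String, String> e : MEANINGS.entrySet()) {
            System.out.println(queryFor(e.getKey()) + "  ->  " + e.getValue());
        }
    }
}
```

Running each query separately shows which of the three cases a given document falls into.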

Do no .msg emails work at all?

Kevin

savah
Champ in-the-making
Greetings all,


I have tried saving some emails with Outlook 2003, both as .msg and as .msg - unicode. Although Alfresco recognises the MIME type as email, searching for keywords from the body does not work. Also, when I hit the preview button, Alfresco threw an exception 500: Failed to execute method NodeInfoBean.sendNodeInfo: Unknown Exception in Transaction.


The strange thing is that one of my colleagues tried to save the exact same email as .msg using Outlook 2000 and it worked fine.


Any comments on that?

Kind regards,


Paris Kapsouros

kevinr
Star Contributor
That is strange, as I'm using Outlook 2003 here for .msg file testing.

Is there any more info in the logs with the full exception trace?

Kevin

savah
Champ in-the-making
Dear Kevin,

Thanks a lot for your reply!

I have already created a thread about this issue here: http://forums.alfresco.com/viewtopic.php?t=10579

Anyway, I am pasting a full log trace here.

Once again thanks a lot for your feedback.

16:26:33,968 ERROR [alfresco.ajax] Failed to execute method NodeInfoBean.sendNodeInfo: Unknown Exception in Transaction.
org.alfresco.error.AlfrescoRuntimeException: Unknown Exception in Transaction.
   at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:292)
   at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:155)
   at org.alfresco.web.app.servlet.ajax.InvokeCommand.execute(InvokeCommand.java:167)
   at org.alfresco.web.app.servlet.ajax.AjaxServlet.service(AjaxServlet.java:148)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
   at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
   at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
   at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
   at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
   at java.lang.Thread.run(Thread.java:595)
Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at org.alfresco.web.app.servlet.ajax.InvokeCommand$1.execute(InvokeCommand.java:163)
   at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:225)
   … 18 more
Caused by: org.alfresco.service.cmr.repository.TemplateException: Error during processing of the template 'Content conversion failed:
   reader: ContentAccessor[ contentUrl=store://2008/1/22/16/24/cc1c9505-c8f5-11dc-bcd0-d1fa6a7067aa.bin, mimetype=message/rfc822, size=39936, encoding=UTF-8, locale=el_GR]
   writer: ContentAccessor[ contentUrl=store://2008/1/22/16/26/0835b63e-c8f6-11dc-bcd0-d1fa6a7067aa.bin, mimetype=text/plain, size=3525, encoding=UTF-8, locale=el_GR]
   options: {}'. Please contact your system administrator.
   at org.alfresco.repo.template.FreeMarkerProcessor.process(FreeMarkerProcessor.java:204)
   at org.alfresco.repo.processor.TemplateServiceImpl.processTemplate(TemplateServiceImpl.java:177)
   at org.alfresco.repo.processor.TemplateServiceImpl.processTemplate(TemplateServiceImpl.java:107)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:281)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:187)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:154)
   at org.alfresco.repo.security.permissions.impl.AlwaysProceedMethodInterceptor.invoke(AlwaysProceedMethodInterceptor.java:40)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:176)
   at org.alfresco.repo.security.permissions.impl.ExceptionTranslatorMethodInterceptor.invoke(ExceptionTranslatorMethodInterceptor.java:49)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:176)
   at org.alfresco.repo.audit.AuditComponentImpl.auditImpl(AuditComponentImpl.java:256)
   at org.alfresco.repo.audit.AuditComponentImpl.audit(AuditComponentImpl.java:191)
   at org.alfresco.repo.audit.AuditMethodInterceptor.invoke(AuditMethodInterceptor.java:69)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:176)
   at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:107)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:176)
   at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:210)
   at $Proxy94.processTemplate(Unknown Source)
   at org.alfresco.web.bean.ajax.NodeInfoBean.sendNodeInfo(NodeInfoBean.java:92)
   … 24 more
Caused by: org.alfresco.service.cmr.repository.ContentIOException: Content conversion failed:
   reader: ContentAccessor[ contentUrl=store://2008/1/22/16/24/cc1c9505-c8f5-11dc-bcd0-d1fa6a7067aa.bin, mimetype=message/rfc822, size=39936, encoding=UTF-8, locale=el_GR]
   writer: ContentAccessor[ contentUrl=store://2008/1/22/16/26/0835b63e-c8f6-11dc-bcd0-d1fa6a7067aa.bin, mimetype=text/plain, size=3525, encoding=UTF-8, locale=el_GR]
   options: {}
   at org.alfresco.repo.content.transform.AbstractContentTransformer.transform(AbstractContentTransformer.java:255)
   at org.alfresco.repo.content.transform.AbstractContentTransformer.transform(AbstractContentTransformer.java:210)
   at org.alfresco.repo.content.RoutingContentService.transform(RoutingContentService.java:468)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:281)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:187)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:154)
   at net.sf.acegisecurity.intercept.method.aopalliance.MethodSecurityInterceptor.invoke(MethodSecurityInterceptor.java:80)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:176)
   at org.alfresco.repo.model.ml.MLContentInterceptor.invoke(MLContentInterceptor.java:129)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:176)
   at org.alfresco.repo.security.permissions.impl.ExceptionTranslatorMethodInterceptor.invoke(ExceptionTranslatorMethodInterceptor.java:49)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:176)
   at org.alfresco.repo.audit.AuditComponentImpl.audit(AuditComponentImpl.java:238)
   at org.alfresco.repo.audit.AuditMethodInterceptor.invoke(AuditMethodInterceptor.java:69)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:176)
   at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:107)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:176)
   at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:210)
   at $Proxy24.transform(Unknown Source)
   at org.alfresco.repo.template.BaseContentNode$TemplateContentData.getContentAsText(BaseContentNode.java:478)
   at org.alfresco.repo.template.CropContentMethod.exec(CropContentMethod.java:64)
   at freemarker.core.MethodCall._getAsTemplateModel(MethodCall.java:93)
   at freemarker.core.Expression.getAsTemplateModel(Expression.java:89)
   at freemarker.core.Assignment.accept(Assignment.java:90)
   at freemarker.core.Environment.visit(Environment.java:196)
   at freemarker.core.MixedContent.accept(MixedContent.java:92)
   at freemarker.core.Environment.visit(Environment.java:196)
   at freemarker.core.ConditionalBlock.accept(ConditionalBlock.java:79)
   at freemarker.core.Environment.visit(Environment.java:196)
   at freemarker.core.MixedContent.accept(MixedContent.java:92)
   at freemarker.core.Environment.visit(Environment.java:196)
   at freemarker.core.Environment.process(Environment.java:176)
   at freemarker.template.Template.process(Template.java:232)
   at org.alfresco.repo.template.FreeMarkerProcessor.process(FreeMarkerProcessor.java:200)
   … 46 more
Caused by: org.alfresco.service.cmr.repository.ContentIOException: Property set stream: \__nameid_version1.0__substg1.0_10000102
   at org.alfresco.repo.content.transform.MailContentTransformer$1.processPOIFSReaderEvent(MailContentTransformer.java:96)
   at org.apache.poi.poifs.eventfilesystem.POIFSReader.processProperties(POIFSReader.java:259)
   at org.apache.poi.poifs.eventfilesystem.POIFSReader.processProperties(POIFSReader.java:228)
   at org.apache.poi.poifs.eventfilesystem.POIFSReader.read(POIFSReader.java:95)
   at org.alfresco.repo.content.transform.MailContentTransformer.transformInternal(MailContentTransformer.java:110)
   at org.alfresco.repo.content.transform.AbstractContentTransformer.transform(AbstractContentTransformer.java:246)
   … 83 more
Caused by: org.alfresco.service.cmr.repository.ContentIOException: Failed to open stream onto channel:
   writer: ContentAccessor[ contentUrl=store://2008/1/22/16/26/0835b63e-c8f6-11dc-bcd0-d1fa6a7067aa.bin, mimetype=text/plain, size=3525, encoding=UTF-8, locale=el_GR]
   at org.alfresco.repo.content.AbstractContentWriter.getContentOutputStream(AbstractContentWriter.java:390)
   at org.alfresco.repo.content.AbstractContentWriter.putContent(AbstractContentWriter.java:465)
   at org.alfresco.repo.content.transform.MailContentTransformer$1.processPOIFSReaderEvent(MailContentTransformer.java:90)
   … 88 more
Caused by: org.alfresco.service.cmr.repository.ContentIOException: A channel has already been opened
   at org.alfresco.repo.content.AbstractContentWriter.getWritableChannel(AbstractContentWriter.java:239)
   at org.alfresco.repo.content.AbstractContentWriter.getContentOutputStream(AbstractContentWriter.java:383)
   … 90 more
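
Reading the trace from the bottom up, the innermost "Caused by" is "A channel has already been opened", raised from AbstractContentWriter.getWritableChannel. As a generic aid for traces nested this deeply, here is a minimal sketch that follows the cause chain to its end. The class name RootCause is made up, and the exceptions in main are stand-ins for the Alfresco ones, not the real types.

```java
/** Hypothetical helper for digging through nested "Caused by" chains. */
final class RootCause {
    /** Follows getCause() until the innermost exception is reached. */
    static Throwable of(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referencing cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in exceptions that mimic the nesting seen in the trace above.
        Throwable inner = new IllegalStateException("A channel has already been opened");
        Throwable middle = new RuntimeException("Content conversion failed", inner);
        Throwable outer = new RuntimeException("Unknown Exception in Transaction.", middle);
        System.out.println(RootCause.of(outer).getMessage());
    }
}
```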

kevinr
Star Contributor
I believe this is a bug - it has been fixed:
http://issues.alfresco.com/browse/CHK-1219

Thanks,

Kevin

sacco
Champ in-the-making
Hi Kevin

When you guys post links into the JIRA, I don't think people outside Alfresco can get access to any of the Check-in messages.

Even when I'm signed into the JIRA, all I can see for CHK-??? links is:

ERROR

It seems that you have tried to perform an operation which you are not permitted to perform.

If you think this message is wrong, please consult your administrators about getting the necessary permissions.

kevinr
Star Contributor
Sorry about that. Yes, I forgot the CHK project is internal.

Here is the info:
   
Fixed AR-1713: Transformers that do nothing don't break full text indexing
Fixed MailContentTransformer to write an empty string to the ContentWriter if there is no mail message body.

r6625 | derekh | 2007-08-29 12:46:20 +0100 (Wed, 29 Aug 2007) | 3 lines
Changed paths:
   M /alfresco/BRANCHES/V2.1/root/projects/repository/source/java/org/alfresco/repo/content/transform/MailContentTransformer.java
   M /alfresco/BRANCHES/V2.1/root/projects/repository/source/java/org/alfresco/repo/search/impl/lucene/ADMLuceneIndexerImpl.java

So it's been fixed on the 2.1E branch and should have been merged over to HEAD already, which means it will be in the current 2.9 release or nightly build.
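
To illustrate the idea in the check-in note, here is a simplified sketch. It is not the code from r6625: OnceWriter and MailBodySketch are made-up stand-ins, and the only behaviour assumed is that a content writer accepts a single write (as the "A channel has already been opened" message in the trace suggests) and that the transformer should write an empty string when the message has no body.

```java
/** Simplified stand-in for a write-once content writer (not Alfresco's real class). */
final class OnceWriter {
    private String content; // null until something has been written

    void putContent(String text) {
        if (content != null) {
            // Mirrors the message seen at the bottom of the trace.
            throw new IllegalStateException("A channel has already been opened");
        }
        content = text;
    }

    boolean isWritten() { return content != null; }

    String getContent() { return content; }
}

/** Hypothetical sketch of the "always write something, exactly once" idea. */
final class MailBodySketch {
    /** Writes the body if there is one, otherwise an empty string. */
    static void writeBody(String bodyOrNull, OnceWriter writer) {
        writer.putContent(bodyOrNull == null ? "" : bodyOrNull);
    }

    public static void main(String[] args) {
        OnceWriter writer = new OnceWriter();
        writeBody(null, writer); // message without a body
        System.out.println("written=" + writer.isWritten()
                + ", length=" + writer.getContent().length());
    }
}
```

With the writer always left in a written state, an empty mail body yields empty text rather than a transformer that appears to have done nothing.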

Kevin

savah
Champ in-the-making
Hi Kevin,


I downloaded yesterday's nightly build and the above-mentioned bug with the Outlook email seems to be fixed.


I discovered some others though, such as Alfresco throwing a NullPointerException when we tried to add a category to content, but I guess that's something we can expect from a nightly build. ;)


Again thanks a lot for your reply!


Paris Kapsouros

kevinr
Star Contributor
Glad to hear that fixed it. Feel free to report any new bugs you find against 2.9 in JIRA. It's appreciated!

Cheers,

Kevin