Outlook msg extraction fail on Tika date format

loftux
Star Contributor
I'm trying to get metadata extraction working for Outlook .msg files. It fails with:
Caused by: org.alfresco.service.cmr.repository.datatype.TypeConversionException: Unable to convert string to date: Thu, 19 Feb 2009 11:17:09 +0100 (CET)
   at org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter.makeDate(AbstractMappingMetadataExtracter.java:899)
   at org.alfresco.repo.content.metadata.TikaPoweredMetadataExtracter.makeDate(TikaPoweredMetadataExtracter.java:166)
   at org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter.convertSystemPropertyValues(AbstractMappingMetadataExtracter.java:798)
The mail extractor is able to extract all the metadata; it is just that the date isn't recognized.
Judging from the error, this date format isn't supported. The TikaPoweredMetadataExtracter.java class already adds a number of extra date formats, but none of them seems to match the format I've encountered.
I've tried setting -Duser.country=US -Duser.language=en in JAVA_OPTS, but that didn't change anything.
So is it Outlook that sets the date format in the msg file? The msg file in question comes from an Outlook client in an all-Swedish environment.
If so, no configuration change in Alfresco will be able to support this. Could we change the TikaPoweredMetadataExtracter class to be configurable, so that those of us who happen to live in some obscure part of the world like Sweden can add extra date formats?
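The request for a configurable list of extra formats can be illustrated with a minimal Java sketch. FallbackDateParser, addPattern and parse are hypothetical names, not Alfresco classes or methods; the sketch simply tries each java.text.SimpleDateFormat pattern in order and returns the first successful parse. Pinning Locale.ENGLISH is an assumption, based on mail date headers using English day and month names ("Thu", "Feb") regardless of the client's language.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not Alfresco's TikaPoweredMetadataExtracter:
// tries each configured pattern in order and returns the first successful parse.
public class FallbackDateParser {
    private final List<String> patterns = new ArrayList<>();

    public FallbackDateParser(List<String> initialPatterns) {
        patterns.addAll(initialPatterns);
    }

    // Lets a deployment append extra patterns without changing the class.
    public void addPattern(String pattern) {
        patterns.add(pattern);
    }

    public Date parse(String value) {
        for (String pattern : patterns) {
            // Assumption: day/month names in mail headers are English,
            // so the locale is pinned instead of relying on the JVM default.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            try {
                return format.parse(value);
            } catch (ParseException e) {
                // This pattern did not match; try the next one.
            }
        }
        throw new IllegalArgumentException("Unable to convert string to date: " + value);
    }

    public static void main(String[] args) {
        String header = "Thu, 19 Feb 2009 11:17:09 +0100 (CET)";
        FallbackDateParser parser =
                new FallbackDateParser(List.of("yyyy-MM-dd'T'HH:mm:ss"));
        try {
            parser.parse(header);
        } catch (IllegalArgumentException e) {
            // Only an ISO-style pattern is configured, so the mail date is rejected.
            System.out.println(e.getMessage());
        }
        // After adding an RFC 822-style pattern, the same string parses.
        parser.addPattern("EEE, d MMM yyyy HH:mm:ss Z");
        System.out.println(parser.parse(header).getTime());
    }
}
```

Keeping the list open for extension means a site can add its own patterns instead of patching the class, which is the kind of configurability asked for above.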

loftux
Star Contributor
It turned out to be configurable already. I redefined the bean extracter.Mail to look like extracter.RFC822:
   <bean id="extracter.Mail" class="org.alfresco.repo.content.metadata.MailMetadataExtracter" parent="baseMetadataExtracter" >
      <!-- Extra date patterns, e.g. for "Thu, 19 Feb 2009 11:17:09 +0100 (CET)" -->
      <property name="supportedDateFormats">
         <list>
            <value>EEE, d MMM yyyy HH:mm:ss Z</value>
            <value>EEE, d MMM yy HH:mm:ss Z</value>
         </list>
      </property>
   </bean>
and it just worked. Issue ALF-2716 (http://issues.alfresco.com/jira/browse/ALF-2716), which has been logged and resolved, helped me here.
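Why the first pattern accepts a string that ends in " (CET)" can be checked with plain Java, assuming the configured patterns are applied with java.text.SimpleDateFormat (the thread doesn't confirm Alfresco's internals). DateFormat.parse does not have to consume the whole input, so parsing stops right after the "+0100" offset and the trailing zone name is left unread. Rfc822DateCheck and parsedUpTo are hypothetical names used only for this sketch.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class Rfc822DateCheck {
    // Returns the index where parsing stopped, or -1 if the pattern did not match.
    public static int parsedUpTo(String pattern, String value) {
        // Locale.ENGLISH is an assumption so that "Thu" and "Feb" are recognized.
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        ParsePosition position = new ParsePosition(0);
        Date date = format.parse(value, position);
        return date == null ? -1 : position.getIndex();
    }

    public static void main(String[] args) {
        String header = "Thu, 19 Feb 2009 11:17:09 +0100 (CET)";
        String[] patterns = {"EEE, d MMM yyyy HH:mm:ss Z", "EEE, d MMM yy HH:mm:ss Z"};
        for (String pattern : patterns) {
            int end = parsedUpTo(pattern, header);
            // Anything after 'end' is not covered by the pattern and is ignored.
            String ignored = end < 0 ? "" : header.substring(end);
            System.out.println(pattern + " -> stopped at " + end
                    + ", ignored: \"" + ignored + "\"");
        }
    }
}
```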

mrogers
Star Contributor
When you say Outlook msg, are you referring to the message/rfc822 mimetype (which normally has the extension .eml) or to Outlook msg, which has the mimetype "application/vnd.ms-outlook" and normally has the extension .msg?

loftux
Star Contributor
I meant the application/vnd.ms-outlook (.msg) files.
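When it isn't clear which of the two kinds of file you have, the first bytes usually tell them apart. Outlook .msg files are stored in the OLE2 (Compound File Binary) container, whose header starts with D0 CF 11 E0 A1 B1 1A E1, while .eml files are plain-text RFC 822 messages. The sketch below is only a rough heuristic under that assumption: other OLE2 files such as .doc or .xls share the same header, so it merely distinguishes .msg from .eml when you already know the file is a mail message. MailFileKind and guessMimetype are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class MailFileKind {
    // OLE2 / Compound File Binary signature used by Outlook .msg files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Heuristic (assumption): OLE2 header means .msg, anything else is treated as .eml.
    public static String guessMimetype(byte[] head) {
        if (head.length >= OLE2_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC)) {
            return "application/vnd.ms-outlook";
        }
        return "message/rfc822";
    }

    // Reads only the first few bytes of the file (requires Java 11+ for readNBytes).
    public static String guessMimetype(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return guessMimetype(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            for (String arg : args) {
                System.out.println(arg + ": " + guessMimetype(Paths.get(arg)));
            }
            return;
        }
        // Without arguments, demonstrate on in-memory samples.
        byte[] msgHead = Arrays.copyOf(OLE2_MAGIC, 16);
        byte[] emlHead = "From: someone@example.com\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("msg sample: " + guessMimetype(msgHead));
        System.out.println("eml sample: " + guessMimetype(emlHead));
    }
}
```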