Does Alfresco Index E-mail Content?

04-24-2006 01:07 PM
Hi,
Need an answer to this question before I go and see a client this week, so any help is much appreciated. The client saves their e-mails in Outlook MSG format. I put a test MSG file into Alfresco and tried searching on the email's content, but it doesn't seem to be indexed. I also tried a plain text file and couldn't search on its content either.
Does Alfresco index the contents of Outlook MSG files? If not, what do I need to do to index them? Why didn't the plain text file's contents get indexed? Do I need to set something up, or do I have to wait a while for indexing to complete because it's a background task?
Thanks,
David
8 REPLIES
05-16-2006 04:38 PM
Alfresco will index any content that it can transform to UTF-8. If you want any content item to be full-text searchable, a transformation from that item's MIME type to UTF-8 must be registered with Alfresco. I suspect that MSG files are already transformable, and your problem is somewhere else (since plain text should work as well).
Regarding indexing, Alfresco will index content items (if you're using the default content model) as they get added to the repo within the same transaction. So indexing of the content is not done in the background by default.
If you'd like to learn more about when properties (content is a property) get indexed, refer to this article in the Wiki:
http://wiki.alfresco.com/wiki/Full-Text_Search_Configuration
Cheers.
–Sumer

05-17-2006 11:35 AM
Hi
I recall someone saying that the MS message format is actually RTF, so adding the RTF mimetype for documents with .msg might actually do what you want. Let us know if it works!
Cheers
Paul.

05-17-2006 12:12 PM
Thanks, I will give it a go and let you know the results.

11-30-2006 10:03 PM
Is there any update to this? I read in an RM document that v1.4 of Alfresco now extracts some meta-data from Outlook MSG files (to, from, subject). Is Alfresco now able to parse the MSG OLE file to extract additional information?
For example, being able to full-text index the body of the email would be incredibly useful. Extending this to indexing the attachments as well would be useful, but not as critical.
Cheers,
Al.

12-01-2006 04:51 AM
Yes, Alfresco uses the Apache POI library to deconstruct the Outlook email message OLE file format. Currently we only have a meta-data extractor class, as you mention. We could (and should!) add a full-text extraction transformer based on the same code. It would not be hard to do, but there wasn't time for it in 1.4.
If you fancy trying it, the code to look at is:
org.alfresco.repo.content.metadata.MailMetadataExtracter
This class shows how to extract fields (including the text body of the email message) from the email file. There are plenty of examples of text extractor classes in Alfresco (for Word, PDF, etc.) that give a good starting point for adding your own.
Thanks,
Kevin

12-09-2006 03:26 AM
Hi Kevin,
I'll take a look at it.
I've been trying to test the metadata extractor before I start, but can't see / search on the extracted metadata (or for that matter confirm that anything has been indexed at all apart from the .msg file name). I'm running the 1.4 community release & have also tried building alfresco.war from svn HEAD. Do I need to turn anything on to get the metadata extractor working?
Cheers,
al.

12-11-2006 05:37 AM
The email meta-data fields are part of the standard content model but are not displayed by default. If the extraction occurred correctly, then the following aspect will have been populated:
<aspect name="cm:emailed">
   <title>Emailed</title>
   <properties>
      <property name="cm:originator">
         <title>Originator</title>
         <type>d:text</type>
      </property>
      <property name="cm:addressee">
         <title>Addressee</title>
         <type>d:text</type>
      </property>
      <property name="cm:addressees">
         <title>Addressees</title>
         <type>d:text</type>
         <multiple>true</multiple>
      </property>
      <property name="cm:subjectline">
         <title>Subject</title>
         <type>d:text</type>
      </property>
      <property name="cm:sentdate">
         <title>Sent Date</title>
         <type>d:datetime</type>
      </property>
   </properties>
</aspect>
So you need to add the fields you require to your overridden client config to display them in the appropriate screens.
The meta-data extractor will only work on Outlook OLE2 format .msg documents.
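For anyone unsure what the client config override looks like, here is a rough sketch rather than a tested configuration. It assumes the override lives in the web client's custom config file (commonly web-client-config-custom.xml; the file name and its location are assumptions, not stated in this thread) and that the web client's aspect-name evaluator and property-sheet elements behave as they do for other aspects. The property names are taken from the cm:emailed aspect definition in this post.

```xml
<!-- Sketch only: the file name and location are assumptions, check your install -->
<alfresco-config>
   <!-- Applies to any node that carries the cm:emailed aspect -->
   <config evaluator="aspect-name" condition="cm:emailed">
      <property-sheet>
         <!-- One show-property entry per field to display -->
         <show-property name="cm:originator" />
         <show-property name="cm:addressee" />
         <show-property name="cm:addressees" />
         <show-property name="cm:subjectline" />
         <show-property name="cm:sentdate" />
      </property-sheet>
   </config>
</alfresco-config>
```

Only list the properties you actually want on the property sheet; the others can be left out.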
Thanks,
Kevin

01-12-2007 02:57 AM
Keep in mind that e-mail volume for all but the smallest organizations will be very large, and keeping up with that ingestion rate can be quite challenging. I'm not familiar yet with Alfresco's architecture, but this is the kind of thing that caused Documentum to rethink their meta-data model and change search engine vendor.
Travis
