How does Alfresco's content indexing work?

siquser
Champ in-the-making
We uploaded MS Word and MS Excel documents. When I search for text contained in these documents, search does not return any results.

We are not saying that search does not work: we have tested it on plain text data and it worked, and in the past we have also successfully searched for text from MS Word and MS Excel files.

What we are unsure of is how long it takes for the indexing server to kick in once a file is uploaded. In our case it was a very small file with very little data in it, yet search still returns nothing, and we have been waiting 20 to 30 minutes since uploading it. We copied the content of this MS Word file into a text file, uploaded that, and searched: the result was instantaneous.

Question: is there a configuration setting we can tweak that says "index the file right away" or "index every <n> minutes"?
30 REPLIES

t_broyer
Champ in-the-making
What we are unsure of is how long it takes for the indexing server to kick in once a file is uploaded.

It's (almost) instantaneous, though it is done asynchronously: you don't have to wait for indexing to finish before you continue using the app.
In the context of Alfresco Explorer (the /alfresco webapp), indexing is already running by the time the server has sent you the web page.

In our case it was a very small file with very little data in it, yet search still returns nothing, and we have been waiting 20 to 30 minutes since uploading it. We copied the content of this MS Word file into a text file, uploaded that, and searched: the result was instantaneous.

Did you install and configure OpenOffice on your server? Alfresco uses OpenOffice to convert the office files (including MSOffice documents) to plain text.
Obviously, a plain text file does not need such a transformation, hence the positive results.

siquser
Champ in-the-making
It has been over a day since I uploaded the MS Office files, and I still don't see them in the search results, so I'm curious to understand what I'm missing.

My files were MS Office files and NOT Open Office files, so I'm unsure how installing Open Office makes a difference. Please help me understand. BTW, I do have Open Office installed.

Am I missing something?

mikeh
Star Contributor
We use OpenOffice on the server because it does an excellent job of reading Microsoft Office documents and saving them as PDF: no need to reinvent the wheel!

You need to ensure the Alfresco server can connect to OpenOffice on startup. Check the log for details.
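Besides reading the log, a quick way to see whether anything is listening where Alfresco expects OpenOffice is a plain TCP connection test. The sketch below is only an illustration: the host `localhost` and port `8100` are assumptions (8100 is a commonly used OpenOffice listener port, but check your own configuration), and `can_connect` is a hypothetical helper, not part of Alfresco.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: OpenOffice was started as a listener on localhost:8100.
    # Adjust host and port to match your own installation.
    print("OpenOffice reachable:", can_connect("localhost", 8100))
```

A successful connection only shows that some process accepts connections on that port; it does not prove that the process is OpenOffice or that conversions work, so the Alfresco log remains the authoritative place to look.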

Mike

_sax
Champ in-the-making
As I mentioned in this thread, I also have a problem with searching in 3.0:
http://forums.alfresco.com/en/viewtopic.php?f=9&t=15814
Could this be related to a specific OpenOffice version that one should have? I used 2.0 x64, which worked well with Alfresco 2.1 and 2.9b.

siquser
Champ in-the-making
Still no luck with search results.

We are using Open Office 3.0. As said earlier, we were uploading MS Word and MS Excel files, so I'm not sure what the relationship is between Microsoft files and Open Office files. Those are two separate file types and there should be no dependency between them, which means search should not be impacted.

Do we have to install MS Office on our Alfresco server? I don't think we need to, but I'm just checking.

t_broyer
Champ in-the-making
As said earlier, we were uploading MS Word and MS Excel files, so I'm not sure what the relationship is between Microsoft files and Open Office files. Those are two separate file types and there should be no dependency between them, which means search should not be impacted.

It's not about Open Office files (that is actually the Open Document Format, an OASIS standard, which has nothing to do with Open Office except that Open Office uses it by default) but about Open Office the office suite, which can open MS Office files and save them in other formats. In other words, Open Office can be used to transform an MS Office file into a plain text file.

…and that's what Alfresco does under the hood: it calls an Open Office instance and asks it to transform your MS Office file into a plain text file, which is then passed to Lucene for indexing.
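To make the "transform first, then index" idea concrete, here is a toy sketch. It is not Alfresco or Lucene code: the transformer registry, `index_document` and `search` are all hypothetical names, and the "index" is just a dictionary of words. It only shows why a plain text upload is found at once while an office document with no working transformer never reaches the index.

```python
from collections import defaultdict


def make_index():
    """Create an empty toy inverted index: word -> set of document ids."""
    return defaultdict(set)


def index_document(index, transformers, doc_id, mimetype, content):
    """Convert content to plain text and index its words.

    Returns False when no transformer exists for the MIME type, in which
    case nothing is added to the index.
    """
    transform = transformers.get(mimetype)
    if transform is None:
        return False  # no text could be extracted, so there is nothing to index
    for word in transform(content).lower().split():
        index[word].add(doc_id)
    return True


def search(index, term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical setup: only plain text can be transformed. A Word transformer
# would exist only if the external converter were reachable.
transformers = {"text/plain": lambda data: data.decode("utf-8")}
index = make_index()
index_document(index, transformers, "notes.txt", "text/plain", b"quarterly budget")
index_document(index, transformers, "notes.doc", "application/msword", b"\xd0\xcf\x11\xe0")
print(search(index, "budget"))  # ['notes.txt']
```

In this sketch the Word file is stored but silently skipped by the indexer, which matches the symptom described in this thread: the same words are found in the text upload and not in the office document.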

That being said, I'm not enough of an expert in Alfresco installation to help you further, but the likely cause is that Alfresco cannot launch and connect to Open Office.

siquser
Champ in-the-making
1. We tried search on all MS Office 2003 document types. It works for Word, Excel and Visio. It does not work for PowerPoint.

2. We tried search on all MS Office 2007 document types. It DOES NOT work.

Any idea whether there is a specific issue with MS Office 2007?

t_broyer
Champ in-the-making
The file types for OpenXML documents (MS Office 2007) are missing from alfresco/WEB-INF/classes/alfresco/mimetype/openoffice-document-formats.xml.

You obviously also need a version of OpenOffice that supports OpenXML documents (version 3.0.0 does).

If I were you, I'd check whether this issue has already been reported at http://issues.alfresco.com and file the bug if it has not.
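If you want to confirm on your own installation which OpenXML types the formats file lacks, a rough check is to look for the standard OOXML MIME type strings in its text. This is only a sketch: `missing_ooxml_types` is a hypothetical helper, and it assumes the file lists MIME types as literal strings, so it does a plain substring search rather than parsing the XML.

```python
# Standard MIME types for the Office 2007 (OpenXML) formats.
OOXML_MIME_TYPES = {
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def missing_ooxml_types(formats_xml: str) -> list:
    """Return the extensions whose MIME type string does not appear in the text."""
    return sorted(
        ext for ext, mime in OOXML_MIME_TYPES.items() if mime not in formats_xml
    )


# Example usage, with the path given above:
#   path = "alfresco/WEB-INF/classes/alfresco/mimetype/openoffice-document-formats.xml"
#   with open(path, encoding="utf-8") as fh:
#       print(missing_ooxml_types(fh.read()))
```

An empty list would mean all three MIME types are mentioned somewhere in the file; it would not by itself prove that the entries are correct or that OpenOffice can actually convert those documents.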

siquser
Champ in-the-making
Hi t.broyer,

1. We have OpenOffice 3.0 installed on the server where Alfresco is installed, as I mentioned on 16 Dec 2008 (please see earlier in this thread).
2. It looks like the same problem was reported by "_sax" on 14 Dec 2008 (please see earlier in this thread).

I will search further on http://issues.alfresco.com to see whether this bug has already been reported.

Thanks