How does Alfresco's content indexing work?

siquser
Champ in-the-making
We uploaded MS Word and MS Excel documents. When I search for text contained in these documents, search does not return any results.

We are not saying that search does not work: we have tested it on plain-text data and it worked, and we have also searched for text from MS Word / MS Excel files in the past and that worked for us too.

What we are unsure of is how long it takes for the indexing server to kick in once a file is uploaded. In our case the file was very small and contained very little data, yet search still returns nothing, and we have been waiting 20-30 minutes since uploading it. We copied the content of this MS Word file into a text file, uploaded that, and searched again: the result was instantaneous.

Question: Is there a configuration setting we can tweak that says either index the file right away or index every <n> minutes?
30 REPLIES

javauser007
Champ in-the-making
Hi siquser,
I am facing the same problem. I installed OpenOffice 3.0 on top of Alfresco, but it still does not search the content of MS Office 2007 documents.

Did you resolve it?

If so, please provide the details…

Thanks in advance

siquser
Champ in-the-making
I remember having a conversation with an Alfresco engineer (in a personal email thread); according to him it is a known bug. Hopefully the Alfresco team will resolve it soon.
The only workaround I can think of is to convert the document to the Office 2003 format and upload it; search should then work.

syed_imtiaz
Champ in-the-making
You are right. Office 2007 files need to be saved in the (Word/Excel/PowerPoint) 97-2003 format. Then search is instantaneous.

Regards

benswitzer
Champ in-the-making
The file types for OpenXML documents (MS Office 2007) are missing from alfresco/WEB-INF/classes/alfresco/mimetype/openoffice-document-formats.xml.

You obviously need a version of OpenOffice that supports OpenXML documents (version 3.0.0 does)

If I were you, I'd check whether this issue has already been reported at http://issues.alfresco.com and, if not, file the bug.

t_broyer
Champ in-the-making
"1. We have OpenOffice 3.0 installed on the server where Alfresco is installed, and I mentioned this back on 16 Dec 2008 (please refer to the forum thread above)"

Did you add the docx/xlsx/pptx entries to openoffice-document-formats.xml where, as I said, they are missing? (implying that having OpenOffice 3.0 is just *part* of the solution)

That being said, I do not have MS Office 2007 to test with, so I can only guess.

javauser007
Champ in-the-making
Hi broyer,
I have added all the configuration related to MS Office 2007 to openoffice-document-formats.xml,
but I am still facing the same problem (I have already installed OpenOffice 3.0 on top of Alfresco 2.9B).

The following is the snippet I added to openoffice-document-formats.xml:

<document-format>
   <name>Microsoft Word 2007</name>
   <family>Text</family>
   <mime-type>application/vnd.openxmlformats-officedocument.wordprocessingml.document</mime-type>
   <file-extension>docx</file-extension>
   <export-filters>
      <entry><family>Text</family><string>MS Word 2007</string></entry>
   </export-filters>
</document-format>

Any help is appreciated…

Thanks.

javauser007
Champ in-the-making
Can anybody solve this problem?

mikeh
Star Contributor
In content-services-context.xml, try adding this supportedMimetypes property to the extracter.OpenOffice bean:

<bean id="extracter.OpenOffice" class="org.alfresco.repo.content.metadata.OpenOfficeMetadataExtracter" parent="baseMetadataExtracter">
   <property name="connection">
      <ref bean="openOfficeConnection" />
   </property>
   <property name="supportedMimetypes">
      <list>
         <value>application/msword</value>
         <value>application/vnd.excel</value>
         <value>application/vnd.powerpoint</value>
         <value>application/vnd.openxmlformats-officedocument.wordprocessingml.document</value>
         <value>application/vnd.openxmlformats-officedocument.spreadsheetml.sheet</value>
         <value>application/vnd.openxmlformats-officedocument.presentationml.presentation</value>
      </list>
   </property>
</bean>

Mike

t_broyer
Champ in-the-making
"Hi broyer,
I have added all the configurations related to ms-office 2007 in openoffice-document-formats.xml
But still facing the same problem (I already installed openoffice 3.0 on top of alfresco 2.9B)."

Er, Alfresco 2.9B… Are the docx/xlsx/pptx entries present in alfresco/WEB-INF/classes/alfresco/mimetype/mimetype-map-openoffice.xml?
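
For reference, entries of that kind might look roughly like the sketch below. It assumes mimetype-map-openoffice.xml uses the same alfresco-config layout as Alfresco's other mimetype maps; the display names are made up for illustration, and the fragment is untested against 2.9B.

```xml
<!-- Sketch only: assumes the usual alfresco-config mimetype-map layout. -->
<!-- The display attribute values are illustrative, not taken from Alfresco. -->
<alfresco-config area="mimetype-map">
   <config evaluator="string-compare" condition="Mimetype Map">
      <mimetypes>
         <!-- Word 2007 -->
         <mimetype mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" display="Microsoft Word 2007">
            <extension>docx</extension>
         </mimetype>
         <!-- Excel 2007 -->
         <mimetype mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" display="Microsoft Excel 2007">
            <extension>xlsx</extension>
         </mimetype>
         <!-- PowerPoint 2007 -->
         <mimetype mimetype="application/vnd.openxmlformats-officedocument.presentationml.presentation" display="Microsoft PowerPoint 2007">
            <extension>pptx</extension>
         </mimetype>
      </mimetypes>
   </config>
</alfresco-config>
```

A change like this would normally need a repository restart before it takes effect, and it only registers the MIME types; whether the content then gets indexed still depends on OpenOffice being able to read the files.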

javauser007
Champ in-the-making
Hi broyer,
The docx/xlsx/pptx entries are not present in 2.9B, nor in Enterprise 3.0 (mimetype-map-openoffice.xml).
So how can I make Alfresco read docx files?


Thanks.