search in Excel document

josua49
Champ in-the-making
Hi,

The product is really interesting for sharing documents.
I've tried searching Excel content with Alfresco V1 on a Tomcat server: no results.
It's as if Excel files are not indexed.

Why?
8 REPLIES

derek
Star Contributor
Hi,

Do you have any ERROR messages on the console when you start up?
Do you get any hits if you perform a search for "nint"? [not indexed - no transformer]
What about if you search for "nift"? (will be 'nitf' after V1.0) [not indexed - transformation failed]

Regards

josua49
Champ in-the-making
Hi,

I use an Apache Tomcat installation with alfresco.war, not the Tomcat bundle.
I'm a newbie with Tomcat, and I've searched the log files for any error concerning Alfresco: nothing!
When I performed a search for "nint", I had no hits.
But when I performed a search for "nift", it returned my Excel document.
So does that mean the search process is not able to convert the Excel file in order to index it? The file size is about 462 KB.

Regards.

derek
Star Contributor
Hi,

The transformation failed.  This is not normal, but the POI libraries, which we use to read the Excel documents, fail to handle a certain number of Excel documents.

There is an open bug regarding this type of failure. It might be that we have to use something completely different to perform the text conversion, but unfortunately POI is much faster than the alternatives.  You can attach your file to the bug if it doesn't contain sensitive info: http://www.alfresco.org/jira/browse/AR-114.  It probably won't be closed until POI handles the documents or some other alternative comes along.

One of those alternatives is the UnoContentTransformer.  Apart from being slower, it also only extracts the first sheet's text.

If there is something else we can use in Java to read XLS files, then feel free to suggest it.  In the meantime, if you want to switch to the OpenOffice converter, comment out the POI transformer in the contentTransformerRegistry, found in content-services-context.xml.  The (in my opinion) less reliable transformation provided by the OpenOffice converter will kick in as long as you started the background OpenOffice server.


      <!-- transformers to fall back on in the event that an explicit transformation isn't defined -->
      <property name="transformers">
         <list>
            <ref bean="transformer.StringExtracter" />
            <ref bean="transformer.BinaryPassThrough" />
            <ref bean="transformer.PdfBox" />
<!--            <ref bean="transformer.Poi" />  -->
            <ref bean="transformer.TextMining" />
            <ref bean="transformer.HtmlParser" />
            <ref bean="transformer.OpenOffice" />
<!--            <ref bean="transformer.JMagick" /> -->
            <ref bean="transformer.ImageMagick" />
         </list>
      </property>

Regards

josua49
Champ in-the-making
Thank you for your help.

I've commented out the POI transformer and restarted Tomcat. (I don't know how to restart the OpenOffice server. I only have OpenOffice installed on my Windows XP machine.)

The search in my Excel file is still unsuccessful.

If there is something to try with OpenOffice that I've missed, please tell me.

Regards.

kevinr
Star Contributor
To start the OpenOffice server, execute the following batch file, found in the same directory as the other Alfresco start scripts:

zstart_oo.bat

If that runs OK, then restart the Alfresco server and try the Excel doc again.
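
If you are not sure whether the OpenOffice server actually came up, a quick check is to see whether anything accepts connections on its port. Below is a minimal sketch of that check in plain Java. The class name `OpenOfficePortCheck` is made up for illustration and is not part of Alfresco, and port 8100 is only an assumption; use whatever port your start script passes to OpenOffice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only; not part of Alfresco.
class OpenOfficePortCheck {

    // Returns true if something accepts a TCP connection on host:port within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing reachable on that port.
            return false;
        }
    }

    public static void main(String[] args) {
        // 8100 is an assumed default; pass the real port as the first argument.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8100;
        boolean up = isListening("localhost", port, 2000);
        System.out.println("Listener on port " + port + ": "
                + (up ? "reachable" : "not reachable"));
    }
}
```

A successful connection only shows that some process is listening on that port, not that it is OpenOffice, so treat it as a first sanity check rather than proof.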

Thanks,

Kevin

josua49
Champ in-the-making
Hi,

I've tried your solution, but it was unsuccessful.
I think I'm going to upload the excel file to your bug report.

Thank you for your help.

kevinr
Star Contributor
Yes, please do so. We will take a look at the document and try to track down the bug.

Thanks for your help testing,

Kevin

steve
Champ in-the-making
Hello,
I have imported your file into a test instance here and it does not get indexed.
There is also a big stack trace in the output, so I have added that to the bug report.

Thanks for bringing this to our attention,
Steve