
How the Indexer works

stegbth
Champ in-the-making
Hi,

I am running Alfresco-Labs 3 Stable on Debian 4.0 x86_64 with MySQL 5 and the Tomcat bundle.

I am currently testing with ABBYY FineReader. I create PDFs that have a text layer beneath the scanned image and save them to Alfresco via CIFS.
Alfresco indexes some words but not others. What does that depend on?

Greetings
Thomas

nvir
Champ in-the-making
Hello,

If you want to see what is indexed, there are two ways:
- use Luke (a Lucene tool that lets you inspect the content of your index)
- look at the .txt files generated in Tomcat's temp/Alfresco directory
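For the second option, a small script can tell you whether a given word made it into any of the extracted text files at all. This is a minimal sketch, assuming the extracts are plain `.txt` files somewhere under the temp/Alfresco directory; the function name `find_term_in_extracts` and the example path are hypothetical, and the match is a simple case-insensitive substring check, not what the Lucene analyzer does.

```python
import os


def find_term_in_extracts(temp_dir, term):
    """Return sorted paths of .txt files under temp_dir whose text contains term.

    Case-insensitive substring match; this only checks the extracted text,
    not how the indexer tokenized it.
    """
    needle = term.lower()
    hits = []
    for root, _dirs, files in os.walk(temp_dir):
        for name in files:
            if not name.lower().endswith(".txt"):
                continue
            path = os.path.join(root, name)
            # errors="replace" so odd bytes from OCR output don't abort the scan
            with open(path, encoding="utf-8", errors="replace") as f:
                if needle in f.read().lower():
                    hits.append(path)
    return sorted(hits)
```

Usage (hypothetical path, adjust to your install): `find_term_in_extracts("/opt/alfresco/tomcat/temp/Alfresco", "vienna")`. If the word appears in the extract but a search still misses it, the problem is more likely in tokenization or indexing than in text extraction.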

Greetings

wmay
Champ in-the-making
We have the same problem, and after some tests I found that it occurs when such a document is added via CIFS. If you add the same OCRed PDF once via CIFS and once via the Web Client (or another client), the search results differ.

A document added via CIFS cannot be found with the same search string as the identical document added via the client. A wildcard search often (perhaps always) helps: to find a document containing vienna, you have to search for *vienna*. Phrase searches such as "vacation in vienna" also fail for PDFs added via CIFS, while the same search works correctly when the file was added via the Web Client.
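One possible explanation that fits these symptoms (an assumption, not confirmed anywhere in this thread) is that the text extracted for the CIFS upload has a word glued to its neighbour, e.g. "inVienna". The toy sketch below uses a simplified tokenizer (lowercase, split on non-word characters), not Alfresco's real analyzer, to show why an exact term query and a phrase query would then miss while a `*vienna*` wildcard still hits. All function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    # Simplified stand-in for an analyzer: lowercase, split on non-word chars.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def term_match(tokens, term):
    # Exact term query: the indexed token must equal the search word.
    return term.lower() in tokens


def wildcard_match(tokens, pattern):
    # Wildcard query such as *vienna*: any indexed token matching the pattern.
    return any(fnmatchcase(t, pattern.lower()) for t in tokens)


def phrase_match(tokens, phrase):
    # Phrase query: the words must appear as consecutive tokens.
    words = tokenize(phrase)
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


clean = tokenize("Vacation in Vienna")   # ['vacation', 'in', 'vienna']
glued = tokenize("Vacation inVienna")    # ['vacation', 'invienna']
```

With `glued`, `term_match(glued, "vienna")` and `phrase_match(glued, "vacation in vienna")` are both false while `wildcard_match(glued, "*vienna*")` is true; with `clean`, all three succeed. Comparing the actual indexed terms in Luke would show whether something like this is really happening.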

This is a very strange problem that causes a lot of confusion when testing and using the system, because one of the most important capabilities of an ECM system (the "E" stands for Enterprise) is finding the documents you added to the repository: not xx% of the time, but always, 100%.

See also http://forums.alfresco.com/en/viewtopic.php?f=3&t=19701 or
http://forums.alfresco.com/en/viewtopic.php?f=16&t=17306 or
http://forums.alfresco.com/en/viewtopic.php?f=9&t=19341&start=0

Where could the problem be?

- language settings? We use a German Windows XP to access Alfresco CIFS and add PDF files via drag & drop, while "English" is the selected language in the Web Client.
- the PDF-to-text extraction
- ……

We see the same problem with Enterprise 3.0 and Community 3.2.

Any ideas how to solve this? There seem to be several topics in the forum about this problem.

nvir
Champ in-the-making
The idea is to use Luke (http://www.getopt.org/luke/) to see what is in the Lucene index when the document is added through the web interface, and then again when it is added through CIFS.

Also look at the document's metadata in the Alfresco Node Browser and check the content field, which contains information about the language (locale) that may be used by the indexer.

Hope this helps,
Alain