[Index & Search] Removing effects of language / locale

08-29-2008 08:44 AM
Hello,
First, I should say that I have read a lot about the indexing process, the searching process, and the effects of language on both.
I read the following topics, and a few others:
http://forums.alfresco.com/en/viewtopic.php?f=4&t=10114&hilit=search+and+locale
and
http://forums.alfresco.com/en/viewtopic.php?f=9&t=9524&hilit=search+and+locale
My problem is that our users come from different countries and languages and mix CIFS and webclient usage (for uploading or searching files), and the results of any search are poor. What I mean is: they like the way CIFS/Windows search works.
Indeed, they have a lot of trouble getting the right results from the webclient or a search portlet (via webservice), because of all the stemming/analysis applied during the indexing process.
So I'd like to configure a simple index analysis that would just strip accents (for French and Spanish words) but keep the words unstemmed.
If users search for "procedure", they want to find files containing "procédure" or even "ProCéDUre", whatever the locale of their webclient, the locale of the document, or the way the file was uploaded.
I was wondering if it is as simple as:
- declaring the same LuceneCustomAnalyzer in each DataTypeAnalyzers_locale.properties file
- creating this LuceneCustomAnalyzer from the French one, removing the call to FrenchStemmer, and customizing it to strip accents.
Am I on the right track?
Is there anything I forgot (such as the fact that, with this setup, a search for "procedureS" (plural) will not return files containing "procedure" (singular))?
Thank you all
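
To make the intended behaviour concrete, here is a minimal sketch of the normalization such an analyzer would have to apply to every term, both at index time and at query time. It is not Lucene or Alfresco code: the class AccentFolding and the method fold are hypothetical names, and it relies only on java.text.Normalizer from the JDK. Inside a real analyzer the same effect would come from a lowercasing step plus an accent-stripping token filter, with no stemmer in the chain.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration only: this is NOT a Lucene analyzer.
final class AccentFolding {

    // Lowercases a term and strips its accents, without any stemming.
    public static String fold(String term) {
        // NFD splits an accented letter into base letter + combining mark.
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left behind by the decomposition.
        String unaccented = decomposed.replaceAll("\\p{M}+", "");
        return unaccented.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // \u00e9 is e-acute, written as an escape to avoid encoding issues.
        String indexed = fold("ProC\u00e9DUre");
        String query = fold("procedure");

        // Accent and case differences disappear after folding.
        System.out.println(indexed.equals(query));

        // No stemming: plural and singular remain different terms.
        System.out.println(fold("procedures").equals(fold("procedure")));
    }
}
```

The second comparison shows the trade-off raised in the question: without a stemmer, "procedures" and "procedure" stay distinct terms, so a plural query will not match a singular occurrence.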
2 REPLIES

08-29-2008 09:59 AM
Hmm, I'm wondering whether to use FrenchAnalyzer (without FrenchStemmer) and IsoLatin1Filter, or AlfrescoStandardAnalyzer.
One last question.
Once I'm done with the config changes, will a full reindex (index.recovery.mode=FULL) rebuild the indexes using the new analyzers?
Thank you for any reply.

09-11-2008 09:59 AM
Hi
Yes, a full index rebuild will apply any analyzer changes.
Andy
