Using Alfresco FullText Search as Case Insensitive

cyong — Tue, 09 Jul 2019 15:12:42 GMT

Greetings, and my apologies in advance: I am more of a SQL developer and not that experienced with the Alfresco API. I have been trying to solve this but haven't gotten very far. We are on version 5.2.2. We have some files with words in their titles like "Estimating" and "Estimate". Currently the

Re: Using Alfresco FullText Search as Case Insensitive

afaust — Wed, 10 Jul 2019 08:07:20 GMT

One thing to keep in mind with Alfresco searches is that, unless an FTS query is suitable to be directly transformed to and executed by an SQL statement, all queries will be handled by SOLR / Lucene and be subject to its tokenisation / word stemming rules, which work differently than a LIKE condition in SQL.

Another thing to keep in mind is that queries may be handled quite differently depending on the locale of the search request vs. the locale of the document on which the metadata resides. The metadata on the document will be tokenised / indexed using the locale stored as part of the metadata, while the query will be tokenised using the locale (implicitly) provided in the request. As a result, some fuzzy match conditions may not match (even if they should obviously do so in the user's eyes) because the locales have slightly different tokenisation / stemming rules in SOLR / Lucene.

So, using the = prefix turns off fuzzy matching and case insensitivity, turning the term into something more like an SQL LIKE condition, with the asterisk playing the part of the percent wildcard. An SQL LIKE is also always case sensitive, and the only way to turn it into a case-insensitive query is by using value manipulation functions (like LOWER) in the condition, which you simply cannot do in FTS. If written even more restrictively (e.g. without a leading wildcard), a query with the = prefix could actually be executed directly via SQL.
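To make the LIKE analogy concrete, here is a toy model in plain Python. It is not how Alfresco or SOLR evaluate anything; it only mimics a case-sensitive wildcard match (asterisk standing in for the percent sign) next to the LOWER-style workaround that SQL allows and FTS does not. The sample titles and both helper names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical sample titles, not taken from any real repository.
titles = ["Estimate", "Estimating", "estimate summary"]


def like_match(value, pattern):
    """Case-sensitive wildcard match; '*' plays the role of SQL's '%'."""
    return fnmatchcase(value, pattern)


def lowered_like_match(value, pattern):
    """Same match after lower-casing both sides, like LOWER(col) LIKE LOWER(p)."""
    return fnmatchcase(value.lower(), pattern.lower())


# Case-sensitive: only titles that start with a capital "Estima" match.
case_sensitive_hits = [t for t in titles if like_match(t, "Estima*")]

# Case-folded: every title that starts with "estima" in any casing matches.
case_folded_hits = [t for t in titles if lowered_like_match(t, "Estima*")]

print(case_sensitive_hits)
print(case_folded_hits)
```

The second variant is the part you cannot express in FTS, since there is no function you can wrap around the field value.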

The query cm:title:Estima did not find any of the results you were looking for because the word stemming / tokenisation rules for English would not allow the term "Estima" to match either "Estimate" or "Estimating".

The query cm:title:Estimate finds both 'Estimate' and 'Estimating' because the word stemming / tokenisation rules for English cause the term "Estimate" to match words based on "Estimate" using common English word derivation rules, for which "Estima" was insufficient.

If you use the query cm:title:Estima* it should work like you want it to, as this will add a "begins-with" condition to the basic stemming / tokenisation evaluations.
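For anyone who wants to try the three variants side by side, here is a minimal sketch that builds the request bodies. It assumes you are going through the v1 Search REST API (POST to the path in SEARCH_PATH) with the "afts" query language; the helper name build_search_body and the paging values are only illustrative, and the sketch stops short of sending anything, so you would still need to add your own host and authentication.

```python
import json

# Assumed endpoint of the v1 Search REST API, relative to the server root.
SEARCH_PATH = "/alfresco/api/-default-/public/search/versions/1/search"


def build_search_body(fts_query, max_items=25):
    """Build a JSON-serialisable body for an FTS ("afts") search request."""
    return {
        "query": {"language": "afts", "query": fts_query},
        "paging": {"maxItems": max_items, "skipCount": 0},
    }


# The three queries discussed in this thread.
queries = [
    "cm:title:Estima",    # plain term, subject to stemming / tokenisation only
    "cm:title:Estimate",  # plain term that stems to the same root as "Estimating"
    "cm:title:Estima*",   # trailing wildcard, adds the "begins-with" condition
]

bodies = [build_search_body(q) for q in queries]
for body in bodies:
    print("POST", SEARCH_PATH)
    print(json.dumps(body, indent=2))
```

Comparing the result counts of the three requests against the same repository should show quickly whether the wildcard variant behaves as described on your installation.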

Re: Using Alfresco FullText Search as Case Insensitive

cyong — Thu, 11 Jul 2019 13:21:31 GMT

Hmmm, I think I understand what you are saying about the word stemming and tokenisation. However, I confirmed that the documents are in English (as indicated by the sys:locale property), as is the search API request. Yet, when I search for cm:title:estima* (lowercase), I am unable to find the documents that start with Estima.

Reading https://github.com/Alfresco/alfresco-repository/blob/af2e069b2eabcd5433cee39d83ec06bad6fc69a0/src/main/resources/alfresco/model/contentModel.xml it looks like the 5.2 cm:title field should be tokenized?
