
Lucene search Vs Solr search varies after upgrade

nikes
Champ on-the-rise
Hello,

We just upgraded from 3.4.x to 4.2.x, and also moved from Lucene to SOLR.

Talking about group search,

We have the following example groups:
1) ABC
2) ABC_1

If I use the following query in the Node Browser, it returns both of the above groups (4.2.x with SOLR):

+TYPE:"{http://www.alfresco.org/model/content/1.0}authorityContainer" AND +@\{http\://www.alfresco.org/model/content/1.0\}authorityName:"GROUP_ABC"


However, in the earlier version (3.4.x with Lucene),
it returned only the exact match, i.e. ABC.

Does this mean SOLR performs a LIKE-style search by default?


We have some groups whose names differ only in case (ABC, AbC, ABc). The aim is to use the query above and have it return all of these case variants (this worked with Lucene).

Any tips or suggestions would be helpful.

Thanks
4 REPLIES

afaust
Legendary Innovator
Hello,

between 3.4 and 4.2 there have been several significant improvements in search functionality and removal of some rather unfortunate hard-coded special cases. Search was not supposed to be case-sensitive before, but authorityName was a property that may have been handled differently in the past. Any searches you make via a query language are never guaranteed to return case-sensitive matches.

For your use case of finding a group by a particular name, search does not seem to be the appropriate tool. What about using AuthorityService.getAuthorityNodeRef() / People.getGroup() / NodeService.getChildAssocsByPropertyValue(), which can all be used to select on authorityName and will return case-sensitive matches?

Regards
Axel

nikes
Champ on-the-rise
Thanks Axel for the quick response and your time.


Actually, the expected result should be case-insensitive, but an exact match on the whole group name.

E.g.
1) APP_ALFRESCO
2) APP_alfresco
3) APP_ALFRESCO123
4) APP_ALFRESCO_XYZ

If I search for "APP_ALFRESCO", it should return,
APP_ALFRESCO and APP_alfresco

BUT, it returns all,
1) APP_ALFRESCO
2) APP_alfresco
3) APP_ALFRESCO123
4) APP_ALFRESCO_XYZ

I guess using AuthorityService, NodeService etc. will return an exact, case-sensitive match rather than a case-insensitive one (please correct me if I am wrong).

Is there anything we can check in the SOLR schema.xml, or any related properties we can configure?

afaust
Legendary Innovator
Hello,

the reason APP_ALFRESCO also matches APP_ALFRESCO123 is tokenisation during indexing. It is part of the fuzziness that FTS offers in order to return matches even if the spelling is a bit different, e.g. between singular and plural forms of a term. This also affects identifiers that use non-alphanumeric characters and/or a mix of alpha and numeric characters.
Every non-alphanumeric character and each transition between alpha and numeric characters is a boundary at which indexing will create sub-terms for the input so far. E.g. APP_ALFRESCO_XYZ will be indexed using the terms "app", "appalfresco" and "appalfrescoxyz" and the search query will look for the term "appalfresco" which it finds for APP_ALFRESCO_XYZ.
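
The sub-term behaviour described above can be modelled roughly as follows. This is only a simplified Python sketch of the description in this post, not the actual SOLR analyzer chain; the function names `index_terms` and `matches` are made up for illustration.

```python
import re


def index_terms(name):
    """Simplified model of the sub-term generation described above.

    Assumption: this mimics the description in the post only; the real
    SOLR analyzer configuration may behave differently in detail.
    """
    # Split at every non-alphanumeric character and at every
    # transition between letters and digits.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", name)
    terms = []
    prefix = ""
    for part in parts:
        # Each boundary yields a lower-cased term for the input so far.
        prefix += part.lower()
        terms.append(prefix)
    return terms


def matches(query, indexed_name):
    """Return True if the query's full term is among the indexed sub-terms."""
    query_terms = index_terms(query)
    if not query_terms:
        return False
    return query_terms[-1] in index_terms(indexed_name)


groups = ["APP_ALFRESCO", "APP_alfresco", "APP_ALFRESCO123", "APP_ALFRESCO_XYZ"]
hits = [g for g in groups if matches("APP_ALFRESCO", g)]
```

Under this model all four example groups end up in `hits`, because each of them is indexed with the sub-term "appalfresco".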

If you need exact matches either use the AuthorityService, NodeService etc. (because those do proper database queries) or do a post-query check of the results in your code.
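
A post-query check for the "case-insensitive but exact name" requirement could look roughly like this. It is a minimal Python sketch of the idea only; `exact_ignore_case` is a hypothetical helper, and in an Alfresco extension the same comparison would be applied to the authorityName values of the search results.

```python
def exact_ignore_case(candidates, wanted):
    """Keep only the names equal to `wanted` when case is ignored."""
    # casefold() is a more aggressive form of lower() intended
    # for caseless string comparison.
    key = wanted.casefold()
    return [name for name in candidates if name.casefold() == key]


# Names as they might come back from the search (example data from this thread).
search_results = [
    "APP_ALFRESCO",
    "APP_alfresco",
    "APP_ALFRESCO123",
    "APP_ALFRESCO_XYZ",
]
filtered = exact_ignore_case(search_results, "APP_ALFRESCO")
```

Here `filtered` keeps APP_ALFRESCO and APP_alfresco and drops the two longer names.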

Regards
Axel

nikes
Champ on-the-rise
Thanks Axel once again for the detailed information.

For now, I did a post-query check of the results.

Time to explore Solr!