Lucene search Vs Solr search varies after upgrade
10-10-2014 01:40 AM
Hello,
We just upgraded from 3.4.x to 4.2.x, and also moved from Lucene to Solr.
My question is about group search.
We have the following example groups:
1) ABC
2) ABC_1
If I use the following query in the Node Browser, it returns both of the above groups (4.2.x with Solr):
+TYPE:"{http://www.alfresco.org/model/content/1.0}authorityContainer" AND +@\{http\://www.alfresco.org/model/content/1.0\}authorityName:"GROUP_ABC"
However, in the earlier version (3.4.x with Lucene), it returned only the exact match, i.e. ABC.
Does this mean Solr performs a LIKE-style search by default?
We also have some groups whose names differ only in case (ABC, AbC, ABc). The aim is to use the above query and get back all of those case variants, which worked with Lucene.
Any tips or suggestions would be helpful.
Thanks
4 REPLIES
10-13-2014 06:42 AM
Hello,
Between 3.4 and 4.2 there have been several significant improvements in search functionality, and some rather unfortunate hard-coded special cases were removed. Even before, search was not supposed to be case-sensitive, but authorityName is a property that may have been handled differently in the past. Searches made via a query language are never guaranteed to return case-sensitive matches.
For your use case of finding a group by a particular name, search does not seem to be the appropriate tool. What about using AuthorityService.getAuthorityNodeRef() / People.getGroup() / NodeService.getChildAssocsByPropertyValue(), which can all be used to select on authorityName and will return case-sensitive matches?
Regards
Axel
10-14-2014 10:33 AM
Thanks Axel for the quick response and your time.
Actually, the expected result is a case-insensitive match, but on the exact group name.
E.g., given these groups:
1) APP_ALFRESCO
2) APP_alfresco
3) APP_ALFRESCO123
4) APP_ALFRESCO_XYZ
If I search for "APP_ALFRESCO", it should return only
APP_ALFRESCO and APP_alfresco
BUT it returns all of them:
1) APP_ALFRESCO
2) APP_alfresco
3) APP_ALFRESCO123
4) APP_ALFRESCO_XYZ
I guess that using AuthorityService, NodeService etc. will return exact, case-sensitive matches rather than case-insensitive ones (please correct me if I am wrong).
Is there anything we can check in the Solr schema.xml, or any related properties we can configure?
10-23-2014 12:27 PM
Hello,
The reason APP_ALFRESCO also matches APP_ALFRESCO123 is tokenisation during indexing. It is part of the fuzziness that FTS offers in order to return matches even if the spelling is a bit different, e.g. between singular and plural forms of a term. This also affects identifiers that use non-alphanumeric characters and/or a mix of alphabetic and numeric characters.
Every non-alphanumeric character and each transition between alphabetic and numeric characters is a boundary at which indexing creates a sub-term from the input so far. E.g. APP_ALFRESCO_XYZ is indexed using the terms "app", "appalfresco" and "appalfrescoxyz". A search for APP_ALFRESCO looks for the term "appalfresco", which it finds for APP_ALFRESCO_XYZ.
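To make the boundary rule concrete, here is a minimal Java sketch of it. It only mimics the behaviour described in this reply and is not the analyzer that Alfresco or Solr actually runs; the class name SubTermSketch and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the boundary rule described above.
// This is NOT the real Alfresco/Solr analysis chain.
final class SubTermSketch {

    // Returns the cumulative, lower-cased sub-terms emitted at each boundary.
    static List<String> subTerms(String input) {
        List<String> terms = new ArrayList<>();
        StringBuilder soFar = new StringBuilder();
        int emittedLength = 0;   // length of soFar when a term was last emitted
        char previous = 0;       // 0 is neither a letter nor a digit
        for (char c : input.toCharArray()) {
            boolean alnum = Character.isLetterOrDigit(c);
            // A switch between alphabetic and numeric characters is also a boundary.
            boolean transition = alnum
                    && Character.isLetterOrDigit(previous)
                    && Character.isLetter(previous) != Character.isLetter(c);
            if ((!alnum || transition) && soFar.length() > emittedLength) {
                terms.add(soFar.toString());
                emittedLength = soFar.length();
            }
            if (alnum) {
                soFar.append(Character.toLowerCase(c));
            }
            previous = c;
        }
        // Emit whatever remains after the last boundary.
        if (soFar.length() > emittedLength) {
            terms.add(soFar.toString());
        }
        return terms;
    }

    // A query matches when its full term is among the indexed sub-terms.
    static boolean matches(String query, String indexedValue) {
        List<String> queryTerms = subTerms(query);
        if (queryTerms.isEmpty()) {
            return false;
        }
        String fullTerm = queryTerms.get(queryTerms.size() - 1);
        return subTerms(indexedValue).contains(fullTerm);
    }

    public static void main(String[] args) {
        System.out.println(subTerms("APP_ALFRESCO_XYZ"));
        System.out.println(matches("APP_ALFRESCO", "APP_ALFRESCO123"));
    }
}
```

Under these assumptions, subTerms("APP_ALFRESCO_XYZ") yields the three terms listed above, and a query for APP_ALFRESCO matches APP_ALFRESCO123 and APP_ALFRESCO_XYZ because both index the sub-term "appalfresco".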
If you need exact matches, either use the AuthorityService, NodeService etc. (because those do proper database queries) or do a post-query check of the results in your code.
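A post-query check for the case-insensitive but exact match asked for earlier in this thread could look roughly like the sketch below. It assumes the group names have already been read from the search results into a list of strings; the class GroupNameFilter and the method exactIgnoringCase are hypothetical names, not part of any Alfresco API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: keeps only the results whose name equals the wanted
// name when case is ignored, dropping the extra hits caused by tokenisation.
final class GroupNameFilter {

    static List<String> exactIgnoringCase(List<String> searchResults, String wanted) {
        return searchResults.stream()
                .filter(name -> name.equalsIgnoreCase(wanted))
                .collect(Collectors.toList());
    }
}
```

With the four example groups from this thread and the wanted name "APP_ALFRESCO", this keeps APP_ALFRESCO and APP_alfresco and drops APP_ALFRESCO123 and APP_ALFRESCO_XYZ.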
Regards
Axel
10-31-2014 03:13 AM
Thanks Axel once again for the detailed information.
For now, I did a post-query check of the results.
Time to explore Solr!