Tuning of solr.maxBooleanClauses parameter

mlagneaux
Champ on-the-rise
Hello,

I'm not sure this is the right section for this post. I'm sorry if it is not.
I'm working on Alfresco 4.2.1. My search engine is Solr and I'm facing the error described in JIRA ticket MNT-11905.

The JIRA ticket says that increasing the solr.maxBooleanClauses parameter will fix the error, but that doing so can also have an impact on performance.

I'd like to know what Boolean clauses are and how they are generated when a wildcard is used in a Lucene request.
Moreover, is it possible to find out how many Boolean clauses a request generates, in order to choose a value for the maxBooleanClauses parameter?

Thank you in advance for your help.
1 REPLY

afaust
Legendary Innovator
Hello,

A Boolean clause is a low-level fragment of a SOLR query, produced when your Alfresco FTS request is processed into the native query structure. Such a clause evaluates to true or false and, depending on your FTS query, may be a mandatory (AND/NOT) or an optional (OR) part of the query. Any time you use a wildcard, e.g. "alfre*", SOLR checks the index for terms that start with "alfre" and transforms your FTS query for "alfre*" into a collection of Boolean clauses that contains one clause for every matching term. So if the field you are querying has 20000 different terms/words in the index that start with "alfre", you end up with 20000 OR-ed Boolean clauses just for using "alfre*".
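
To make the expansion concrete, here is a minimal Python sketch of the idea. It is a toy model, not SOLR or Lucene code: the names `expand_prefix` and `TooManyClausesError`, the sample terms and the limit of 4 are all assumptions chosen for illustration.

```python
class TooManyClausesError(Exception):
    """Hypothetical stand-in for the error raised when the clause limit is exceeded."""


def expand_prefix(index_terms, prefix, max_clauses):
    """Rewrite a prefix wildcard such as "alfre*" into one OR-ed clause per matching term."""
    clauses = []
    # Walk the distinct terms of the field, as a term dictionary would list them.
    for term in sorted(set(index_terms)):
        if term.startswith(prefix):
            clauses.append(term)
            # Mimics the role of maxBooleanClauses: give up once the limit is passed.
            if len(clauses) > max_clauses:
                raise TooManyClausesError(
                    f"'{prefix}*' expands to more than {max_clauses} clauses"
                )
    return clauses


# Toy field content (assumed); a real index may hold thousands of terms per prefix.
terms = ["alfresco", "alfred", "alfredo", "alpha", "solr", "alfresco"]
clauses = expand_prefix(terms, "alfre", max_clauses=4)
query = " OR ".join(clauses)
print(query)
```

With these sample terms the wildcard turns into three OR-ed clauses. Calling the same function with a limit lower than the number of matching terms raises the error instead, which is roughly the situation a too-broad wildcard runs into.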

There is no reliable way to predetermine how many Boolean clauses will be generated without knowing the composition of the index. The number also correlates very strongly with the nature of the content you maintain and with how structured your metadata is. As long as the content you manage is written in natural language, you should be fine with the default limit of Boolean clauses (provided you DO use a prefix and don't just search for "*"). The more numerical, abbreviated, technical or machine-generated content you manage, the more likely it is that you will either have to increase the limit or have users limit their wildcard use or provide longer prefixes.
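
If you can export a list of the distinct terms of a field, a rough count per prefix is easy to compute offline. The following Python sketch assumes such a list is available. `count_prefix_terms` and the sample data are hypothetical, and the result is only an estimate of the number of clauses a prefix would expand to.

```python
from bisect import bisect_left


def count_prefix_terms(sorted_terms, prefix):
    """Count the terms in a sorted, de-duplicated list that start with prefix."""
    # In a sorted list all terms sharing a prefix sit next to each other,
    # starting at the first term that is >= the prefix itself.
    start = bisect_left(sorted_terms, prefix)
    count = 0
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        count += 1
    return count


# Assumed sample data: a few natural-language words and 5000 generated identifiers.
natural = sorted({"alfresco", "alfred", "alpha", "search", "solr", "index"})
generated = sorted({f"doc{n:05d}" for n in range(5000)})

print("alfre", count_prefix_terms(natural, "alfre"))
for prefix in ("doc", "doc00", "doc000"):
    print(prefix, count_prefix_terms(generated, prefix))
```

On this sample the natural-language prefix matches only a handful of terms, while a short prefix on the generated identifiers matches thousands, and each extra character in the prefix cuts the count sharply.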

Regards
Axel