Lucene query and its performance results

kdejaeger
Champ in-the-making
Hello,

I don't understand the performance differences of the following queries.
We started off with this query (1):
PATH:"/app:company_home/cm:A_Company/cm:A_Space//*" 
This query (2) adds a check on a custom property, and its performance was the same:
PATH:"/app:company_home/cm:A_Company/cm:A_Space//*" AND @eb\:ebCategory:"Invoice"
This query (3) returns the same results as query 2 but was 15 times faster:
PATH:"/app:company_home/cm:A_Company/cm:A_Space//*" AND TYPE:"{eb.model}invoice" AND @eb\:ebCategory:"Invoice"
This query (4) also returns the same results as query 2 but was 30 times faster:
TYPE:"{eb.model}invoice" AND @eb\:ebCategory:"Invoice"
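For anyone generating these queries from code, here is a minimal sketch that rebuilds queries 2 to 4 from small pieces. The helper names (path_clause, type_clause, prop_clause, conjunction) are hypothetical and not part of any Alfresco API; the only detail taken from the queries above is the syntax itself, including the backslash that escapes the `:` in `@eb\:ebCategory`.

```python
# Hypothetical helpers for composing the Lucene query strings from this post.
# They only build strings; they do not talk to Alfresco.

def path_clause(path: str) -> str:
    """PATH clause; '//*' selects every node below the given path."""
    return f'PATH:"{path}//*"'


def type_clause(type_qname: str) -> str:
    """TYPE clause using the {namespace}localName form."""
    return f'TYPE:"{type_qname}"'


def prop_clause(prefix: str, local_name: str, value: str) -> str:
    """Property clause; the ':' in the prefixed name is escaped as '\\:'."""
    return f'@{prefix}\\:{local_name}:"{value}"'


def conjunction(*clauses: str) -> str:
    """Join clauses with AND, as in queries 2 to 4."""
    return " AND ".join(clauses)


SPACE = "/app:company_home/cm:A_Company/cm:A_Space"
INVOICE_TYPE = "{eb.model}invoice"

q2 = conjunction(path_clause(SPACE), prop_clause("eb", "ebCategory", "Invoice"))
q3 = conjunction(path_clause(SPACE), type_clause(INVOICE_TYPE),
                 prop_clause("eb", "ebCategory", "Invoice"))
q4 = conjunction(type_clause(INVOICE_TYPE),
                 prop_clause("eb", "ebCategory", "Invoice"))

print(q3)
```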

Questions:
Query 2: why don't we gain any performance when adding a custom property check?
Query 3: why is adding a type check suddenly so much faster (what happens technically within Lucene/Alfresco)?
Query 4: removing the path improves performance. Why does the path decrease performance? Is it because of the //* expansion? Dropping the path is not an option for this project.
5 replies

nyronian
Champ in-the-making
I have no answer, but that is very interesting. I do a lot of custom searches where I use PATH quite a bit, so I am very interested in this. Does anyone at Alfresco have the answers?

pmonks
Star Contributor
I'm by no means a Lucene expert, but wildcards always give me pause - that would be my guess as to why the queries that include the PATH field are slower.

Cheers,
Peter

nyronian
Champ in-the-making
Not to hijack the question, kdejaeger, but I have questions around it as well; hopefully my comments help too.

I assume, kdejaeger, that the reason you specify a path is to find the invoices of a particular company, not all invoices in the system.

I have the same issue: I am looking for content under a particular company, so I specify the path. The * is only there to include all children under the path. Without the path you may get undesirable results, i.e. nodes you do not want.

Is there another way to achieve the same results?

kdejaeger
Champ in-the-making
Yes, nyronian, that's exactly why I need the path to be there. I suspect the //* is doing a slow XPath-style expansion. We need some professional advice here from an Alfresco engineer on how this is handled. :wink:

andy
Champ on-the-rise
Hi

Let me know which version of Alfresco you are using; the explanation is different for older versions.

Paths ending in //* are evaluated as a structural query in two parts: first the directories, then the leaves.
(In Lucene it is too expensive to reindex a document every time the structure of its parents may change, as you might do for structural queries in an XML database.)
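To make that cost argument concrete, here is a toy model (not Alfresco code) of the alternative being ruled out: if every document's index entry stored its full path, renaming or moving a folder would change the stored path of everything below it, so all of those entries would need reindexing. The tree and the function names are hypothetical.

```python
# Toy model, not Alfresco code: count the index entries that would need
# rewriting if each node's full path were stored in its own index entry.

def descendants(children: dict[str, list[str]], root: str) -> list[str]:
    """All nodes below root, found with an iterative depth-first walk."""
    found: list[str] = []
    stack = list(children.get(root, []))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return found


def docs_to_reindex_on_rename(children: dict[str, list[str]], folder: str) -> int:
    """Every descendant's stored path changes, so every descendant is reindexed."""
    return len(descendants(children, folder))


# Hypothetical space: 3 sub-folders with 100 documents each.
tree: dict[str, list[str]] = {"A_Space": ["f1", "f2", "f3"]}
for sub in ("f1", "f2", "f3"):
    tree[sub] = [f"{sub}/doc{i}" for i in range(100)]

print(docs_to_reindex_on_rename(tree, "A_Space"))  # 303 (3 folders + 300 documents)
```

Resolving the structure at query time instead, as described above, avoids this write amplification at the price of doing more work when a PATH query runs.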

So /blah/blah//* finds all directories that match the path and then re-queries to find all of their children. The re-query is ordered and about as good as it gets, although it is not grouped, which might help. The cost basically depends on the number of directories found.
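A simplified sketch of that two-phase evaluation, under stated assumptions: containers ("directories") are looked up by their path, while leaves are only linked to their parent id. The data layout and names are hypothetical and not Alfresco's actual index structure; the point is that phase 2 has to consider every container found in phase 1, so the cost grows with the number of directories under the base path.

```python
# Simplified model (an assumption, not Alfresco's implementation) of
# PATH:"/base//*" evaluated in two phases.

def phase1_matching_containers(container_paths: dict[int, str], base: str) -> set[int]:
    """Phase 1: containers whose path is the base path itself or lies below it."""
    prefix = base.rstrip("/") + "/"
    return {cid for cid, path in container_paths.items()
            if path == base or path.startswith(prefix)}


def phase2_children(parent_of: dict[int, int], containers: set[int]) -> list[int]:
    """Phase 2: re-query for every node whose parent is a matched container.
    The result is sorted, mirroring the ordered re-query described above."""
    return sorted(node for node, parent in parent_of.items() if parent in containers)


def path_descendants(container_paths, parent_of, base):
    dirs = phase1_matching_containers(container_paths, base)
    return dirs, phase2_children(parent_of, dirs)


# Hypothetical repository: ids 1-5 are folders, ids 10-13 are documents.
container_paths = {
    1: "/company_home",
    2: "/company_home/A_Company",
    3: "/company_home/A_Company/A_Space",
    4: "/company_home/A_Company/A_Space/2024",
    5: "/company_home/B_Company",
}
parent_of = {2: 1, 3: 2, 4: 3, 5: 1, 10: 3, 11: 4, 12: 4, 13: 5}

dirs, hits = path_descendants(container_paths, parent_of, "/company_home/A_Company/A_Space")
print(dirs, hits)  # {3, 4} [4, 10, 11, 12]
```

Sub-folder 4 is both a phase-1 directory and a phase-2 hit, which matches //* returning folders as well as documents below the base path.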

Lucene shuffles which predicate is used to scan through the parts of a conjunction. However, the PATH work is done up front, so it should not have any effect.
It is done for all of your queries.
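As a rough picture of what choosing a predicate to scan with means, here is a simplified model in the spirit of advance-based posting-list intersection; it is not Lucene's actual scorer code. Each clause of the AND yields a sorted list of matching document ids, the shortest list drives the scan, and every other list is advanced with a binary search. Whichever clause leads, the result is the same; only the amount of skipping changes. The input lists are made up for illustration.

```python
# Simplified conjunction model (not Lucene's implementation): intersect sorted
# doc-id lists, letting the shortest list drive and advancing the others.
from bisect import bisect_left


def intersect(postings: list[list[int]]) -> list[int]:
    if not postings:
        return []
    lists = sorted(postings, key=len)  # rarest clause leads the scan
    lead, others = lists[0], lists[1:]
    positions = [0] * len(others)
    result: list[int] = []
    for doc in lead:
        matched = True
        for i, plist in enumerate(others):
            # Advance this list to the first doc id >= doc (never moves backwards).
            positions[i] = bisect_left(plist, doc, positions[i])
            if positions[i] == len(plist):
                return result  # one list is exhausted: no further matches possible
            if plist[positions[i]] != doc:
                matched = False
                break
        if matched:
            result.append(doc)
    return result


# Hypothetical clause results: path matches, type matches, property matches.
print(intersect([[1, 2, 3, 4, 5, 6, 7, 8], [2, 4, 6, 8], [4, 8, 9]]))  # [4, 8]
```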

So most likely you are seeing the effect of caching on performance.

Run your queries either all cold or all warm (for warm runs, ignore the first result and run each query at least twice).
Other background processes can also have an effect.
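A minimal timing harness along those lines, assuming each query can be wrapped in a zero-argument callable: it discards one cold run per query, then interleaves the warm runs so that a background process slows all queries roughly equally, and reports the median. The function name `compare` and the `run_lucene` wrapper in the usage comment are hypothetical.

```python
import statistics
import time
from typing import Callable, Dict


def compare(queries: Dict[str, Callable[[], object]], rounds: int = 5) -> Dict[str, float]:
    """Median warm wall-clock time in seconds for each query.

    Each query runs once first as a cold run whose timing is discarded;
    the warm rounds are then interleaved across all queries.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    for run in queries.values():
        run()  # cold run: timing deliberately ignored
    samples: Dict[str, list] = {name: [] for name in queries}
    for _ in range(rounds):
        for name, run in queries.items():
            start = time.perf_counter()
            run()
            samples[name].append(time.perf_counter() - start)
    return {name: statistics.median(times) for name, times in samples.items()}


# Hypothetical usage; run_lucene stands in for however your application
# executes a Lucene query against the repository:
# medians = compare({"q2": lambda: run_lucene(q2), "q3": lambda: run_lucene(q3)})
```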

Andy