
Incorrect and partial total result by lucene query

epaoletti
Champ in-the-making
Hi,

We are running the following Lucene query from an action:

query = "PATH:\"/app:company_home/cm:Test/cm:Document//*\" AND @\\{http\\://www.alfresco.org/model/content/1.0\\}content.mimetype:application/pdf";
So we would like to get the total number of pdf files under the "company_home/Test/Document" space.
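
For anyone assembling this kind of query string in code, here is a minimal Java sketch of the escaping involved. It only builds the string and does not call Alfresco; the class and method names (`PdfQueryBuilder`, `escapeQName`, `build`) are hypothetical helpers made up for illustration, not part of any Alfresco API.

```java
// Hypothetical helper, not an Alfresco class: it only assembles the query text.
final class PdfQueryBuilder {

    // Backslash-escape the characters Lucene would otherwise treat as syntax
    // inside a fully qualified property name: '{', '}' and ':'.
    static String escapeQName(String qname) {
        StringBuilder sb = new StringBuilder();
        for (char c : qname.toCharArray()) {
            if (c == '{' || c == '}' || c == ':') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build "everything below <path> whose content mimetype is <mimetype>".
    static String build(String path, String mimetype) {
        String prop = escapeQName("{http://www.alfresco.org/model/content/1.0}content");
        return "PATH:\"" + path + "//*\" AND @" + prop + ".mimetype:" + mimetype;
    }

    public static void main(String[] args) {
        System.out.println(build("/app:company_home/cm:Test/cm:Document", "application/pdf"));
    }
}
```

Note that the doubled backslashes in the query above are only string-literal escaping; the text Lucene actually receives has a single backslash before each `{`, `}` and `:`.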

We had the following issue:
- the first result of the query was incomplete: the total number was wrong
- so we ran the query again until we got the correct total number of PDF files

So the query seems to return the correct total only after we run it several times (how many times depends on the number of PDF files in the space).
For example, if we have about 5,000 files:
the first query shows total = 1,500 files
the second one shows total = 2,410 files
………
until the last one shows total = 5,000 files.
After that the queries become stable and the result is 5,000.

Are there any configuration parameters for Lucene queries that avoid this problem?
We would like to get the correct total number on the first query.

Thanks for your help
11 REPLIES

derek
Star Contributor
Add the following to your custom-repository.properties (the defaults come from repository.properties) and increase the values as appropriate:
#
# Properties to limit resources spent on individual searches
#
# The maximum time spent pruning results
system.acl.maxPermissionCheckTimeMillis=10000
# The maximum number of results to perform permission checks against
system.acl.maxPermissionChecks=1000

epaoletti
Champ in-the-making
OK, we have already done that.

The problem was that after we increased the values, the query took too long to return (about 60 seconds).
That is too long for interactive web use.

At the moment we have a total of 10,000 to 15,000 PDF documents to search.
I don't think that is a critical number for Alfresco.
What happens if we have to manage 100,000 documents?

Do you have any suggestions for setting these variables correctly to handle
10,000 to 15,000 documents?
The machine is:
CPU: Intel dual core, 2.66 GHz
Memory: 2 GB RAM

# The maximum time spent pruning results
system.acl.maxPermissionCheckTimeMillis=??????
# The maximum number of results to perform permission checks against
system.acl.maxPermissionChecks=????

Thanks in advance for your help

derek
Star Contributor
15,000 results for a user-driven query would be too many anyway.

If you have code that does the search, then you can use the search parameters class:
   org.alfresco.service.cmr.search.SearchParameters
and set the limit and limitBy properties on a per-query basis.  You can also bypass security checks in your code by running as the system user, or by using the lower-case searchService bean (which has no permission checks applied) instead of the public SearchService bean.

Leave the default for user-driven searches as it is to prevent users from overloading the system with queries for thousands of documents.

epaoletti
Champ in-the-making
Thanks for your quick answer.

Now the full-text search philosophy is clearer to me.

Basically, I can never know the total number of content items (documents) in Alfresco that match a query.
Is that correct?

This could be a problem for the end user.
Do you have any suggestions for providing the end user with that number?

Alfresco could be a good framework for developing vertical applications, but you often need to know in advance
how many objects you have to manage (for instance, to show the number in the web interface).

Are there any internal Alfresco counters that hold the total number of content items (documents) in a space?

Thanks in advance

derek
Star Contributor
Hi,
We don't currently have space-related quotas built into the system.  We're laying some groundwork in 3.2 that will make these types of calculations more efficient.
Regards

dwilson
Champ in-the-making
> Hi,
> We don't currently have space-related quotas built into the system.  We're laying some groundwork in 3.2 that will make these types of calculations more efficient.
> Regards

Derek, as it is now, what is the fastest way to find the count of the number of nodes for a query?  For example, the number of nodes in a particular category or parent category? 

Is there a way to do this, in say a webscript, without Lucene?   (The total number might be larger than maxPermissionChecks)

I see in this post they talk of storing totals in another node, updated daily by a workflow, which seems pretty kludgy.  http://forums.alfresco.com/en/viewtopic.php?f=4&t=3677&p=11875&hilit=total+count#p11875

Thanks!

derek
Star Contributor
> Derek, as it is now, what is the fastest way to find the count of the number of nodes for a query? For example, the number of nodes in a particular category or parent category?
dwilson,
The only way to efficiently find nodes in a category is by using a Lucene search, I'm afraid.

dwilson
Champ in-the-making
> > Derek, as it is now, what is the fastest way to find the count of the number of nodes for a query? For example, the number of nodes in a particular category or parent category?
> dwilson,
> The only way to efficiently find nodes in a category is by using a Lucene search, I'm afraid.
That is unfortunate. So to display this kind of information on our website without it taking forever, the only option is to cache the totals?

e.g.:

Pet Category Links:  (Totals)
  • Dogs (8,123)
  • Cats (12,328)
  • Birds (7,230)
  • Rabbits (418)
  • Hamsters (3,132)
  • … etc.
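
One way to picture "cache the totals" is a small time-based cache in front of whatever does the expensive count. The Java sketch below is only an illustration under that assumption: `CategoryCountCache` and the `counter` function are hypothetical names, and `counter` stands in for the real (slow) count query, which is not shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

// Hypothetical sketch: remembers each category total for ttlMillis before recounting.
final class CategoryCountCache {

    private static final class Entry {
        final long count;
        final long loadedAtMillis;

        Entry(long count, long loadedAtMillis) {
            this.count = count;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final ToLongFunction<String> counter; // assumption: runs the expensive count
    private final long ttlMillis;

    CategoryCountCache(ToLongFunction<String> counter, long ttlMillis) {
        this.counter = counter;
        this.ttlMillis = ttlMillis;
    }

    // Return the cached total, recounting only if it is missing or expired.
    public long get(String category) {
        return get(category, System.currentTimeMillis());
    }

    // Same as get(category), with the clock passed in so expiry is easy to test.
    public long get(String category, long nowMillis) {
        Entry e = entries.get(category);
        if (e == null || nowMillis - e.loadedAtMillis >= ttlMillis) {
            e = new Entry(counter.applyAsLong(category), nowMillis);
            entries.put(category, e);
        }
        return e.count;
    }
}
```

With a TTL of a few minutes, the category page only pays for the slow count once per category per interval. The trade-off is that the displayed totals can be stale by up to that interval, and two requests arriving at the same moment may both trigger a recount.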

vta
Champ in-the-making
Hi.

> You can also bypass security checks in your code by running as the system …

Is that possible using web services?

Thanks