Performance problems after a period of inactivity

08-20-2014 03:52 AM
I'm seeing strange performance behavior in a webscript that runs a Lucene query with several conditions (by path, by date, by aspect and sometimes by some IMAP aspect metadata).
The webscript is very slow when it is called after a period of inactivity, but very fast when it is called many times within a short time. It does not seem to be related to caching, because if I completely change the conditions (so that the results are completely different), it remains extremely fast.
We are on an old Alfresco 3.3 version; the webscript is written in JavaScript and is always executed as the same user (who is not an admin).
I can't figure out which component could cause this behavior (e.g., is there a loading phase for a webscript?).
Does anyone have any ideas?
08-20-2014 05:01 AM
A couple of factors can influence the performance, and there are multiple levels of caching to consider here, not just the Alfresco node and property caches. For example, Lucene can and will cache some index data needed to perform the query. The database can and will cache table data, which affects the performance even of queries with completely different results. And at a lower level, the OS may cache file system contents that Lucene or the database use to execute queries when they do not have the data cached in memory.
What magnitude of performance difference do you observe? How many results are we talking about here? What exactly does the Lucene query look like (keep in mind, PATH queries can be veerryyy expensive)?
Regards
Axel

08-20-2014 06:07 AM
The difference is huge: 80 seconds for the slow response, 2 seconds for the fast response.
<blockquote>How many results are we talking about here?</blockquote>
Another thing I forgot to mention in my post: there seems to be no relation between performance and the number of results. In any case, we are talking about a maximum of 2000 results.
<blockquote>What exactly does the Lucene query look like (keep in mind, PATH queries can be veerryyy expensive)?</blockquote>
We have some files with the IMAP aspect, and we filter on metadata such as sentDate, sender and subject. All the files are organized in subspaces under a common space, so we added the PATH condition to limit the search, but we need to use wildcards in the path to reach the subspaces. Could this condition affect performance? If so, I can remove it.
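To make the query shape concrete, here is a minimal sketch of how such a query string might be assembled by concatenation. Everything in it is an assumption for illustration: `buildImapQuery` and `escapePhrase` are hypothetical helpers, and the QNames (`imap:imapContent`, `imap:messageFrom`, `imap:messageSubject`, `imap:dateSent`), the path `/app:company_home/cm:Archive` and the date-range syntax should be checked against your own content model and Alfresco version. Putting the PATH clause behind a flag makes it easy to time the same search with and without it.

```javascript
// Hypothetical helper that builds the Lucene query by string concatenation.
// ASSUMPTIONS: the QNames imap:imapContent, imap:messageFrom,
// imap:messageSubject and imap:dateSent, and the path
// /app:company_home/cm:Archive, are illustrative; check your own model.

// Escape the characters that would end or break a quoted Lucene phrase.
function escapePhrase(value) {
  return String(value).replace(/(["\\])/g, "\\$1");
}

function buildImapQuery(params) {
  var clauses = ['ASPECT:"imap:imapContent"'];

  // "//*" matches every descendant of the space: this is the expensive kind
  // of PATH query, so keep it switchable to compare timings.
  if (params.usePath) {
    clauses.push('PATH:"/app:company_home/cm:Archive//*"');
  }
  if (params.sender) {
    clauses.push('@imap\\:messageFrom:"' + escapePhrase(params.sender) + '"');
  }
  if (params.subject) {
    clauses.push('@imap\\:messageSubject:"' + escapePhrase(params.subject) + '"');
  }
  // ASSUMPTION: range syntax; adjust to what your Alfresco version accepts.
  if (params.sentFrom && params.sentTo) {
    clauses.push('@imap\\:dateSent:["' + params.sentFrom + '" TO "' +
                 params.sentTo + '"]');
  }
  // Prefix every clause with "+" so that all of them are mandatory.
  return clauses.map(function (c) { return "+" + c; }).join(" ");
}
```

Calling it once with `usePath: true` and once without, and timing both, is a cheap way to see how much of the cost the PATH condition accounts for.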
08-21-2014 04:30 AM
Does the webscript do anything else with the results of the Lucene query apart from generating the response via FTL? Can you insert timing measurements into the web script code to determine how much time has elapsed at each point in the execution?
Regards
Axel
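A minimal sketch of such timing measurements for a JavaScript controller follows. `createStopwatch` and the checkpoint labels are hypothetical; the only Alfresco-specific piece is the commented usage, which assumes the standard `logger` and `search` root objects (in Alfresco, `logger.log` output usually only appears when the `org.alfresco.repo.jscript.ScriptLogger` category is set to debug). The code sticks to `var` and plain functions so that it should also run on the older Rhino engine.

```javascript
// Minimal stopwatch sketch for a JavaScript web script controller.
// createStopwatch and the checkpoint labels are hypothetical; "sink" is any
// function(string), e.g. one that forwards to Alfresco's logger.log.
function createStopwatch(sink) {
  var start = new Date().getTime();
  var last = start;
  var marks = [];

  return {
    // Record the time elapsed since the previous checkpoint and since start.
    mark: function (label) {
      var now = new Date().getTime();
      var entry = { label: label, sinceLast: now - last, sinceStart: now - start };
      marks.push(entry);
      if (sink) {
        sink(label + ": +" + entry.sinceLast + " ms (total " + entry.sinceStart + " ms)");
      }
      last = now;
      return entry;
    },
    // Return a copy of all checkpoints recorded so far.
    marks: function () {
      return marks.slice();
    }
  };
}

// Possible use inside the controller (only runs inside Alfresco):
// var sw = createStopwatch(function (msg) { logger.log(msg); });
// var def = { query: queryString, language: "lucene" };
// sw.mark("query built");
// var results = search.query(def);
// sw.mark("after search.query");
// model.results = results;
// sw.mark("end of controller");
```

The last checkpoint only covers the controller script. The FreeMarker template renders the response after the script returns, so if the total response time is much larger than the "end of controller" figure, the extra time is being spent outside the script.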

08-21-2014 08:38 AM
The webscript is very simple: based on the URL parameters, it builds a query string through simple string concatenation; then it calls the search API:
search.query(def);
where def is the object containing the search configuration (query, language, sort…). The result of the query is assigned to a variable in the model and passed to FTL to build the JSON response.
So there is no processing other than the query call.
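For reference, a definition object of that kind might look like the following sketch. `buildSearchDef` is a hypothetical helper, the property names (`query`, `language`, `sort`, `page` with `maxItems`/`skipCount`) follow the Alfresco JavaScript search API as commonly documented but should be verified for a 3.3 installation, and the sort column is an assumption. Capping `page.maxItems` matters because every returned node is then processed by the template.

```javascript
// Sketch of the search definition object passed to search.query(def).
// buildSearchDef is hypothetical. ASSUMPTION: the property names query,
// language, sort and page match the Alfresco JavaScript search API of your
// version; verify them before relying on this.
function buildSearchDef(queryString, maxItems) {
  return {
    query: queryString,
    language: "lucene",
    // Newest messages first; the sort column is an assumption.
    sort: [{ column: "@imap:dateSent", ascending: false }],
    // Ask the search layer for at most maxItems results, from the first one.
    page: { maxItems: maxItems, skipCount: 0 }
  };
}

// In the controller (only runs inside Alfresco):
// var def = buildSearchDef(queryString, 100);
// model.results = search.query(def);
```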

08-25-2014 06:18 AM
I've put some timing measurements in the code (at query building, before the call to the search API, after the call to the search API, and at the end of the script), and I also noticed that the MaxItems clause is ignored if I use the admin user: is there no way to avoid this?
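If the result cap is not applied for a given user, one defensive option is to trim the results in the controller before they reach the model. This is a sketch under the assumption that `search.query` returns a JavaScript array; `capResults` is a hypothetical helper. It only bounds the work done by the template, not the cost of the query itself.

```javascript
// Hypothetical workaround: enforce the cap in the script itself, so the
// template never iterates over more than maxItems nodes even if the search
// layer returns more. ASSUMPTION: results is a JavaScript array.
function capResults(results, maxItems) {
  if (!results) {
    return [];
  }
  if (maxItems > 0 && results.length > maxItems) {
    return results.slice(0, maxItems);
  }
  return results;
}

// In the controller (only runs inside Alfresco):
// model.results = capResults(search.query(def), 100);
```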
08-25-2014 11:11 PM
Searching can use a lot of memory for a large index. I guess if your JVM is not running with a large enough HEAP size then the JVM will pay the price of initializing caches at your first query from time to time.

08-26-2014 03:45 AM
<blockquote>if your JVM is not running with a large enough HEAP size then the JVM will pay the price of initializing caches</blockquote>
The JVM has a very large heap, and the server is very powerful.
But I've noticed that the query execution time often seems low while the response time is high. Is there a caching mechanism for the template?
08-26-2014 04:09 AM
Where did you place your index? Keeping the index on a local disk will improve performance.
How much memory did you leave for the operating system? The OS also needs some memory to cache index files.

08-26-2014 08:33 AM
OK, but after some investigation, it seems that the query is not the problem.
For example:
<ol>
<li>First request: total time 30 sec, query time 2 sec</li>
<li>Second request (same query): total time 2.5 sec, query time 2 sec</li>
<li>Third request (completely different result set): total time 2.5 sec, query time 2 sec</li>
</ol>
I verified this behavior on every single node in my cluster.
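Subtracting the query time from the total makes the point explicit: on the first request about 28 of the 30 seconds are spent outside the `search.query` call, against 0.5 seconds on the following requests. The figures below are just the three measurements reported for this thread.

```javascript
// Timings from the three requests reported above, in seconds.
var requests = [
  { name: "first request", total: 30, query: 2 },
  { name: "second request (same query)", total: 2.5, query: 2 },
  { name: "third request (different result set)", total: 2.5, query: 2 }
];

// Time spent outside the search.query call.
function overheadSeconds(r) {
  return r.total - r.query;
}

requests.forEach(function (r) {
  console.log(r.name + ": " + overheadSeconds(r) + " s outside the query");
});
```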
