Potential tracking inefficiency by the Alfresco-Solr4

j_fintora
Champ in-the-making

Hi,

we use alfresco-solr4-5.2.e with Alfresco 5.2.e and we found a potential inefficiency.

Solr tracks changes in an associated Alfresco system by periodically requesting info about committed transactions. The tracking is triggered every 15 seconds (property alfresco.cron=0/15 * * * * ? *), and every time it requests transactions going back to the time of the last committed transaction (actually an hour before that -> alfresco.hole.retention=3600000).

A single tracking run sends many requests: one for each hour between now and the time of the last committed transaction.

And we found out that all hour intervals since the last committed transaction are queried over and over again every 15 seconds. For example, when I upload a file to Alfresco and then wait a few hours, the number of tracking requests grows by one for each hour since the upload. And the same requests are fired every 15 seconds until I upload another file.
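To make the growth concrete, here is a rough Python sketch of the behaviour described above. It is only a simplified model, not the actual Alfresco or Solr tracker code: the function name `tracking_windows` and the fixed one-hour window size are assumptions made for illustration, and only the one-hour hole retention comes from the configuration quoted in this post.

```python
HOUR_MS = 3_600_000  # matches alfresco.hole.retention=3600000


def tracking_windows(last_commit_ms, now_ms,
                     hole_retention_ms=HOUR_MS, window_ms=HOUR_MS):
    """Return the (from, to) intervals one tracking run would request.

    Hypothetical model: start one hole-retention period before the last
    committed transaction and walk forward in one-hour steps until 'now'.
    """
    start = last_commit_ms - hole_retention_ms
    windows = []
    while start < now_ms:
        end = min(start + window_ms, now_ms)
        windows.append((start, end))
        start = end
    return windows


# Last upload at t=0, tracker fires three hours later.
windows = tracking_windows(last_commit_ms=0, now_ms=3 * HOUR_MS)
print(len(windows))  # 4 requests in this run
```

In this model every run starts again from the same point, so a system that has been idle for N hours issues roughly N + 1 requests on each 15-second trigger.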

We wonder why it needs to query the same interval more than once, even when the interval lies entirely in the past. Is it an inefficiency, or is there some reason behind it?

The problem is partially mentioned here, but there is nothing about the repeated querying of the same time interval.

Any insight would be appreciated. Thank you.

1 ACCEPTED ANSWER

afaust
Legendary Innovator

There is no state management of "when" SOLR has last queried for changes. SOLR only checks based on the last transaction it has found in the index and uses that transaction's commit time as the basis for the interval. So in those cases where nothing has been done in the system, that information is simply lacking.

Is it inefficient? Yes. Has something changed or is something going to be changed? No, and it's not very likely. Alfresco has never been designed or optimised to be an idle system without any user load for long durations of time, and you would require an idle system for this to even manifest itself. On the other hand, apart from spamming the access logs, these additional requests should be negligible in effective cost to the system. The DB query simply yields no result and the request is done in a handful of milliseconds.

Feel free to file an issue in the Alfresco JIRA to log this as a bug. A discussion on this platform does not automatically lead to such topics being tracked as something to be fixed...


11 REPLIES

afaust
Legendary Innovator

I don't know of any differences / conscious changes in SOLR 6 / ASS architecture compared to the old SOLR 4 that would have changed this behaviour. In the default configuration it should actually track / generate queries even more frequently than SOLR 4 (CRON set to every 10 seconds instead of every 15 seconds). There may be some coincidental changes regarding caching of some secondary data that might alleviate the symptoms a bit, but the core queries should be unchanged... I'll have to take a closer look when I get time.

Hi Alex,

I checked my Solr6 installation again and you are right, you only underestimated the performance impact:

Solr6 still insanely queries for all the non-existing transactions (i.e. from the last known transaction time until now), but due to a "caching" mechanism (which does not seem to be real caching, but rather simply "don't query an already queried time range", with the last queried time kept in memory) it does this only when it gets started.

After this startup, it queries just for the few latest (although logically still non-existent) transactions - and this small difference from Alfresco Solr4 has a tremendous positive performance impact on the Alfresco system, depending on how often you upload something to your Alfresco, of course.
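The difference between the two behaviours can be sketched with a small toy model in Python. This is only an illustration of what is described in this thread, not the real tracker implementation: the class `TrackerModel`, its `remember_last_query` flag and the one-hour window size are all hypothetical names and assumptions.

```python
HOUR_MS = 3_600_000


class TrackerModel:
    """Toy model of one tracker; counts requests per tracking run."""

    def __init__(self, last_commit_ms, remember_last_query):
        self.last_commit_ms = last_commit_ms
        # True models the Solr6-like behaviour observed above,
        # False models the Solr4-like behaviour from the original question.
        self.remember_last_query = remember_last_query
        self.last_queried_ms = None  # kept in memory only, lost on restart

    def run(self, now_ms):
        # Always begin one hour (hole retention) before the last commit...
        start = self.last_commit_ms - HOUR_MS
        # ...unless we remember how far a previous run already got.
        if self.remember_last_query and self.last_queried_ms is not None:
            start = max(start, self.last_queried_ms)
        requests = 0
        while start < now_ms:
            start = min(start + HOUR_MS, now_ms)
            requests += 1
        if self.remember_last_query:
            self.last_queried_ms = now_ms
        return requests


solr4_like = TrackerModel(last_commit_ms=0, remember_last_query=False)
solr6_like = TrackerModel(last_commit_ms=0, remember_last_query=True)

five_hours = 5 * HOUR_MS
print(solr4_like.run(five_hours), solr6_like.run(five_hours))  # 6 6
print(solr4_like.run(five_hours + 15_000),
      solr6_like.run(five_hours + 15_000))                     # 7 1
```

Under these assumptions both variants pay the full cost on the first run after startup, but only the variant without in-memory state keeps paying it on every later trigger.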

UPDATE: I created ALF-22094 today. Hopefully some competent author will look at it and fix it, or at least give a technical explanation, although I can't imagine how this behaviour could really be justified.

Regards

Petr