12-13-2017 05:36 AM
Hi,
We use alfresco-solr4-5.2.e with Alfresco 5.2.e and have found a potential inefficiency.
Solr tracks changes in the associated Alfresco system by periodically requesting information about committed transactions. Tracking is triggered every 15 seconds (property alfresco.cron=0/15 * * * * ? *), and each time it requests transactions going back to the time of the last committed transaction (actually one hour before that -> alfresco.hole.retention=3600000).
A single tracking run sends many requests: one for each hour between now and the time of the last committed transaction.
We found that all the hourly intervals since the last committed transaction are queried over and over again, every 15 seconds. For example, when I upload a file to Alfresco and then wait a few hours, the number of tracking requests per run grows by one for each hour since the upload. The same requests are fired every 15 seconds until I upload another file.
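To make the growth concrete, here is a minimal sketch of the behaviour described above. It is only a model of what we observe, not Alfresco's actual tracker code: the function names and the assumption that each request covers exactly one hour are mine.

```python
HOUR_MS = 3_600_000  # same value as alfresco.hole.retention=3600000


def tracking_windows(last_commit_ms, now_ms,
                     hole_retention_ms=HOUR_MS, window_ms=HOUR_MS):
    """Return the (from, to) intervals one tracking run would query.

    Hypothetical model: the run starts one hole-retention period before
    the last committed transaction and walks forward in fixed windows
    (assumed to be one hour each) until it reaches the current time.
    """
    start = last_commit_ms - hole_retention_ms
    windows = []
    while start < now_ms:
        end = min(start + window_ms, now_ms)
        windows.append((start, end))
        start = end
    return windows


def requests_per_day_at(hours_idle, cron_seconds=15):
    """Requests per day if the system stayed at this idle age all day.

    This is a rate estimate only; in reality the number of windows keeps
    growing by one every hour while nothing is committed.
    """
    now_ms = 1_000_000 * HOUR_MS  # arbitrary reference time
    per_run = len(tracking_windows(now_ms - hours_idle * HOUR_MS, now_ms))
    runs_per_day = 86_400 // cron_seconds
    return per_run * runs_per_day
```

Under these assumptions, three idle hours give 4 windows per run (3 idle hours plus the 1 hour of hole retention), and with a run every 15 seconds that is 5760 runs per day.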
So we wonder: why does it need to query the same interval more than once, even when that interval lies entirely in the past? Is this an inefficiency, or is there some reason behind it?
The problem is partially mentioned here, but there is nothing about the repeated querying of the same time interval.
Some insight would be appreciated. Thank you.
12-13-2017 07:25 AM
There is no state management of "when" SOLR has last queried for changes. SOLR only checks based on the last transaction it has found in the index and uses that transaction's commit time as the basis for the interval. So in those cases where nothing has been done in the system, that information is simply lacking.
Is it inefficient? Yes. Has something changed or is something going to be changed? No, and it's not very likely. Alfresco has never been designed or optimised to be an idle system without any user load for long durations of time, and you would require an idle system for this to even manifest itself. On the other hand - apart from spamming the access logs - these additional requests should be negligible in effective cost to the system. The DB query simply yields no result and the request is done in a handful of milliseconds.
Feel free to file an issue in the Alfresco JIRA to log this as a bug. Any discussion here in this platform does not automatically lead to such topics being tracked as something to be fixed...
12-15-2017 06:31 AM
Thank you for your answer.
01-26-2018 03:15 AM
Well, thank you for your explanation, although I still don't fully get your point (see below). So before I file a bug, I would like to ask you (or anyone else) here, in case I have overlooked something. This topic has clearly been going around for many years (since 2012 at least, see "SOLR causes high CPU usage on idle repo"), but no one is actually doing anything about it. Personally, I don't consider answers like "disable your access log" or "just upload something to your Alfresco at least once every day or two" to be real solutions.
So, my question is whether the Solr implementation can be really considered as a sane one, provided that there are the following observations:
01-27-2018 06:27 AM
"No one is actually doing anything about it" - For a long time the contribution process was so cumbersome / ineffective that only Alfresco engineers could have been doing anything about it, and for them it did not end up being a top priority. In most production environments this has not been a relevant issue / topic, so customers apparently did not report it often enough for it to become a priority. 140 M of highly compressible log files can be dealt with easily using logrotate. And if you really wanted to, you could separate SOLR tracking requests from the others before rolling over and compressing the logs.
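As a rough illustration of that last point, here is a minimal sketch of splitting tracking requests out of an access log before rotation. The marker string "/api/solr/" is an assumption about how the tracking URLs appear in your access log; check a few real lines and adjust it. The function name is made up for this example.

```python
def split_tracking_lines(lines, marker="/api/solr/"):
    """Split access-log lines into (tracking, other).

    The default marker is an assumed URL fragment for SOLR tracking
    requests; it is not taken from any Alfresco documentation, so
    verify it against your own access log first.
    """
    tracking, other = [], []
    for line in lines:
        if marker in line:
            tracking.append(line)
        else:
            other.append(line)
    return tracking, other
```

You could run something like this over the current log file and write the two lists to separate files, so that the noisy tracking part can be compressed or dropped on its own schedule.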
Maybe or could comment on this (Andy also participated in that old forum thread you linked back on the old forum platform).