Hyland Connect

luca · ‎03-15-2017

Hi All,

I have a problem: Alfresco is taking too long to start up. It currently takes 4 hours to complete.

I'm using Alfresco CE 4.2.d and we currently have approximately 700K documents that take up 600GB in total.

Can you help me understand why Alfresco takes so long during startup?

Monitoring the machine, I don't see any critical issue with memory, CPU or disk access during startup.

We are not using Solr; would it give better performance?

I have attached the startup log.

luca · ‎05-05-2017

Great news, I successfully brought the startup time back down to 21 min thanks to a tiny modification to a single query!

The query is the one that I pointed out in a previous post and the complete explanation of the problem is described here: https://issues.alfresco.com/jira/browse/MNT-15576

Thanks to all who helped me!

mehe · ‎03-15-2017

Hi Luca,

are you monitoring memory consumption on your system? But I don't know why alfresco tries to cache all the nodes...

We are using solr and are not encountering such boot times - normally alfresco boots within 5 minutes (about 1.000.000 docs). But if you depend heavily on transactional consistency of your index, you would have to brush up your code - solr works with eventual consistency, which, put simply, means "if you do something, you won't find it immediately in your index, but later everything is fine".
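To illustrate what "brushing up your code" for eventual consistency can look like, here is a minimal sketch of a polling helper. It is not Alfresco or Solr API: `search` is a hypothetical callable that you would supply yourself and that returns the node ids currently found in the index, and the timeout and interval defaults are arbitrary.

```python
import time


def wait_until_indexed(search, node_id, timeout=30.0, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll a search function until node_id shows up or the timeout expires.

    `search` is a hypothetical callable (node_id -> collection of ids found);
    plug in whatever query your own code runs against the index.
    """
    deadline = clock() + timeout
    while True:
        if node_id in search(node_id):
            return True   # the index has caught up
        if clock() >= deadline:
            return False  # still not visible; let the caller decide what to do
        sleep(interval)
```

Code that used to write a node and query for it in the same step would call something like this in between, or be restructured so that it does not depend on the index at all.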

My repository needs about 16GB of RAM for that amount of content, but solr also consumes a lot.

Have you tried to grant more memory to the JVM (and left enough for the OS)?

Is there something left in alfresco-global.properties that causes a full reindex on startup?

Is alfresco slow after the startup?

wishing the best for you

luca · ‎03-15-2017

Hi Martin,

I have added a report that shows that the machine is OK, as far as I can see.

That machine has 16GB of RAM and Alfresco is using 6GB of it.

The index recovery is set to index.recovery.mode=AUTO, so there shouldn't be a reindex at startup.

After startup Alfresco is not slow. I only noticed one problem: when I try to configure an advanced workflow and search for groups, the service times out (log: WARN [org.alfresco.authorityTransactionalCache] [ajp-apr-8010-exec-1] Transactional update cache 'org.alfresco.authorityTransactionalCache' is full (10000).).

Could the cache be the culprit?

I tried to double the cache of nodes setting this:

cache.node.nodesSharedCache.tx.maxItems=250000

cache.node.nodesSharedCache.maxItems=2500000

But Alfresco became 30min slower 😞

mehe · ‎03-15-2017

"Cache is full" is only a warning - so it's not the cause of your problem, but another symptom. Have you tried to decrease the cache size? If you say Alfresco is using 6GB, is it really using them or did you configure 6GB for the JVM?

Try to configure 8GB JVM memory then.

But everything smells like an Index problem. How long would it take to rebuild the complete index?

Personally I would switch to Solr in any case...

I have to take some questions back, I didn't notice the attached report on mobile...

afaust · ‎03-15-2017

Luca _ wrote:
The index recovery is set to index.recovery.mode=AUTO, so there shouldn't be a reindex at startup.

Actually, the AUTO recovery mode will perform a FULL re-index if it finds that there is no valid index, i.e. when the index does not even properly contain the initial transactions.

Regarding the caches: having transactional caches reported as full is not a catastrophic problem, but it can have a very serious impact on system performance.

Also, the correct solution for this would NOT be to increase the cache sizes (that kind of lazy suggestion I see all the time), but to identify the operation that is causing that much data to be loaded into caches in the first place. No default operation in Alfresco out-of-the-box should overwhelm the caches provided your data structure is sane (e.g. no overly excessive use of groups or secondary child associations) and you don't do queries for insanely large amounts of data in jobs / actions / any other kind of operation within a single transaction.

Setting nodesSharedCache to 2.5 million is a bold move, especially with the Alfresco default caches. I hope you haven't also increased the nodeAspectsShareCache and/or nodePropertiesSharedCache... This would dramatically increase the amount of heap used just for caching and can choke other processes in terms of "working memory".

Maybe you could provide your entire alfresco-global.properties configuration (anonymise sensitive data!) and list what kind of 3rd party addons / customisations you have applied to the system.

luca · ‎03-16-2017

Hi Axel,

Yesterday I restored the default cache configuration and gave Alfresco more memory, raising it to 8GB, but this morning Alfresco took 4h 40min to start up, another 20min more than yesterday!! I didn't mention that Alfresco is shut down at 2:55AM every night and restarted at 3:05AM using the shutdown.sh and startup.sh scripts. Could it be that the indexes are not closed correctly? How can I check?

I have added only the alfresco-trashcan-cleaner addon to empty the trashcan (it is scheduled to run in the evening) and an authentication filter based on Shibboleth.

Also we have a lot of groups (49457 authorityContainer in DB, don't know if they are really groups) and users (about 13K).

I have also attached the alfresco-global.properties.

afaust · ‎03-16-2017

If you have a lot of sites then you can end up with quite a number of groups, even if you effectively don't use all/most of them. Each site will create 5 groups.

The alfresco-global.properties does not show anything unusual so far. Some configuration can live in other alfresco-global.properties files, but only if you have custom/3rd party modules installed, and you say you only have the trashcan cleaner.

You can do several things to find out what is happening during those hours of startup:

- increase log levels to debug, though this might potentially cause the log file to grow extremely fast
- use jstack to capture thread dumps of the startup process (may be automated to run every couple of seconds to have multiple dumps / snapshot views)
- run profiling agents that may record what operations are being executed and how long they take
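
The jstack automation could look roughly like the following sketch. It only assumes that the JDK's `jstack` is on the PATH and is called as `jstack <pid>`; the function names, the file naming scheme and the defaults (10 dumps, 5 seconds apart) are made up for illustration.

```python
import subprocess
import time
from datetime import datetime
from pathlib import Path


def dump_name(out_dir, pid, when, seq):
    """Build a timestamped, numbered file name for one thread dump."""
    return Path(out_dir) / f"jstack-{pid}-{when:%Y%m%d-%H%M%S}-{seq:03d}.txt"


def capture_dumps(pid, out_dir, count=10, interval=5.0, runner=subprocess.run):
    """Run `jstack <pid>` `count` times, `interval` seconds apart.

    Each dump is written to its own file so that consecutive snapshots can be
    compared to see which threads stay in the same place.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for seq in range(1, count + 1):
        target = dump_name(out_dir, pid, datetime.now(), seq)
        result = runner(["jstack", str(pid)], capture_output=True, text=True)
        target.write_text(result.stdout)
        written.append(target)
        if seq < count:
            time.sleep(interval)  # wait before taking the next snapshot
    return written
```

You would call `capture_dumps(<tomcat pid>, "<some directory>")` shortly after startup begins, running it as the same user that owns the Tomcat process. Threads that show the same stack in several consecutive dumps are the ones worth looking at.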

Unless you somehow delete / touch the index during the downtime each night there should normally be no reason for the index to be corrupt and have to be rebuilt from scratch on every restart. I am not aware of any issue that would leave the index corrupted when you properly shut down Tomcat - only if you resort to SIGKILL might there be some issues.

mehe · ‎03-16-2017

To identify the root cause of your slow startup:

- has Alfresco taken a long time to start up from the beginning (once it contained a reasonable number of docs)?

- if not, what did you change in your configuration that may have caused the issue?

- was there a kind of "mass changing" meta data or access rights or moving around large amounts of documents, resulting in huge transactions?

Another option: for offline analysis of a copy of your Lucene index, you can use Luke (getopt.org/luke).

luca · ‎03-16-2017

Hi Martin,

I added some other details in response to Axel.

This slow startup is an old problem that we ignored because it was restarting in the night when no one was working, but now I have some time to take a look at it. As far as I know I did not do anything that caused it suddenly; it has grown steadily over time.

For example these are some startup times that I registered:

24/08/2016 - 1h 50min
14/03/2017 - 3h 42min
15/03/2017 - 4h 19min - restart with nodesSharedCache raised
16/03/2017 - 4h 35min - restart with default cache and 8GB of mem
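
To put a number on that growth, here is a small sketch that converts the durations above to minutes and computes the average increase per day between two measurements. Only the dates and durations come from the list above; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Startup times as listed above (date, duration).
STARTUPS = [
    ("24/08/2016", "1h 50min"),
    ("14/03/2017", "3h 42min"),
    ("15/03/2017", "4h 19min"),
    ("16/03/2017", "4h 35min"),
]


def to_minutes(text):
    """Convert a duration such as '3h 42min' into minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?\s*(?:(\d+)min)?", text.strip())
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def growth_per_day(first, last):
    """Average increase in startup time (minutes per day) between two entries."""
    (date0, time0), (date1, time1) = first, last
    fmt = "%d/%m/%Y"
    days = (datetime.strptime(date1, fmt) - datetime.strptime(date0, fmt)).days
    return (to_minutes(time1) - to_minutes(time0)) / days
```

Between the first two entries, which were both taken with the original configuration, `growth_per_day(STARTUPS[0], STARTUPS[1])` works out to 112 minutes over 202 days, i.e. a bit more than half a minute of extra startup time per day.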

I will try to inspect the indexes. The /opt/alf_data/lucene-indexes/archive/Spacestore folder is 497MB, whereas /opt/alf_data/lucene-indexes/workspace/spaceStore is 5GB.

What should I analyze?

luca · ‎03-16-2017

I have just seen that the DB is heavily used during the startup period. I will take a deeper look to find out what is happening.

Alfresco startup too slow