Alfresco Community 23 with strange behavior

inzano
Confirmed Champ

We have an ACE 5.0 installation. In this repository there is a folder with around 400,000 children (subfolders and documents). Uploading another document to that folder works fine.

We then upgraded that ACE to the latest version (25.x): first from 5 to 6.0, then to 7.4, then to 25. Now, when we upload a document to that folder, Alfresco does not return a response; it keeps working until the client times out (tested with CMIS and the REST API).

Debugging Alfresco, we saw that after creating the new document (inserting the node and its properties), Alfresco executes a query to retrieve all the children of the parent folder. Since that folder has more than 400,000 children, the query takes some time. After that, Alfresco also sends a lot of queries for all the children, which is why it never sends a response.

If you upload a document to an empty folder or with few documents that behavior is not noticeable.
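
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the response time could be measured against the large folder and a small one. It assumes the public REST API v1 "create node" endpoint, basic authentication, and a JSON body that creates an empty cm:content node; the host, credentials, and node ids are placeholders, not values from our system.

```python
import base64
import json
import socket
import time
import urllib.error
import urllib.request

# Public REST API v1 path for creating a child node under a parent folder.
CHILDREN_PATH = (
    "/alfresco/api/-default-/public/alfresco/versions/1/nodes/{parent_id}/children"
)


def build_create_request(base_url, parent_id, name, user, password):
    """Build a POST request that creates an empty cm:content node in a folder."""
    url = base_url.rstrip("/") + CHILDREN_PATH.format(parent_id=parent_id)
    body = json.dumps({"name": name, "nodeType": "cm:content"}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )


def time_create(request, timeout_s=60.0):
    """Return seconds until the server answers, or None if the client times out."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            response.read()
    except (socket.timeout, TimeoutError):
        return None
    except urllib.error.HTTPError:
        # The server did answer (with an error status), so still report the time.
        pass
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return None
        raise
    return time.monotonic() - start
```

Calling time_create(build_create_request(...)) once with the id of the 400,000-child folder and once with the id of a nearly empty folder, on each version, should show whether the delay depends on the number of siblings.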

We took a backup first, then upgraded step by step and tested the upload to the same folder in every version:

- Version 5.0 works fine

- Version 6.0 works fine

- Version 7.4 works fine

- Version 23.1 has this behavior

- Version 25.1 has this behavior

 

Any clue of why this is happening?

3 REPLIES

fedorow
Elite Collaborator

I don’t know exactly why this happens. I haven’t seen any official information, but it’s generally recommended to have a maximum of about 2,000 to 4,000 items per folder. In my experience, all interfaces work perfectly with around 1,000 items, and I definitely don’t recommend having more than 10,000 items in a single folder.

So, find a classification system that works for you and organize the 400,000 documents into a subfolder structure.
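
If no natural classification (date, customer, document type) is available, one fallback is to derive the subfolder from a hash of the document name. This is only a sketch of that idea and not an Alfresco feature; the function name and parameters are made up for illustration. With the defaults below there are 256 buckets, so 400,000 documents would average roughly 1,560 per subfolder.

```python
import hashlib


def bucket_path(name, levels=1, width=2):
    """Map a document name to a bucket path such as 'a3' or 'a3/9f'.

    levels: how many nested subfolder levels to use.
    width:  how many hex characters name each level (2 gives 256 folders per level).
    """
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(parts)
```

The same name always maps to the same subfolder, so a document can be located again without searching. The trade-off is that the folder names mean nothing to a person browsing the repository.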

Overwerkt
Champ on-the-rise

We had a similar scenario with our on-premises Alfresco 7.2.x installation. 

A user who didn't want to have to manage a lot of folders just wanted one root folder, named for the current fiscal year. She didn't want any subfolders and wanted all of her files to be uploaded into this root folder, after which the RM lifecycle would kick in (auto-declare and file, retention schedule policies applied, auto-complete).  The projected volume of records (electronic files) for this single root folder was about 20,000/year. I didn't like this idea at all; I was concerned about slow performance.  However, the user made it a firm business requirement and repeatedly stated that she and her team wouldn't need to browse for files (they wouldn't be opening the folder); they would only use Search to search for them. 

After this was implemented, Alfresco performance slowed all across the repository.  Users habitually tried to open the folder and browse for files, despite what they said previously about only using Search.  The next fiscal year, we changed how this worked.  We now auto-create daily folders that constrain the number of files uploaded to about 1,000.  The auto-created folders contain the fiscal year and a date and time stamp for when the folder was created. The slow performance issue is now resolved.  The number of folders you have and how many files you upload into each of them matters.
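
The folder roll-over described above could look roughly like the sketch below. It is not our production code: the class name, the naming pattern, and the July fiscal-year start are assumptions for illustration, and the actual folder creation in Alfresco is left out.

```python
from datetime import datetime

MAX_FILES_PER_FOLDER = 1000  # cap mentioned above


def fiscal_year(moment, start_month=7):
    """Return the fiscal year for a timestamp, named after the year it ends in.

    start_month=7 (July) is an assumption; with start_month=1 the fiscal year
    is simply the calendar year.
    """
    if start_month > 1 and moment.month >= start_month:
        return moment.year + 1
    return moment.year


class FolderRoller:
    """Tracks the current target folder; starts a new one each day or when full.

    Folder names have one-second resolution, so this sketch assumes a folder
    does not fill up within the same second it was created.
    """

    def __init__(self, max_files=MAX_FILES_PER_FOLDER):
        self.max_files = max_files
        self.folder_name = None
        self.folder_date = None
        self.count = 0

    def target_for(self, moment):
        """Return the folder name an upload arriving at `moment` should go to."""
        new_day = self.folder_date != moment.date()
        full = self.count >= self.max_files
        if self.folder_name is None or new_day or full:
            # Name carries the fiscal year plus the creation date and time.
            self.folder_name = f"FY{fiscal_year(moment)}_{moment:%Y-%m-%d_%H%M%S}"
            self.folder_date = moment.date()
            self.count = 0
        self.count += 1
        return self.folder_name
```

Each upload asks target_for(datetime.now()) for its destination, so no folder receives more than max_files documents and no folder spans more than one day.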

Before we made this change, errors in the System Tail Log (available in the Admin Console) included  "transaction cache is full" and the upload batches were exceeding the total upload file size allowed.  The default total size limit for uploading a batch of files is 2 GB.  Uploads will choke and just sit there if the file size limit is reached. If this is an issue, you can increase the size limit or remove it. 

The specific error about the transaction cache was: [org.alfresco.repo.cache.TransactionalCache.org.alfresco.cache.propertyValueTransactionalCache] Transactional update cache 'org.alfresco.cache.propertyValueTransactionalCache' is full (1000). We increased the transaction cache to 2000.

Many people forget or don't realize that if you upload large numbers of files to a single folder, so that you exceed 5,000 files in the folder, there is a SharePoint Online list view threshold of 5,000 items. Alfresco uses SharePoint protocols, so this limit of listing only 5,000 items in a folder, even if that folder contains 10,000 items, does affect Alfresco users. The 5,000-item limit is a resource throttling feature to prevent performance issues when accessing large lists, which can strain server resources. So there are multiple reasons for having a guideline of uploading no more than 1,000 files per folder.

Fedorow said this much more succinctly than I did. He is right, and we have experienced a use case that illustrates how important the file limit is for folders and how important it is to plan and lay out a workable folder structure before you start uploading files to Alfresco. Work with your business users, business analysts, and project managers on this important step: creating a folder structure that helps users find what they need quickly and easily, but also avoids slow performance in Alfresco.

We also keep an eye on the amount of memory that is on the server and how much of that is allocated to Alfresco as our digital records management implementation grows.  We have periodically increased the amount of memory in our production environment over time.

LeoMattioli
Employee

It would be nice to see thread dumps to find out what the repo is trying to propagate to all the siblings/children. One thing you can easily try is disabling event2 generation. Add this config to alfresco-global.properties:

repo.event2.enabled=false

Here's the commit that added that config: https://github.com/Alfresco/alfresco-community-repo/pull/1703

Be aware that this "little" configuration change can impact other services. Here's a more detailed post: https://connect.hyland.com/t5/alfresco-blog/using-activemq-with-alfresco-7-4/ba-p/125096

Hope this helps.


Leo Mattioli - Technical Account Manager @Hyland.