06-01-2020 02:23 PM
I am testing the Enterprise content manager 6.x. I have a use case in which I have to upload 120 million documents to Alfresco. Alfresco will be used mainly as a content store (i.e. without any of the search functionality that Alfresco provides). There is an application which will call Alfresco to get a document based on a document identifier. At this point, I have created a folder in the repository. Using a Java batch, I am uploading the documents along with their metadata to this folder (for testing, I have uploaded around 500k documents). The loading goes through without any issues. However, when accessing this folder, the application freezes or crashes. From the forums, it looks like we should not create more than 10,000 documents in a folder. https://hub.alfresco.com/t5/alfresco-content-services-forum/large-folders-in-the-content-repository/...
Based on this link, I am thinking of creating a folder for each date and then about 1000 subfolders under it. (For certain dates I have 1 million documents. With 1000 subfolders, that is about 1,000 documents per subfolder if they are spread evenly, well under the 10,000 limit.) The structure of the folders will be as follows:
<Parent Folder: Application XYZ>
<Folder-2020-06-01>
<sub folder 2020-06-01 -0001>
<sub folder 2020-06-01 -0002>
<sub folder 2020-06-01 -0003>
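As a rough illustration of this layout, the target path for one document could be derived from its create date and the last three digits of its document number. This is only a sketch: the class name, method name, and the exact folder naming are assumptions made for the example, and it assumes the document number is a string of digits.

```java
import java.time.LocalDate;

class FolderPathSketch {
    // Hypothetical helper: builds "<date folder>/<sub folder>" for one document.
    static String targetPath(LocalDate createDate, String documentNumber) {
        // Keep only the last three characters; left-pad short numbers with zeros.
        String padded = "000" + documentNumber;
        String bucket = padded.substring(padded.length() - 3);
        // LocalDate.toString() yields the ISO-8601 form, e.g. 2020-06-01.
        String day = createDate.toString();
        return "Folder-" + day + "/" + day + "-" + bucket;
    }

    public static void main(String[] args) {
        System.out.println(targetPath(LocalDate.of(2020, 6, 1), "48213007"));
    }
}
```

Three digits give buckets 000 to 999, i.e. 1000 subfolders per date, which matches the count proposed above.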
I have a few questions:
1. Is there a way to automatically create this folder structure in Alfresco? At the start of the day, I want to create this folder structure automatically. I would like to avoid writing any custom routine or batch to create these folders.
2. I am planning to load the documents into the main parent folder. Each document will have a metadata field for its create date. Can I apply a rule which will move the document to the correct folder based on the create date of the document and the last three digits of another metadata field, the document number?
3. I need Alfresco to generate a document number and associate it with the document that is being saved. It looks like there is a module available: https://hub.alfresco.com/t5/alfresco-content-services-add/alfresco-numbering-redpill/td-p/289334
I am not sure if it is supported by Alfresco 6.x. Has this been used in any implementation? If yes, can you let me know the steps? Also, let me know if there is any other alternative for this functionality.
06-03-2020 05:24 PM
What you read in the support forums is correct. You should limit the number of documents/subfolders that are created within a single folder in order to optimize the performance of your ACS system.
At TSG (now an Alfresco company), we have performed several large migrations to ACS and have created folder structures in ACS based on metadata as part of the migration in order to distribute documents evenly. I'm not sure what tool you're using to load documents into Alfresco, but for our customers, we use OpenMigrate (https://www.tsgrp.com/products/openmigrate/). It's able to automatically create the folder structure during the migration.
In response to your questions:
1. I would recommend using a tool like OpenMigrate that is able to automatically create folders if they do not exist when loading content into Alfresco.
2. Similar to my response to your first question, I would recommend that for loading content into Alfresco, you use a tool that is able to automatically create a folder structure during import based on metadata that is applied to the document. OpenMigrate is able to create a folder structure based on created date and substrings of other metadata fields per your requirements.
3. For autonumbering, I haven't used the module from Redpill that you've linked to, but it sounds like it's able to do what you're looking for. Technology Services Group (now an Alfresco company) also has a module that can be deployed to enable auto numbering. Another option that I've seen other customers use is to create a behavior that assigns a number based on the sys:node-dbid that's generated by ACS. The advantage of this approach is that it's very performant for high-volume systems and does not require an external sequence or any additional database locking to be done in order to ensure that each document receives a unique number.
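To make the last option in point 3 a little more concrete, here is a minimal sketch of turning a database id into a document number. It deliberately leaves out the behavior wiring and the lookup of sys:node-dbid, and only shows the formatting step. The class name, the `DOC-` prefix, and the 12-digit width are assumptions for illustration, not part of any Alfresco module.

```java
class DocumentNumberSketch {
    // Hypothetical format: fixed prefix plus the database id zero-padded to 12 digits.
    static String fromDbId(long dbId) {
        if (dbId <= 0) {
            throw new IllegalArgumentException("dbId must be positive");
        }
        return String.format("DOC-%012d", dbId);
    }

    public static void main(String[] args) {
        System.out.println(fromDbId(4821307L)); // prints DOC-000004821307
    }
}
```

One thing to keep in mind with this approach: the numbers are unique but will generally not be consecutive, since every node in the repository consumes an id, not only documents.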
06-03-2020 12:33 AM
Hi,
You can create a rule and have it execute your own script, so that the script creates the folder and subfolder based on the date and moves the document into that folder.
Write the script so that if the folder already exists it moves the document, and otherwise it creates the folder first.
If the structure is based on the create date, don't create all the folders at once; create each one only when the first document for it arrives.
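The logic of such a script could look roughly like the sketch below. In a real rule this would be a repository script using the Alfresco scripting API; here an in-memory stand-in written in Java is used instead, so every class, field, and method name is hypothetical and none of it is Alfresco API. It only shows the "reuse the folder if it exists, otherwise create it, then move the document" flow.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MoveOnArrivalSketch {
    // In-memory stand-in for the repository: known folder paths and document locations.
    final Set<String> folders = new HashSet<>();
    final Map<String, String> documentFolder = new HashMap<>();

    // Creates each missing folder along the path, parents first; existing ones are reused.
    String ensureFolder(String parent, String... names) {
        String path = parent;
        for (String name : names) {
            path = path + "/" + name;
            folders.add(path); // no-op when the folder already exists
        }
        return path;
    }

    // What the rule would do when a document arrives in the parent folder.
    void onDocumentArrived(String documentName, String parent, String day, String bucket) {
        String target = ensureFolder(parent, day, day + "-" + bucket);
        documentFolder.put(documentName, target); // the "move" in this stand-in
    }

    public static void main(String[] args) {
        MoveOnArrivalSketch repo = new MoveOnArrivalSketch();
        repo.folders.add("XYZ");
        repo.onDocumentArrived("a.pdf", "XYZ", "2020-06-01", "007");
        repo.onDocumentArrived("b.pdf", "XYZ", "2020-06-01", "007");
        System.out.println(repo.documentFolder.get("b.pdf"));
        System.out.println(repo.folders.size());
    }
}
```

Because folders are created only when the first document for them shows up, nothing has to be prepared at the start of the day.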
06-08-2020 12:53 PM
Thanks for the direction. This worked for me.
06-17-2020 07:05 AM
Hi @Bhatia_ravi,
Thanks for accepting the solution - really useful to other users to know what works. Thanks @parzgnat for providing the solution!
Take care all,
06-17-2020 06:22 AM
Thanks for pointing out some interesting features that I didn't know about before!