
Deleting a large number of documents safely

shanahjr
Champ in-the-making

Hi Everyone,

 
I have about 1.2 million documents in a single folder in Alfresco. My plan was to delete all of these documents from Alfresco, organize them locally by date modified, split them into subdirectories of 1000 documents each, and then reimport them into Alfresco.

I cannot do this from the front end because Alfresco crashes, which I assume is due to the sheer number of documents in a single directory. So I used the API to delete the documents one by one. This was working until I realised that the alf_node_properties table in the database kept growing. It went from 79 million rows to the mid-80 millions.

What I want to know is what I should do about the alf_node_properties table. I was expecting it to shrink, not grow.

Please note that I set the permanent flag to true when deleting the documents, so that they do not go to the trash.

Additionally, I have deleted a couple hundred thousand documents. Why isn't the storage usage on my server going down?

Is there a better way to permanently delete documents in Alfresco? Please note this is Community Edition 5.2.
2 REPLIES

fedorow
Elite Collaborator

As I understand it, you want to export 1.2 million files from one folder, one by one, outside Alfresco, then delete the 1.2 million source documents in this folder, and then import them back.

If you are fetching nodes one by one from a large folder anyway, do not export them; just move them into a proper new folder structure. I did this for folders with more than 2 million nodes using the JavaScript API. It was not efficient. Maybe the REST API would do better. With the move approach you perform only one operation on each node in that large folder.
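For anyone who wants to try the REST route, here is a rough sketch of a single move call in Python. It assumes the v1 public REST API that ships with 5.2 and its `POST /nodes/{nodeId}/move` endpoint; the host, node ids, credentials, and the helper names `build_move_request` and `move_node` are placeholders I made up for illustration, and I have not measured how this compares to the JavaScript API on a folder of this size.

```python
import base64
import json
import urllib.request

# Assumption: default base path of the v1 public REST API.
API_BASE = "/alfresco/api/-default-/public/alfresco/versions/1"


def build_move_request(host, node_id, target_parent_id, user, password):
    """Build (but do not send) the request that moves one node to a new parent."""
    url = f"{host}{API_BASE}/nodes/{node_id}/move"
    body = json.dumps({"targetParentId": target_parent_id}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def move_node(host, node_id, target_parent_id, user, password):
    """Send the move request; urllib raises HTTPError on a non-2xx response."""
    req = build_move_request(host, node_id, target_parent_id, user, password)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

You would call `move_node(...)` once per node, so each document in the big folder is touched exactly once and nothing is deleted or re-uploaded.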

To answer your question: deletion in Alfresco has about 5 steps. The last step is never even completed: Alfresco never deletes the binary files from the file system, it just moves them to contentstore.deleted. Read more about the Alfresco deletion process in the article Understand the Lifecycle of Alfresco Nodes. But again, I do not think you should delete the documents from the repository at all.
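If you want to see where the disk space is sitting, a small script like the one below can total up the bytes under a directory. This is only a sketch: `directory_size_bytes` is a name I made up, and the actual locations of contentstore and contentstore.deleted depend on the dir.root setting of your installation, so the paths you pass in are your own.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Running it once against your contentstore directory and once against contentstore.deleted should show whether the binaries of the deleted documents are still on disk in one place or the other.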

4535992
Star Collaborator

Some other good references:

https://blyx.com/2014/08/18/understanding-alfresco-content-deletion/

https://alfrescoshare.wordpress.com/2009/12/17/understanding-alfresco-document-life-cycle-for-backup...

There is also this utility project, which I have used on occasion. You have to adjust the code slightly for version 23 of Alfresco, but it is still a good guideline in principle.

https://github.com/keensoft/alfresco-deleted-content-store-cleaner

Or better yet, from personal experience, you can write some Java code to split the contents of the folder into N folders, each containing at most 1000 nodes. That way you can retrieve and browse the documents with Lucene queries without particular problems.
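The splitting logic itself is simple; here is a minimal sketch of it in Python rather than Java, just to show the idea. It assumes you already have a list of (node id, modified date) pairs, for example ISO 8601 timestamp strings, and the function name `plan_folders` and the `batch-00001` naming scheme are my own invention. It only plans the target folders; creating them and moving the nodes is a separate step.

```python
def plan_folders(nodes, max_per_folder=1000):
    """Group nodes into folders of at most max_per_folder, oldest first.

    nodes: list of (node_id, modified_at) pairs; modified_at can be any
    sortable value, such as ISO 8601 strings in the same time zone.
    Returns a list of (folder_name, [node_id, ...]) tuples.
    """
    ordered = sorted(nodes, key=lambda pair: pair[1])
    plan = []
    for start in range(0, len(ordered), max_per_folder):
        chunk = ordered[start:start + max_per_folder]
        # Hypothetical naming scheme: batch-00001, batch-00002, ...
        name = f"batch-{start // max_per_folder + 1:05d}"
        plan.append((name, [node_id for node_id, _ in chunk]))
    return plan
```

With 1.2 million documents this gives 1200 folders of 1000 nodes each, which matches the layout described in the original question.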