Deleting a large number of documents safely

03-06-2024 01:29 PM
Hi Everyone,
I can't delete them from the front end because Alfresco crashes, which I assume is due to the sheer number of documents in a single directory. So I used the API to delete the documents one by one. This was working until I realised that the alf_node_properties table in the database kept growing. It went from 79 million rows to the mid-80 millions.
What should I do about the alf_node_properties table? I was expecting it to shrink, not grow.
Please note that I set the permanent flag to true when deleting the documents so that they do not go to the trash.
Additionally, I have deleted a couple of hundred thousand documents. Why isn't the storage usage on my server going down?
Is there a better way to permanently delete documents in Alfresco? Please note this is Community Edition 5.2.
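For readers following along, a permanent delete through the v1 public REST API can be sketched roughly as below. This is only an illustration under assumptions, not the script used in this post: the host, port and credentials are placeholders, and `build_delete_url` and `delete_node` are hypothetical helper names. The `permanent=true` query parameter is what makes the delete skip the trashcan.

```python
import base64
import urllib.parse
import urllib.request

# Assumed base URL of the v1 public REST API; adjust host and port for your server.
API_BASE = "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1"


def build_delete_url(node_id, permanent=True, api_base=API_BASE):
    """Return the DELETE URL for a node; permanent=true bypasses the trashcan."""
    query = urllib.parse.urlencode({"permanent": str(permanent).lower()})
    return f"{api_base}/nodes/{urllib.parse.quote(node_id, safe='')}?{query}"


def delete_node(node_id, user, password, permanent=True):
    """Send the DELETE request with basic auth and return the HTTP status code."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        build_delete_url(node_id, permanent),
        method="DELETE",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status  # 204 No Content on success
```

Calling `delete_node(node_id, "admin", "secret")` in a loop over node ids would reproduce the one-by-one approach described above; the user name and password here are placeholders.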
Labels: Alfresco Content Services
03-06-2024 03:00 PM
As I understand it, you want to export 1.2 million files from one folder, one by one, to outside Alfresco, then delete the 1.2 million source documents in that folder, and then import them back.
If you are fetching nodes one by one from a large folder anyway, do not export them; just move the nodes into a proper new folder structure. I did this for folders with more than 2 million nodes using the JavaScript API. It was not efficient. Maybe the REST API would do better. With a move, you perform only one operation on each node in that large folder.
To answer your question: deletion in Alfresco has about 5 steps. The last step is never even completed: Alfresco never deletes the bin files from the file system, it just moves them into contentstore.deleted. Read more about the Alfresco deletion process in the article Understand the Lifecycle of Alfresco Nodes. But again, I do not think you should delete documents from the repository at all.
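A move through the v1 REST API could look roughly like the sketch below. I have not benchmarked it against the JavaScript API, so treat it as an assumption-laden starting point: the base URL is a placeholder, `build_move_request` is a hypothetical helper name, and the target folder is assumed to exist already.

```python
import json
import urllib.request

# Assumed base URL of the v1 public REST API; adjust host and port for your server.
API_BASE = "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1"


def build_move_request(node_id, target_parent_id, auth_header, api_base=API_BASE):
    """Build (without sending) the POST request that moves one node to a new parent."""
    body = json.dumps({"targetParentId": target_parent_id}).encode("utf-8")
    return urllib.request.Request(
        f"{api_base}/nodes/{node_id}/move",
        data=body,
        method="POST",
        headers={
            "Authorization": auth_header,
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen(...)` sends it. Building the request separately from sending it makes it easy to log or dry-run a large batch before touching the repository.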
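To check whether this is why disk usage is not dropping, you can measure how much space contentstore.deleted occupies. A minimal sketch, assuming you know where that directory lives on your server (the `dir.contentstore.deleted` property, which defaults to a folder under `dir.root`); `directory_size_bytes` is a hypothetical helper name.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files below root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

For example, `directory_size_bytes("/opt/alfresco/alf_data/contentstore.deleted")`, where the path is only a guess at a typical install location. If the result is large, those files still have to be removed from the file system separately before the space is freed.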
03-12-2024 09:09 AM
Another good reference:
https://blyx.com/2014/08/18/understanding-alfresco-content-deletion/
There is also this utility project, which I have used on occasion. You have to adjust the code slightly for Alfresco version 23, but it is still a good guide to the approach.
https://github.com/keensoft/alfresco-deleted-content-store-cleaner
Or better yet, from personal experience, you can write Java code to split the contents of the folder into N folders, each containing at most 1000 nodes. That way you can retrieve data with Lucene queries and browse the documents without any particular problems.
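The splitting logic itself is small. Here is a rough sketch of the planning step, written in Python rather than Java just to keep it short: it only decides which node goes into which folder and does not talk to Alfresco. `plan_folders` and the `batch-` folder naming are made up for illustration.

```python
def plan_folders(node_ids, max_per_folder=1000, prefix="batch-"):
    """Map hypothetical target folder names to lists of at most max_per_folder node ids."""
    if max_per_folder < 1:
        raise ValueError("max_per_folder must be at least 1")
    plan = {}
    for start in range(0, len(node_ids), max_per_folder):
        # Zero-padded index so the folders sort in creation order.
        name = f"{prefix}{start // max_per_folder:05d}"
        plan[name] = node_ids[start:start + max_per_folder]
    return plan
```

Each entry in the returned plan is then one folder to create and one batch of nodes to move into it.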
