contentstore: unnecessary file duplication

qasimh
Champ in-the-making
Hi,

I'm using Alfresco 2.0 CE, on Windows Server 2003, Tomcat, and MS SQL Server.

As I was debugging something, I noticed that the files in the contentstore folder do not get cleaned up after a modification.

Using the NodeBrowser to find the contentURL and examining alf_data/contentstore, I see that files get duplicated in the backend after every modification.

For example, FileA.htm without versioning:

FileA.htm's content URL: 
contentUrl=store://2007/5/4/13/22/7b78124f-fa6c-11db-9240-eba5cf10755f.bin|mimetype=text/html|size=14261|encoding=UTF-8|locale=en_US_

FileA.htm's content URL, after a modification
contentUrl=store://2007/6/12/11/13/e3154088-18ff-11dc-8092-3142445812fe.bin|mimetype=text/html|size=14267|encoding=UTF-8|locale=en_US_

FileA.htm's content URL, after another modification:
contentUrl=store://2007/6/12/11/17/5f24bdd7-1900-11dc-8092-3142445812fe.bin|mimetype=text/html|size=14276|encoding=UTF-8|locale=en_US_

Each binary file above in the contentstore remains there after the modification.  This seems like a serious disk space consumer for no apparent reason.
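
For anyone repeating this check, the content URL strings above can be taken apart with a few lines of Python. This is only a sketch: it assumes the part after `store://` is a path relative to alf_data/contentstore (which matches what I saw on disk), and the function names are made up for illustration.

```python
from pathlib import PurePosixPath


def parse_content_data(value: str) -> dict:
    """Split a 'contentUrl=...|mimetype=...|...' string into a dict of fields."""
    fields = {}
    for part in value.strip().split("|"):
        key, _, val = part.partition("=")
        fields[key] = val
    return fields


def store_url_to_relative_path(content_url: str) -> PurePosixPath:
    """Strip 'store://'; assumes the remainder is relative to the store root."""
    prefix = "store://"
    if not content_url.startswith(prefix):
        raise ValueError(f"unexpected content URL: {content_url!r}")
    return PurePosixPath(content_url[len(prefix):])


example = (
    "contentUrl=store://2007/5/4/13/22/7b78124f-fa6c-11db-9240-eba5cf10755f.bin"
    "|mimetype=text/html|size=14261|encoding=UTF-8|locale=en_US_"
)
data = parse_content_data(example)
relative = store_url_to_relative_path(data["contentUrl"])
# Assumed on-disk location, relative to the Alfresco data directory.
print(PurePosixPath("alf_data/contentstore") / relative)
```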

Is this a bug, or is there more going on in the back that I'm not aware of?
6 REPLIES

amar
Champ in-the-making
Did you find an explanation for this problem?
I noticed the same problem (without versioning).
Amar

qasimh
Champ in-the-making
amar wrote:
Did you find an explanation for this problem?
I noticed the same problem (without versioning).
Amar

Nope… no responses, no advice, no follow-ups :(

ero
Champ in-the-making
Hi,

IMHO, that's an Alfresco implementation choice: it never modifies a content file already stored in a content store. Instead, it always creates a new copy of the content, which receives the update and is associated with the current metadata. The old content file becomes a de facto orphan if no versioning has been defined.
This guarantees that Alfresco will never implicitly delete any "valid" content.
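
A toy model may make the behaviour described above easier to picture. It is not Alfresco code; the class names `ToyContentStore` and `ToyNode` are invented here, and the only thing it tries to show is that each update writes a new file and re-points the node, leaving the earlier files behind.

```python
import uuid


class ToyContentStore:
    """In-memory stand-in for a write-once content store (not Alfresco code)."""

    def __init__(self):
        self.files = {}  # content URL -> bytes

    def write(self, data: bytes) -> str:
        # Every write gets a fresh URL; existing entries are never overwritten.
        url = f"store://{uuid.uuid4()}.bin"
        self.files[url] = data
        return url


class ToyNode:
    """Holds the metadata side: a pointer to the current content URL."""

    def __init__(self, store: ToyContentStore, data: bytes):
        self.store = store
        self.content_url = store.write(data)

    def update(self, data: bytes) -> None:
        # A new file is written and the pointer is swapped; the old file stays.
        self.content_url = self.store.write(data)


store = ToyContentStore()
node = ToyNode(store, b"first draft")
node.update(b"second draft")
node.update(b"third draft")

# Anything in the store that no node points to is an orphan.
orphans = set(store.files) - {node.content_url}
print(len(store.files), "files in store,", len(orphans), "orphaned")
```

Without versioning, nothing references the two older files any more, which matches the three .bin files seen in the original post.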

To keep things clean, if I'm not wrong (to be confirmed), there is a scheduled cleaner job that detects (among other things) obsolete (orphaned) content files and deletes them automatically.
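
As a rough illustration of what such a cleaner has to decide, here is a report-only sketch that lists store files no live content URL points to. It is not the Alfresco job: `find_orphans` and the `min_age_days` grace period are hypothetical, the set of live URLs is assumed to be collected separately (for example from the Node Browser), and nothing is deleted.

```python
import os
import tempfile
import time
from pathlib import Path


def find_orphans(store_root: Path, referenced_urls: set[str],
                 min_age_days: float = 0.0) -> list[Path]:
    """Return .bin files under store_root that no referenced URL points to.

    Files modified within the last min_age_days are skipped, so content that
    is still being written is not reported (hypothetical safety margin).
    """
    cutoff = time.time() - min_age_days * 86400
    referenced = {url.removeprefix("store://") for url in referenced_urls}
    orphans = []
    for path in sorted(store_root.rglob("*.bin")):
        relative = path.relative_to(store_root).as_posix()
        if relative not in referenced and path.stat().st_mtime <= cutoff:
            orphans.append(path)
    return orphans


# Demo on a throwaway directory laid out like the dated store folders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    folder = root / "2007" / "6" / "12" / "11"
    folder.mkdir(parents=True)
    for name in ("old.bin", "current.bin"):
        (folder / name).write_bytes(b"x")
        os.utime(folder / name, (0, 0))  # pretend it was written long ago
    live = {"store://2007/6/12/11/current.bin"}
    orphan_names = [p.name for p in find_orphans(root, live, min_age_days=14)]

print(orphan_names)
```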

Hope it helps you.

qasimh
Champ in-the-making
Thank you ERo. 

Do you know anything more about this scheduled job that's supposed to take care of these orphaned nodes/content?

We deployed Alfresco about 6 months ago, and the alf_data repository is currently 4 GB.  I know it's not much now, but we only have about 1.0-1.5 GB of valid content in there (very little of it is auto-versioned).

For the first time since the deployment, I permanently removed old deleted content (from the web UI)… about 1-1.5K files (hundreds of MB in total).  However, the alf_data folder size has not changed at all.

As things are now, it all seems terribly inefficient and impractical.  As this content continues to grow, disk inefficiencies are bound to creep up.

Any insight on this matter will be greatly appreciated… ahem (Alfresco engineers).
-Q

qasimh
Champ in-the-making
I found the scheduler that is supposed to clean the content store:
WEB-INF/classes/alfresco/scheduled-jobs-context.xml

I see lots of beans that might help clean things up… I'll start looking into this.
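
One way to narrow down which beans matter is to filter them by class package. The sketch below assumes the file is a plain Spring beans XML with no XML namespace on the `bean` elements, and it uses the package org.alfresco.repo.content.cleanup, which corresponds to the cleanup source folder named later in this thread. The bean ids and class names in the sample are placeholders, not the real ones from scheduled-jobs-context.xml.

```python
import xml.etree.ElementTree as ET


def beans_in_package(xml_text: str, package: str) -> list[tuple[str, str]]:
    """Return (id, class) for every <bean>, nested or not, in the given package."""
    root = ET.fromstring(xml_text)
    found = []
    for bean in root.iter("bean"):
        cls = bean.get("class", "")
        if cls.startswith(package + "."):
            found.append((bean.get("id", ""), cls))
    return found


# Placeholder XML; the ids and classes here are hypothetical.
sample = """<beans>
  <bean id="exampleCleaner"
        class="org.alfresco.repo.content.cleanup.ExampleCleaner"/>
  <bean id="exampleOther" class="org.example.Other"/>
</beans>"""

matches = beans_in_package(sample, "org.alfresco.repo.content.cleanup")
for bean_id, cls in matches:
    print(bean_id, "->", cls)
```

To run it against the real file, read the file's text and pass it in place of `sample`.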

-Q

ero
Champ in-the-making
Hi Q,

That job is absolutely necessary to keep the Alfresco repository clean. For example, the CIFS protocol is not transactional: if you are uploading content and your network connection goes down during the upload, the repository becomes inconsistent (you will see the metadata in the Web UI but will be unable to access the content).
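
The kind of inconsistency described above (metadata present, content absent or incomplete) can be spotted with a simple check. This is a sketch under assumptions: `find_inconsistent` is a hypothetical helper, the expected sizes are assumed to come from the `size=` field of each node's content URL string, and whether that field reflects the intended size after an interrupted upload is not something this thread confirms.

```python
import tempfile
from pathlib import Path


def find_inconsistent(store_root: Path,
                      expected_sizes: dict[str, int]) -> list[tuple[str, str]]:
    """Report content URLs whose file is missing or has an unexpected size."""
    problems = []
    for url, expected in sorted(expected_sizes.items()):
        target = store_root / url.removeprefix("store://")
        if not target.is_file():
            problems.append((url, "missing"))
        elif target.stat().st_size != expected:
            problems.append((url, "size mismatch"))
    return problems


# Demo: one complete file, one truncated file, one that never arrived.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "complete.bin").write_bytes(b"12345")
    (root / "partial.bin").write_bytes(b"12")
    report = find_inconsistent(root, {
        "store://complete.bin": 5,
        "store://partial.bin": 5,
        "store://never-arrived.bin": 5,
    })

print(report)
```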

You are right, scheduled-jobs-context.xml is the file you can customize.
For the rest, you will find all the Java code related to the cleaner job in:
HEAD\root\projects\repository\source\java\org\alfresco\repo\content\cleanup
and all the file content store Java code (the content deletion implementation) in:
HEAD\root\projects\repository\source\java\org\alfresco\repo\content\filestore
(FileContentStore.java)

Regards,

ERo