storage engine and checksum of content

fluca1978
Champ in-the-making
Hi all,
I'm just curious whether Alfresco already has a way to guarantee that what you put into the repository is the same thing you get back out, even in the presence of disk corruption. In other words, is there any hashing/checksum on the content placed into the repository?

Thanks.
10 REPLIES

llemtt
Champ in-the-making
Good question

I am wondering that too!

Actually, I am planning a script that calculates a CRC for each content item and stores it in the metadata, so that later on I can recompute it and check the content's integrity.
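A minimal sketch of what such a script could look like, in plain Python and independent of any Alfresco API. The function names `compute_checksums` and `verify` are made up for illustration, and where the checksum gets stored (a metadata property, a sidecar file) is left open. It computes a CRC-32 plus a SHA-256 digest in one pass over the file; the CRC is cheap, but a cryptographic digest is the safer choice if accidental collisions matter.

```python
import hashlib
import zlib

CHUNK_SIZE = 64 * 1024  # read in chunks so large files never sit in memory


def compute_checksums(path):
    """Return (crc32_hex, sha256_hex) for the file at `path`."""
    crc = 0
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC-32 over all chunks
            sha.update(chunk)
    return format(crc & 0xFFFFFFFF, "08x"), sha.hexdigest()


def verify(path, stored_sha256):
    """Recompute the digest and compare it with the value stored earlier."""
    return compute_checksums(path)[1] == stored_sha256
```

The idea is to call `compute_checksums` at upload time, keep the result alongside the content, and call `verify` periodically or on read. A mismatch means the bytes on disk are no longer the bytes that were stored, even when the file size has not changed.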

mrogers
Star Contributor
The content size is already there. Could you use the size instead?

fluca1978
Champ in-the-making
It is not clear what you mean, but using the size attribute to do a consistency check sounds silly to me.

andy
Champ on-the-rise
Hi

There is no way to do this.
It sounds like a handy enhancement to me.
I suggest you raise an enhancement.

Andy

cpaul
Champ on-the-rise
Hi all,

I just wrote a blog post that walks through adding a custom aspect to generate checksums for Alfresco content here:

http://www.productivist.com/2011/11/21/generate-checksums-for-alfresco-content/

Feel free to check it out and let me know if you have thoughts or questions.

Chris

everbehere
Champ in-the-making
I want to use checksums to reject documents that have the same content as one already stored, but I have not been able to find the right approach. It would be very helpful if you or someone else could provide a solution.

We actually have a similar but different requirement. We have many cases where the same document is uploaded many times (really an artifact of a poor document object model in the business process), so we store each document only once and use associations to link the content with the metadata. The duplicate detection is based on an MD5 hint, followed by a byte-by-byte comparison to be sure.
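The "MD5 hint, then byte-by-byte" check could be sketched roughly like this in plain Python. This is an illustration only, not the actual implementation described here: `DuplicateIndex` and `md5_of` are hypothetical names, the index lives in memory, and files on disk stand in for repository content.

```python
import filecmp
import hashlib
from collections import defaultdict


def md5_of(path, chunk_size=64 * 1024):
    """MD5 digest of a file, read in chunks. Used only as a cheap hint."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class DuplicateIndex:
    """Hypothetical in-memory index: MD5 -> paths of files already stored."""

    def __init__(self):
        self._by_md5 = defaultdict(list)

    def find_duplicate(self, path):
        """Return the stored path with identical bytes, or None."""
        for candidate in self._by_md5.get(md5_of(path), []):
            # The MD5 match is only a hint; compare the actual bytes.
            if filecmp.cmp(candidate, path, shallow=False):
                return candidate
        return None

    def add(self, path):
        """Register `path` unless identical content is already indexed."""
        existing = self.find_duplicate(path)
        if existing is not None:
            return existing
        self._by_md5[md5_of(path)].append(path)
        return path
```

Rejecting a duplicate upload then amounts to checking whether `find_duplicate` returns something other than `None` before the content is stored.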

It would be quite nice to have a storage layer that stored identical content only once and used reference counting, but obviously this comes at a cost in complexity and performance.
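To make the idea concrete, here is a toy sketch of such a layer in plain Python. It is an assumption-laden illustration, not anything Alfresco provides: `DedupStore` is a hypothetical name, content is kept in memory, and blobs are keyed by SHA-256 digest with a byte comparison as a safety net.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store that keeps identical content once."""

    def __init__(self):
        self._blobs = {}  # digest -> content bytes
        self._refs = {}   # digest -> number of live references

    def put(self, data):
        """Store `data` (or reuse the existing copy) and return its key."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._blobs:
            if self._blobs[digest] != data:
                # Same digest, different bytes: refuse rather than alias.
                raise ValueError("digest collision for " + digest)
            self._refs[digest] += 1
        else:
            self._blobs[digest] = data
            self._refs[digest] = 1
        return digest

    def get(self, digest):
        return self._blobs[digest]

    def ref_count(self, digest):
        return self._refs.get(digest, 0)

    def release(self, digest):
        """Drop one reference; delete the content when none remain."""
        self._refs[digest] -= 1
        if self._refs[digest] == 0:
            del self._refs[digest]
            del self._blobs[digest]
```

The complexity cost shows up right away: every delete has to go through `release`, and a real implementation would need the count updates to be transactional with the metadata that references the content.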

cszamudio
Champ on-the-rise
Thanks for this thread and the reference to the blog entry.

I plan to use this approach to test whether a user is uploading a duplicate of content already on a Site. 

To me, a content checksum seems like a pretty fundamental requirement for a Content Management System. The Aspects solution is a good workaround.

Carlos