10-12-2018 02:48 PM
I am looking to save a checksum for managed content. We have multiple sources that save images to Alfresco and, unfortunately, we end up housing a lot of duplicates. I am looking into ways to alleviate the problem.
10-12-2018 04:07 PM
Not out of the box, but you can add it easily. I did something similar for another client. You can create a behavior that computes a hash on the content stream every time it is updated, and store that hash as a property on the content. Then, finding duplicates is just a matter of running a search for all documents that have that same hash value.
I think I saw that version 6.x added something related to checksums but I have not investigated to see if it is similar to what I describe above.
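To make the hash-and-search idea above concrete, here is a minimal sketch in plain Python. It is not Alfresco code: an actual behavior would be written against the repository's own API, and the names `content_checksum`, `find_duplicates`, and the sample `uploads` list are hypothetical stand-ins. The choice of SHA-256 is also an assumption; any stable digest would work.

```python
import hashlib
import io
from collections import defaultdict


def content_checksum(stream, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    # Read in fixed-size chunks so large files are never fully in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(items):
    """Group item names by checksum and keep groups with more than one member."""
    by_hash = defaultdict(list)
    for name, stream in items:
        by_hash[content_checksum(stream)].append(name)
    return [names for names in by_hash.values() if len(names) > 1]


# Hypothetical in-memory stand-ins for uploaded image content streams.
uploads = [
    ("logo-a.png", io.BytesIO(b"\x89PNG fake image bytes")),
    ("photo.jpg", io.BytesIO(b"different bytes")),
    ("logo-b.png", io.BytesIO(b"\x89PNG fake image bytes")),
]
duplicates = find_duplicates(uploads)
print(duplicates)  # [['logo-a.png', 'logo-b.png']]
```

In a repository, the digest would be stored as a property when content is written, and the grouping step would be replaced by a search on that property. Note that a byte-level hash only catches exact duplicates; the same picture saved at a different size or compression level produces a different hash.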
10-13-2018 12:34 PM
Jeff is right when he mentions "something related": in the newer versions you have document fingerprinting.
Document Fingerprints | Alfresco Documentation
You can also find related documents with fingerprinting.
I first saw it in a Tech Talk Live session, and there is also an excellent article from Andy Hind about document fingerprints.
https://community.alfresco.com/people/andy1/blog/2017/05/12/document-fingerprints
Maybe this helps...
10-15-2018 05:09 PM
I am not finding much documentation on fingerprinting of images and other media content. Any idea whether this was designed mainly for text content?
10-15-2018 04:42 PM
Thank you. Before implementing something ourselves to save the hashes, I wanted to see whether Alfresco had something to offer so that we would not reinvent the wheel. It looks like we are on v5.2, and I am not sure whether an upgrade is pending, so we might not be able to use the Document Fingerprint option yet.
11-14-2018 08:55 AM
Hi
Fingerprinting was designed for text only.
If you can turn your image into a text representation, then you can use it.
Andy