<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Double files in Alfresco Forum</title>
    <link>https://connect.hyland.com/t5/alfresco-forum/double-files/m-p/490031#M40152</link>
    <description>&lt;P&gt;Our ACS repository holds about 1 million files, with more to come. It is a community sharing collection of books, articles, videos, audio files, etc.&lt;BR /&gt;I want to make sure that when files are uploaded (including bulk uploads), no file that already exists gets added to the repository again.&lt;/P&gt;&lt;P&gt;My idea is to solve this with file fingerprints, e.g. MD5.&lt;/P&gt;&lt;P&gt;So I added an MD5 property, which is set automatically in an input folder whenever I upload new files in bulk.&lt;BR /&gt;During upload, the process has to look for an existing file in the repository with the same MD5 fingerprint and, if one is found, refuse to add the file again.&lt;/P&gt;&lt;P&gt;My programmer is concerned that this will not work as intended, because the Solr index is built with a time delay, long after a document is uploaded.&lt;/P&gt;&lt;P&gt;So my solution is to bypass ACS and Solr and use direct SQL commands against a separate table with just one column, MD5, adding each fingerprint there. There is no need to know which file a fingerprint belongs to. We just fill the table and then look up, via the MySQL index (without Solr), whether an MD5 is already there or not. This would work without time lag.&lt;/P&gt;&lt;P&gt;Will this work, or does anyone have a better idea?&lt;BR /&gt;Perhaps this has already been solved with a plugin, as I cannot see this feature as very exotic. Everyone wants to avoid redundancy in their repository.&lt;/P&gt;</description>
    <pubDate>Tue, 20 May 2025 03:48:54 GMT</pubDate>
    <dc:creator>Troglodyte</dc:creator>
    <dc:date>2025-05-20T03:48:54Z</dc:date>
    <item>
      <title>Double files</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/double-files/m-p/490031#M40152</link>
      <description>&lt;P&gt;Our ACS repository holds about 1 million files, with more to come. It is a community sharing collection of books, articles, videos, audio files, etc.&lt;BR /&gt;I want to make sure that when files are uploaded (including bulk uploads), no file that already exists gets added to the repository again.&lt;/P&gt;&lt;P&gt;My idea is to solve this with file fingerprints, e.g. MD5.&lt;/P&gt;&lt;P&gt;So I added an MD5 property, which is set automatically in an input folder whenever I upload new files in bulk.&lt;BR /&gt;During upload, the process has to look for an existing file in the repository with the same MD5 fingerprint and, if one is found, refuse to add the file again.&lt;/P&gt;&lt;P&gt;My programmer is concerned that this will not work as intended, because the Solr index is built with a time delay, long after a document is uploaded.&lt;/P&gt;&lt;P&gt;So my solution is to bypass ACS and Solr and use direct SQL commands against a separate table with just one column, MD5, adding each fingerprint there. There is no need to know which file a fingerprint belongs to. We just fill the table and then look up, via the MySQL index (without Solr), whether an MD5 is already there or not. This would work without time lag.&lt;/P&gt;&lt;P&gt;Will this work, or does anyone have a better idea?&lt;BR /&gt;Perhaps this has already been solved with a plugin, as I cannot see this feature as very exotic. Everyone wants to avoid redundancy in their repository.&lt;/P&gt;</description>
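      <content:encoded><![CDATA[
The separate fingerprint table described in the post can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a tested ACS integration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name file_md5, the column name md5, and the helpers md5_of_file, open_fingerprint_store and register_if_new are all hypothetical names chosen for illustration. The idea shown is that a primary key on the MD5 column lets a single INSERT act as both the lookup and the registration, so two concurrent uploads of the same file cannot both pass the check.

```python
import hashlib
import sqlite3


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_fingerprint_store(db_path=":memory:"):
    """Create the one-column table (hypothetical name: file_md5).

    The primary key doubles as the index used for lookups.
    sqlite3 is an assumption here, standing in for MySQL.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS file_md5 (md5 TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def register_if_new(conn, md5_hex):
    """Insert the fingerprint; return True if it was new, False if already present."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO file_md5 (md5) VALUES (?)", (md5_hex,))
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected a duplicate: this content was seen before.
        return False
```

An upload step would call register_if_new(conn, md5_of_file(path)) and skip the file when it returns False. One caveat with this design: if the fingerprint is stored but the upload into the repository later fails, the fingerprint would wrongly block a retry, so the two steps would need to be rolled back together.
]]></content:encoded>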
      <pubDate>Tue, 20 May 2025 03:48:54 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/double-files/m-p/490031#M40152</guid>
      <dc:creator>Troglodyte</dc:creator>
      <dc:date>2025-05-20T03:48:54Z</dc:date>
    </item>
  </channel>
</rss>

