<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to avoid uploading files that already exist in Alfresco Forum</title>
    <link>https://connect.hyland.com/t5/alfresco-forum/how-to-avoid-double-files-already-in-upload/m-p/135947#M36525</link>
    <description>&lt;P&gt;Our ACS repository holds about 1 million files, with more to come. It is a community collection of books, articles, videos, audio files, etc.&lt;BR /&gt;I want to make sure that when files are uploaded (including bulk uploads), no file that already exists gets added to the repository again.&lt;/P&gt;&lt;P&gt;My idea is to identify files by a fingerprint of their content, e.g. MD5.&lt;/P&gt;&lt;P&gt;So I added an MD5 property, which is set automatically in an input folder whenever I upload new files in bulk.&lt;BR /&gt;During upload, the process has to look for an existing file in the repository with the same MD5 fingerprint and, if one is found, refuse to add the file again.&lt;/P&gt;&lt;P&gt;My programmer is concerned that this will not work as intended, because the Solr index is built with a time delay, long after a document is uploaded.&lt;/P&gt;&lt;P&gt;My proposed solution is to bypass ACS and Solr and use a separate SQL table with a single column, MD5, written and queried with direct SQL commands. Each fingerprint is added there; there is no need to know which file it belongs to. A lookup via the MySQL index (without Solr) then tells whether the MD5 is already present, without any time lag.&lt;/P&gt;&lt;P&gt;Will this work, or does anyone have a better idea?&lt;BR /&gt;Perhaps this has already been solved by a plugin, since this feature does not seem very exotic to me. Everyone wants to avoid redundancy in their repository.&lt;/P&gt;</description>
    <pubDate>Sat, 06 Aug 2022 11:25:58 GMT</pubDate>
    <dc:creator>res44</dc:creator>
    <dc:date>2022-08-06T11:25:58Z</dc:date>
    <item>
      <title>How to avoid uploading files that already exist</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/how-to-avoid-double-files-already-in-upload/m-p/135947#M36525</link>
      <description>&lt;P&gt;Our ACS repository holds about 1 million files, with more to come. It is a community collection of books, articles, videos, audio files, etc.&lt;BR /&gt;I want to make sure that when files are uploaded (including bulk uploads), no file that already exists gets added to the repository again.&lt;/P&gt;&lt;P&gt;My idea is to identify files by a fingerprint of their content, e.g. MD5.&lt;/P&gt;&lt;P&gt;So I added an MD5 property, which is set automatically in an input folder whenever I upload new files in bulk.&lt;BR /&gt;During upload, the process has to look for an existing file in the repository with the same MD5 fingerprint and, if one is found, refuse to add the file again.&lt;/P&gt;&lt;P&gt;My programmer is concerned that this will not work as intended, because the Solr index is built with a time delay, long after a document is uploaded.&lt;/P&gt;&lt;P&gt;My proposed solution is to bypass ACS and Solr and use a separate SQL table with a single column, MD5, written and queried with direct SQL commands. Each fingerprint is added there; there is no need to know which file it belongs to. A lookup via the MySQL index (without Solr) then tells whether the MD5 is already present, without any time lag.&lt;/P&gt;&lt;P&gt;Will this work, or does anyone have a better idea?&lt;BR /&gt;Perhaps this has already been solved by a plugin, since this feature does not seem very exotic to me. Everyone wants to avoid redundancy in their repository.&lt;/P&gt;</description>
      <pubDate>Sat, 06 Aug 2022 11:25:58 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/how-to-avoid-double-files-already-in-upload/m-p/135947#M36525</guid>
      <dc:creator>res44</dc:creator>
      <dc:date>2022-08-06T11:25:58Z</dc:date>
    </item>
  </channel>
</rss>

