<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic contentstore:  unnecessary file duplication in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82051#M54982</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I'm using Alfresco 2.0 CE on Windows Server 2003, with Tomcat and MS SQL Server.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;While debugging something, I noticed that the files in the contentstore folder are not cleaned up after a modification.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Using the NodeBrowser to find the content URL and examining alf_data/contentstore, I see that files are duplicated in the back end after every modification.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;For example, FileA.htm without versioning:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;FileA.htm's content URL:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;contentUrl=store://2007/5/4/13/22/7b78124f-fa6c-11db-9240-eba5cf10755f.bin|mimetype=text/html|size=14261|encoding=UTF-8|locale=en_US_&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;FileA.htm's content URL after a modification:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;contentUrl=store://2007/6/12/11/13/e3154088-18ff-11dc-8092-3142445812fe.bin|mimetype=text/html|size=14267|encoding=UTF-8|locale=en_US_&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;FileA.htm's content URL after another modification:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;contentUrl=store://2007/6/12/11/17/5f24bdd7-1900-11dc-8092-3142445812fe.bin|mimetype=text/html|size=14276|encoding=UTF-8|locale=en_US_&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Each of the binary files above stays in the contentstore after the modification. This seems like a serious waste of disk space for no apparent reason.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Is this a bug, or is there more going on in the background that I'm not aware of?&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 12 Jun 2007 17:21:38 GMT</pubDate>
    <dc:creator>qasimh</dc:creator>
    <dc:date>2007-06-12T17:21:38Z</dc:date>
    <item>
      <title>contentstore:  unnecessary file duplication</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82051#M54982</link>
      <description>Hi, I'm using Alfresco 2.0 CE, on Windows Server 2003, Tomcat, and MS SQL Server. As I was debugging something, I noticed that the files in the contentstore folder do not get cleaned up after a modification. Using the NodeBrowser to find the contentURL and examining alf_data/contentstore, I see that fi</description>
      <pubDate>Tue, 12 Jun 2007 17:21:38 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82051#M54982</guid>
      <dc:creator>qasimh</dc:creator>
      <dc:date>2007-06-12T17:21:38Z</dc:date>
    </item>
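    <!--
    The content data strings quoted in the first post pack several fields into one line. The Python sketch below splits such a string into its key=value fields and maps the store:// URL onto a file path, which is handy when comparing NodeBrowser output with what is actually on disk. It makes assumptions based only on the examples shown in this thread: fields are separated by "|", and the part after store:// is a path relative to the contentstore root. The helper names parse_content_data and content_path are hypothetical, and so is the alf_data/contentstore root passed as the default.

```python
from pathlib import PurePosixPath

STORE_PREFIX = "store://"


def parse_content_data(data: str) -> dict:
    """Split a 'contentUrl=...|mimetype=...|size=...' string into a dict."""
    fields = {}
    for part in data.split("|"):
        # partition splits on the first '=' only, so values stay intact
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


def content_path(content_url: str, store_root: str = "alf_data/contentstore") -> PurePosixPath:
    """Map a store:// URL to a path below the (assumed) contentstore root."""
    if not content_url.startswith(STORE_PREFIX):
        raise ValueError(f"not a store:// URL: {content_url}")
    return PurePosixPath(store_root) / content_url[len(STORE_PREFIX):]


data = ("contentUrl=store://2007/5/4/13/22/7b78124f-fa6c-11db-9240-eba5cf10755f.bin"
        "|mimetype=text/html|size=14261|encoding=UTF-8|locale=en_US_")
fields = parse_content_data(data)
print(content_path(fields["contentUrl"]), int(fields["size"]))
```

    Each modification in the example produces a URL with a new date-based directory and a new file name. That is why the three versions of FileA.htm appear as three separate files on disk.
    -->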
    <item>
      <title>Re: contentstore:  unnecessary file duplication</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82052#M54983</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Did you find an explanation for this problem?&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;I noticed the same problem (without versioning).&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Amar&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Oct 2007 10:57:17 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82052#M54983</guid>
      <dc:creator>amar</dc:creator>
      <dc:date>2007-10-17T10:57:17Z</dc:date>
    </item>
    <item>
      <title>Re: contentstore:  unnecessary file duplication</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82053#M54984</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;Did you have an explanation for this problem. &lt;BR /&gt;I noticed the same pb (without versionning)&lt;BR /&gt;Amar&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;SPAN&gt;Nope… no responses, no advice, no followups&amp;nbsp; &lt;img id="smileysad" class="emoticon emoticon-smileysad" src="https://connect.hyland.com/i/smilies/16x16_smiley-sad.png" alt="Smiley Sad" title="Smiley Sad" /&gt;&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 29 Nov 2007 18:12:58 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82053#M54984</guid>
      <dc:creator>qasimh</dc:creator>
      <dc:date>2007-11-29T18:12:58Z</dc:date>
    </item>
    <item>
      <title>Re: contentstore:  unnecessary file duplication</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82054#M54985</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;IMHO, this is an Alfresco implementation choice: a content file stored in a content store is never modified in place. Instead, every update systematically creates a new copy of the content, and that new copy is associated with the node's current metadata. The old file becomes de facto orphaned content unless versioning has been enabled.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;This guarantees that Alfresco will never implicitly delete any "valid" content.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;To keep things clean, if I'm not mistaken (to be confirmed), there is a scheduled cleaner job that detects, among other things, any obsolete (orphaned) content files and deletes them automatically.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Hope this helps.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 04 Dec 2007 14:18:57 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82054#M54985</guid>
      <dc:creator>ero</dc:creator>
      <dc:date>2007-12-04T14:18:57Z</dc:date>
    </item>
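    <!--
    The write-once behaviour ERo describes can be modelled in a few lines. The Python sketch below is a toy in-memory model, not Alfresco's actual ContentStore or cleaner code. ToyContentStore and its methods are hypothetical names. Updates never overwrite an existing file; they always write to a new URL. A file stays referenced if it is either a node's current content or a kept version. A scheduled cleaner would delete unreferenced files only after a protection window. Measuring that window from the time the file was written is a simplification chosen for this sketch.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyContentStore:
    """Toy in-memory model of a write-once content store (illustration only)."""
    blobs: dict = field(default_factory=dict)     # url: (content, written_at)
    node_url: dict = field(default_factory=dict)  # node name: current url
    versions: dict = field(default_factory=dict)  # node name: urls kept as versions

    def write(self, node: str, content: bytes, now: float, versioned: bool = False) -> str:
        # Never modify an existing file: every update gets a brand new URL.
        url = f"store://{uuid.uuid4()}.bin"
        self.blobs[url] = (content, now)
        old = self.node_url.get(node)
        if versioned and old is not None:
            # Versioning keeps the previous URL referenced, so it is not an orphan.
            self.versions.setdefault(node, []).append(old)
        self.node_url[node] = url
        return url

    def referenced(self) -> set:
        refs = set(self.node_url.values())
        for urls in self.versions.values():
            refs.update(urls)
        return refs

    def clean(self, now: float, protect_seconds: float) -> list:
        """Delete unreferenced blobs written more than protect_seconds ago."""
        refs = self.referenced()
        doomed = [url for url, (_, written_at) in self.blobs.items()
                  if url not in refs and now - written_at > protect_seconds]
        for url in doomed:
            del self.blobs[url]
        return doomed


store = ToyContentStore()
for i, text in enumerate([b"v1", b"v2", b"v3"]):
    store.write("FileA.htm", text, now=float(i))
print(len(store.blobs))  # 3 files on "disk" for a single unversioned node
```

    The model reproduces what the first post observed: disk usage grows with every edit until a cleaner run falls outside the protection window. It also suggests why deleting content through the UI did not immediately shrink alf_data.
    -->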
    <item>
      <title>Re: contentstore:  unnecessary file duplication</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82055#M54986</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Thank you, ERo.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Do you know anything more about this scheduled job that's supposed to take care of these orphaned nodes/content?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;We deployed Alfresco about 6 months ago, and the alf_data repository is currently 4 GB. I know that's not much yet, but we only have about 1.0-1.5 GB of valid content in there (very little of it is auto-versioned).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;For the first time since the deployment, I permanently removed old deleted content (from the web UI): about 1,000-1,500 files, totaling hundreds of MB. However, the size of the alf_data folder has not changed at all.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;As things stand, this all seems terribly inefficient and impractical. As the content continues to grow, these disk inefficiencies are bound to pile up.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Any insight on this matter would be greatly appreciated… ahem (Alfresco engineers).&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;-Q&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 05 Dec 2007 19:32:18 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82055#M54986</guid>
      <dc:creator>qasimh</dc:creator>
      <dc:date>2007-12-05T19:32:18Z</dc:date>
    </item>
    <item>
      <title>Re: contentstore:  unnecessary file duplication</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82056#M54987</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;I found the scheduler that is supposed to clean the content store:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;WEB-INF/classes/alfresco/scheduled-jobs-context.xml&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I see lots of beans there that might help clean things up. I'll start looking into this.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;-Q&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 06 Dec 2007 00:36:58 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82056#M54987</guid>
      <dc:creator>qasimh</dc:creator>
      <dc:date>2007-12-06T00:36:58Z</dc:date>
    </item>
    <item>
      <title>Re: contentstore:  unnecessary file duplication</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82057#M54988</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi Q,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;That job is absolutely necessary to keep the Alfresco repository clean. For example, the CIFS protocol is not transactional: if your network connection goes down while you are uploading content, the repository becomes inconsistent (you will see the metadata in the Web UI, but you will be unable to access the content).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;You are right: scheduled-jobs-context.xml is the file you can customize.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;For the rest, you will find all the Java code related to the cleaner job in:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;HEAD\root\projects\repository\source\java\org\alfresco\repo\content\cleanup&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;and all the file content store Java code (including the content deletion implementation) in:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;HEAD\root\projects\repository\source\java\org\alfresco\repo\content\filestore&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;(FileContentStore.java)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regards,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;ERo&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 07 Dec 2007 15:15:09 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/contentstore-unnecessary-file-duplication/m-p/82057#M54988</guid>
      <dc:creator>ero</dc:creator>
      <dc:date>2007-12-07T15:15:09Z</dc:date>
    </item>
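    <!--
    To see how much of alf_data is actually orphaned (for example, the 4 GB versus 1.0-1.5 GB gap mentioned above), a read-only audit can help. Run it before tuning the cleaner in scheduled-jobs-context.xml. The Python sketch below is not Alfresco's cleaner; find_orphan_candidates is a hypothetical helper. It walks a contentstore directory and reports .bin files that are both absent from a given set of referenced paths and older than protect_days. The 14-day default is an arbitrary choice for this sketch. The referenced set is assumed to hold the part of each content URL after store:// (for example "2007/6/12/11/13/e3154088-18ff-11dc-8092-3142445812fe.bin"). How you collect those URLs from the repository is outside this sketch. The function deliberately deletes nothing: removing a file that the database still references would leave a node with metadata but unreadable content, which is the inconsistency ERo describes.

```python
import time
from pathlib import Path


def find_orphan_candidates(store_root, referenced, protect_days=14, now=None):
    """Return (relative_path, size_in_bytes) for .bin files under store_root that
    are not in `referenced` and were last modified more than protect_days ago.
    Read-only: nothing is deleted."""
    now = time.time() if now is None else now
    cutoff = now - protect_days * 86400  # seconds per day
    root = Path(store_root)
    candidates = []
    for path in sorted(root.rglob("*.bin")):
        rel = path.relative_to(root).as_posix()
        if rel in referenced:
            continue  # still in use by a node or a version
        stat = path.stat()
        if stat.st_mtime < cutoff:
            candidates.append((rel, stat.st_size))
    return candidates
```

    Summing the second element of each tuple gives a rough figure for reclaimable space. That figure can then be compared with what the scheduled cleaner actually removes after its next run.
    -->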
  </channel>
</rss>

