Alfresco Old Content Archiving

unknown-user
Champ on-the-rise
Hi,

We have a requirement that any content not used in the past 6 months be moved to an archive repository (another Alfresco instance). We have a large repository with close to 2 million docs and, for performance reasons, do not want to keep any docs that have not been used for 6 months. The content could be PDFs, Word documents, images, etc.

A few options are being considered:

a) RM Module in Alfresco - Can we achieve this through RM? Can we create a File Plan that will transfer the older content to another repository? Our archive Alfresco repository is an older version than the live one. Can RM transfer content to a repository running a different version?

b) Transfer Service - Run a scheduled job to identify the content that is not being used, and then use the Transfer Service to transfer it to an archive repository. Does the Transfer Service work between different versions of Alfresco?

c) Content Store Selector - Use the content store selector feature to move the old content to different storage. But this way we would be able to archive only the content and not the metadata.

One of the challenges here is the version conflict of the repositories. Is there any other way we can achieve this?

Thanks,
jjacob
4 REPLIES

mitpatoliya
Star Collaborator
Well, I think the version conflict is going to be the biggest obstacle for your requirement.
What you can do is have two schedulers.

The first will export the content that has not been used for more than six months.
During the export, it will write the content along with its metadata to a file (XML or CSV).

The second scheduler will pick up that content from the file system, import it into the Alfresco repository, and attach the metadata by reading that same metadata file.
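
The handoff between the two schedulers can be sketched roughly as below. This is only a minimal Python illustration, not Alfresco code: the column names (`path`, `title`, `modified`) and the helpers `export_manifest` and `read_manifest` are hypothetical, and a real job would read nodes from the live repository and create them in the archive one through whatever API each version exposes.

```python
import csv
import os
import tempfile

# Hypothetical metadata columns; a real export would carry whatever
# properties the content model defines.
FIELDS = ["path", "title", "modified"]


def export_manifest(docs, manifest_path):
    """First job: write one CSV row of metadata per exported document."""
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for doc in docs:
            writer.writerow({name: doc[name] for name in FIELDS})


def read_manifest(manifest_path):
    """Second job: read the metadata back so it can be attached on import."""
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    docs = [{"path": "reports/q1.pdf", "title": "Q1 report", "modified": "2011-01-15"}]
    manifest = os.path.join(tempfile.mkdtemp(), "manifest.csv")
    export_manifest(docs, manifest)
    print(read_manifest(manifest))
```

Because the manifest is a plain file, neither side depends on the other's Alfresco version, which is the point of this approach.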

unknown-user
Champ on-the-rise
Thanks for sharing your thoughts.

If version is not an issue, what would be the best way to get this done? Is there any other approach to solving this issue other than transferring the content to an Archive repository?

The goal behind doing this is to keep the primary Alfresco repository light and fast by having only 6 months of docs in it at any point in time. Our repository is pretty large now (close to 2 million docs) and growing continuously, with about 2000 docs added daily.

mitpatoliya
Star Collaborator
Well, if version is not the constraint, then you can follow the option b that you mentioned,
where your scheduler checks the last-modified date of all content and archives everything that has not been modified in the last six months.
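
The selection step of such a scheduler might look something like this. It is a sketch under stated assumptions: documents are plain in-memory records with a `modified` timestamp, `select_stale` is a hypothetical helper, and six months is approximated as 183 days. In a real job the records would come from a repository query, and the result would be handed to the Transfer Service.

```python
from datetime import datetime, timedelta


def select_stale(docs, now, max_age_days=183):
    """Return the documents whose last-modified time is older than the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    return [doc for doc in docs if doc["modified"] < cutoff]


if __name__ == "__main__":
    now = datetime(2012, 7, 1)
    docs = [
        {"name": "old.pdf", "modified": datetime(2011, 11, 1)},
        {"name": "recent.doc", "modified": datetime(2012, 6, 1)},
    ]
    for doc in select_stale(docs, now):
        print(doc["name"])
```

Note that last-modified date is only a proxy for "not being used": a document that is read often but never edited would still be selected.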

You will need to enhance your infrastructure if you are using an archive store to archive all your old content.
I mean your heap size and the storage space where your alf_data resides.
You will also need to devise a mechanism to maintain the Lucene indexes, as they are crucial for retrieving the old archived content.
Are you using any kind of clustering?

unknown-user
Champ on-the-rise
Thanks.

We don't use clustering.

If the old content is "transferred" to a remote repository, how can we retrieve it when a user searches for old content? The Lucene indexes are maintained only on the live server.

Is there any documentation available on this?

Thanks