Document storage design (backup and restore)

mgpa
Champ in-the-making
Firstly, we are not currently using Alfresco.  Our evaluation version of the latest beta of v3 does not work at all (but that is off topic).

We currently have over 1.3 million documents, nearly all 30 kB to 80 kB in size.  Our main problem is that we use a tape backup solution.  Given the volume of data (c. 55 GB), it takes a disproportionately long time to back up all of these files compared with backing up similarly sized data sets for SQL Server and Exchange, for example.  We do not have a server replication solution (at the moment).

We currently have a Windows Server 2003/2008 environment, and using CIFS to import our document store has huge appeal.  (We are not averse to a non-Windows environment, but the savings made must outweigh the costs of learning and supporting the new environment, without any loss of resilience.  No flaming, please.)

From the forum and the wiki, Alfresco will give us version control, document indexing and potentially better user access control, but it does not appear to answer our concern of "how do we back up and restore 1 million+ files as quickly as possible?".  This is especially important if we have to implement a disaster recovery plan.

Alfresco seems to use a flat storage space that can be stored either locally or on a SAN.  Has any thought been given to storing the files on another medium that can be backed up and restored more quickly?  Total elapsed time is the key here.

We could just replicate to a USB external hard disk, but I would like to consider more secure and more easily automated solutions.  I would be interested to know if anyone has thought about and overcome such an issue.

Finally, how would Alfresco cope if the Alfresco software and the indexes were restored before the document store?  For example: rebuild a server with Alfresco (et al.) on it, restore the indexes, kick off the restoration of the data files and then start Alfresco without waiting for the restoration job to finish.  This would make Alfresco available to users, so that they could create new content while the restore job was still bringing back the old content.  Would we have to rebuild the indexes once the data restore job has completed?  Or would it merely be recommended?  (The obvious difference is that a recommended rebuild could wait until the next planned outage, whereas a mandatory one would force another outage much sooner than planned.)

For us, it would be more important for our user base to be able to create new content than it would be to view old content.  This does not mean that the old content is not important, it just means that we must always be able to create new content, regardless of whether old content is available or not.

Thanks
Marcus
2 REPLIES

mrogers
Star Contributor
Just a few quick points.
No, it would not be a good idea to restart Alfresco before restoring the document store.  A restore of Alfresco requires the document store to be restored before the DB (I think).

Could you have a hot standby disk that is periodically (say every few minutes) updated with the change set?  Then, if disaster strikes, you switch to that standby disk.  That would be quicker than a restore from tape, and your tape backups can then just be an extra layer in your long-term off-line backup.  One thing to be aware of is that the data in the content store is fairly static, so even if you have 55 GB of data, only a small fraction of it will be new.
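
To make the change-set idea concrete, here is a minimal sketch in Python of one sync pass. It is only an illustration, not an Alfresco tool: the function name `sync_changes` and the example paths are made up, it assumes a file's modification time is a good enough signal that it has changed, and it copies only the content store files (it does nothing for the database or the indexes, and it does not propagate deletions).

```python
import os
import shutil
from pathlib import Path


def sync_changes(source: Path, standby: Path, since: float) -> int:
    """Copy files under `source` modified at or after `since` into `standby`.

    `since` is a POSIX timestamp, typically the start time of the previous
    pass. Returns the number of files copied. Deletions are not propagated.
    """
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            src = Path(dirpath) / name
            if src.stat().st_mtime < since:
                continue  # unchanged since the last pass, skip it
            dst = standby / src.relative_to(source)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied += 1
    return copied


# Hypothetical usage, run from a scheduler every few minutes:
#   sync_changes(Path("D:/contentstore"), Path("E:/standby"), last_run_timestamp)
```

Each pass still walks the whole tree to read the timestamps, but it only copies the small fraction of files that are new, which is where the time saving over a full backup would come from.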

I'm sure Administrators will be able to give fuller answers.

mgpa
Champ in-the-making
Thanks for your response.  It would have been my guess that the indexes would be a bit unhappy about being restored without data too.

The problem/issue with disaster recovery is that you have to assume that any backup on-site is not accessible.  This could be due to fire, flood, etc. and worse still, the problem can be with one of your neighbours.

I know that tape backup is not ideal and that off-site server replication is the way to go, but we all have budgets to stick to.