Backup & Restore

smcardle
Champ in-the-making
Hi All

Can anybody tell me which data store, if any, is the primary?

Alfresco uses both the file system and a database to store repository data, and in the last couple of weeks we have had 2 instances of mismatched data, caused by configuration changes and data corruption.

Here are the scenarios we are trying to make sure we address:

1. If the database becomes corrupted and we restore a version that is a few days old, can this be re-synced with the file system's alf_data directory so that they can still work together, even if that means losing some data from the alf_data directory?
2. If the alf_data directory becomes corrupted, can we re-create it from the current database? I.e. are the current documents also stored in the database along with historical versions?

We always back up both the alf_data directory and the database at the same time; however, there is always a slight time difference between the two. Our solution runs 24/7, and it does not seem possible to guarantee that no data is modified during the backup process, so a restore can produce a mismatch.

Does anybody have a bulletproof backup & restore policy for these two data stores?

Regards

Steve
11 REPLIES

mrogers
Star Contributor
http://wiki.alfresco.com/wiki/Backup_and_Restore

You back up the database first, then the content store.
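
For illustration only, that ordering could be scripted roughly as below. This is a sketch, not an official procedure: it assumes a MySQL database (as used by a later poster in this thread), and the function name `backup_alfresco` and its four arguments are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the wiki): dump the database FIRST, then
# archive the content store. All argument names are assumptions.
#   $1 = database name   $2 = database user
#   $3 = alf_data path   $4 = backup destination directory
backup_alfresco() {
    local db_name="$1" db_user="$2" alf_data="$3" backup_dir="$4"
    local stamp
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$backup_dir" || return 1

    # Step 1: database. --single-transaction takes a consistent snapshot
    # of InnoDB tables without locking them; -p prompts for the password.
    mysqldump --single-transaction -u "$db_user" -p "$db_name" \
        > "$backup_dir/db-$stamp.sql" || return 1

    # Step 2: content store, started only after the dump has finished, so
    # every file the dump refers to is already on disk.
    tar -czf "$backup_dir/contentstore-$stamp.tar.gz" \
        -C "$alf_data" contentstore || return 1

    # Print the timestamp so the caller can find both files.
    echo "$stamp"
}
```

A call might look like `backup_alfresco alfresco alfresco /opt/alfresco/alf_data /backup`, where every value is a placeholder for your own setup.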

raju
Champ in-the-making
Hi all,

We have lost our Alfresco Community server, but we have a backup of the database and the alf_data folder. I installed a new Alfresco Community instance, restored the backup, replaced the alf_data folder and started the services, but I am unable to log in; no user works. Is this the right restore process, or are there other files besides alf_data that must be copied from the old server to the new one? This is urgent, so please provide a solution as soon as possible.

Logging in with any of the previous users gives the same error, although those users worked before. The error is:

The remote server may be unavailable or your authentication details have not been recognized.

thanks

R@J..!!!

smcardle
Champ in-the-making
Hi

Thanks for the quick reply.

I had already seen these instructions, but even with the hot backup process there is a time delay that can cause inconsistencies between the database and the file system in a 24/7 scenario.

We will need to measure the time it takes to perform a backup and see whether it is acceptable to place Alfresco into read-only mode for that period. I would be concerned about this time increasing as more data is added.

Alternatively, we may look at a custom hot backup policy where only data existing prior to the backup time is stored. This would need to allow for documents being updated, replaced or removed after the backup time and before the backup is finished. We may be able to cater for this with something similar to a BigMemory or Ehcache solution, where we place a large cache in front of the data stores, instruct the cache to flush prior to backup, and then set a no-write-through option for the duration of the backup. Once the backup is complete we can instruct the cache to continue as usual.

Regards

Steve

mrogers
Star Contributor
The time delay of up to a couple of hours is not a problem, since the content store does not update existing content; it only adds new content.

So when you back up, you may well have a few extra content items in the content store, which won't matter.

And obviously don't run the orphan reaper while you are backing up!

smcardle
Champ in-the-making
So what happens in the cases where content is updated or removed during this time?

The original content is moved to a blob in the database and the new content assumes the place of the old content… ?

This must mean there is a chance of content mismatch where files are replaced, removed and new versions are added…


Also, if you're stating that content can only be added during this time, how is this reconciled during a restore?

Regards

Steve

mrogers
Star Contributor
Content is not updated in the content store and is only removed by the "orphan reaper", which you won't run during your backup.
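
To make the distinction concrete, here is a rough sketch that classifies the differences between the two stores. It assumes you already have two plain text files with one store:// URL per line, one taken from the database and one from the content store; the function name `compare_urls` and the MISSING/EXTRA labels are invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Alfresco.
#   $1 = file of store:// URLs referenced by the database
#   $2 = file of store:// URLs actually present in the content store
compare_urls() {
    db_sorted=$(mktemp) || return 1
    fs_sorted=$(mktemp) || return 1

    # comm requires sorted input; sort into temp files so the
    # original lists are left untouched.
    sort -u "$1" > "$db_sorted"
    sort -u "$2" > "$fs_sorted"

    # Only in the DB list: the database points at a file that is not on
    # disk. This is the case that breaks a restore.
    comm -23 "$db_sorted" "$fs_sorted" | sed 's/^/MISSING /'

    # Only in the store list: a file on disk that the database does not
    # know about. These are the harmless "extra" items.
    comm -13 "$db_sorted" "$fs_sorted" | sed 's/^/EXTRA /'

    rm -f "$db_sorted" "$fs_sorted"
}
```

If the database is dumped before the content store is copied, and nothing deletes files in between, a restored pair should produce only EXTRA lines.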

lachmac
Champ in-the-making
I had a bit of a disaster, in that I lost a week's worth of backups. I now have a content store which is 10 days old and a db which is fresh, with the newer content store backups accidentally deleted.

Is there any way to get this working? What would be the best procedure?

Running on 3.4.d, Debian, MySQL, Tomcat

lachmac
Champ in-the-making
This was my status. Thanks to information from Loftux, and help from a young computer wizard, our Share site is now up and running again, although I did lose 10 days of data, so we have tightened up our backup procedures.

This is how we cleaned up and managed to get a 10-day-old data directory to work with a 24-hour-old copy of the DB. The problem is that the DB contains pointers, in the alf_content_url table, to the data (files, blog posts, pictures etc.) in the alf_data directory. When Alfresco finds nothing at a pointer it throws an error, which renders the site unusable. You could delete all the pointers created after the date and time for which there is no matching data, but other tables in the database use the information in the alf_content_url table, so this would not be any good.

The first step was to get a listing of all the items in the content store:

## Visit the contentstore and list its contents in a text file
## (grep - keeps only paths containing a hyphen, i.e. the UUID-named content files)
## Also format it to be matchable against the next file we will create
cd /opt/alfresco-##/alf_data/contentstore
find . | grep - > contentstore.txt
sed -i 's|^\./|store://|' contentstore.txt

Then get the alf_content_url table listed in a similar way

## Dump the alf_content_url table to a text file (with nice linebreaks)
mysqldump -u YOUR_DB_USER -p YOUR_ALFRESCO_DB alf_content_url | sed 's$),($),\n($g' > alf.sql

Then compare the two, to see what is in the DB that is NOT in the alf_data contentstore.

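## Script to get a list of the missing files (bash; run it where alf.sql and contentstore.txt are)
## Keep only dump lines that carry a content URL, one array element per line
mapfile -t dblist < <(grep "store://" alf.sql)
mapfile -t filelist < contentstore.txt

left=${#dblist[@]}
for pointer in "${dblist[@]}"
do
        exists=false
        ## The URL is the first single-quoted field of the row
        pointerpath=$(echo "$pointer" | cut -d "'" -f 2)
        for filepath in "${filelist[@]}"
        do
                if [ "$filepath" == "$pointerpath" ]; then
                        exists=true
                        break
                fi
        done
        ## Progress counter: rows still to check
        let left-=1
        echo $left

        ### List pointers whose file does not exist
        if [ "$exists" == 'false' ]; then
                echo "$pointer" >> pointers.txt
                echo "$pointer"
        fi
done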

Then we need to create something for the pointers to point at: a dummy file whose entire content is "{" and "}".

## Missing files are now listed in pointers.txt. Create a dummy file containing only {} and place it in the contentstore (create a path similar to the ones already there)
## Putting {} in the dummy file avoids problems with missing files containing JSON data.
## Let's say we find that all content files over id 7000 are missing; point them to the new file we created with the SQL query below
UPDATE `YOUR_DB_NAME`.`alf_content_url` SET `content_url` = 'store://2011/11/30/9/26/713d9b34-3f6b-472c-8b5e-4e4b74a5b66.bin' WHERE `alf_content_url`.`id` >7000;
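
Creating the dummy file itself can be sketched as follows. This is an illustration rather than the exact commands we ran: the function name `make_placeholder` is hypothetical, and it only relies on the store:// URL mapping to a relative path under the contentstore directory, as in the listing step earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a placeholder content file for a store:// URL.
#   $1 = content store root, e.g. /path/to/alf_data/contentstore
#   $2 = store:// URL the database rows will be pointed at
make_placeholder() {
    # Strip the scheme to get the path relative to the content store root.
    rel_path=${2#store://}
    target="$1/$rel_path"

    # Create the date-style directory tree if it does not exist yet.
    mkdir -p "$(dirname "$target")" || return 1

    # An empty JSON object, so that anything expecting JSON still parses.
    printf '{}' > "$target"

    echo "$target"
}
```

Called with the contentstore path and the URL used in the UPDATE statement, it prints the full path of the file it wrote. Make sure the file ends up owned by the user that runs Alfresco.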

We then copied in the alf_data directory (the 10-day-old backup), set full reindexing with
index.recovery.mode=FULL
started Alfresco and crossed our fingers. Success. I then did a search from inside Share to find and delete any files that contained just "{" and "}", and we now have a working site. I have seen, somewhere in the forums, a method to get a working site from a DB backup that is older than the alf_data directory. Here is a method for the reverse: a working site from an alf_data directory that is older than the DB backup.

Do your backups, and thanks to all who helped with this!
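
If you would rather script the reindex setting than edit the file by hand, something along these lines should do. It is only a sketch: `set_property` is a made-up helper, it assumes GNU sed (as on Debian), and it assumes the setting lives in your alfresco-global.properties, whose location depends on your install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set key=value in a Java properties file.
# Replaces the existing line for the key, or appends one if absent.
# Assumes the value contains no '|' or '&' characters (sed specials here).
#   $1 = properties file   $2 = key   $3 = value
set_property() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$file"; then
        # Key already present: rewrite that line in place (GNU sed -i).
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Key absent: add it at the end of the file.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

For example, `set_property /path/to/alfresco-global.properties index.recovery.mode FULL`, with the path adjusted to your installation. A full reindex can take a long time, so you will probably want to revert the setting once the rebuild has finished.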

rjohnson
Star Contributor
I have moved Alfresco instances between machines by doing exactly what you have done, so in principle it should work just fine.
First, is the version of Alfresco you have installed the same as the version you lost?
Second, have you checked alfresco.log and catalina.out to see exactly why it thinks the remote server is not available?
Finally, I assume that you shut down Alfresco completely before you did your backups and before you did your restore.


Bob Johnson