Database and filesystem synchronization - reindexing question

johnwalsh
Champ in-the-making
Greetings,

I'm deploying Alfresco and MySQL in an environment where only a remote data store is trusted for backups. The base plan is to use a ReplicatingContentStore with a new ContentStore implementation for the remote data store, so that the file system is backed up there, and to keep a backup MySQL database that we regularly take offline and back up to the remote store. The issue is that, on crash recovery, the restored file system backup will almost certainly be newer than the restored database.

The Alfresco wiki (http://wiki.alfresco.com/wiki/Replication, section titled Content Store Recovery) suggests that reindexing the repository will fix the case where the database thinks there's a node that is no longer on the file system. I attempted to do this by setting index.recovery.mode to FULL in repository.properties. On restart I see 10 messages like "15:04:14,312 INFO  [node.index.FullIndexRecoveryComponent]      10 % complete.", but the node without filesystem content is still visible to the user. Am I supposed to be reindexing the repository itself, when in reality this setting only rebuilds the Lucene indexes?

Any help would be appreciated.

Thanks,
John
4 REPLIES

johnwalsh
Champ in-the-making
Stated more simply: if the file content store and the database get out of sync, is there an automated way to detect that and resync them? (For example, if a file isn't there, remove the corresponding node in the database.)

Thanks,
John

andy
Champ on-the-rise
Hi

Your case is the other way round.

The wiki is talking about the case where the content is on the file system and the node is not in the DB but is still in the index. An index rebuild resolves that: it stops a search from finding a node in the index that is no longer in the DB.

If you have nodes in the DB and the content is missing you have a problem.

The wiki describes hot backup (copy the latest index backup, copy the DB, copy the content store). So the content store may contain new content that has no reference in the DB. If you can replicate the content store, you have a hot content backup and should not lose content. Your repo recovery is at the point of your DB backup. The index will be a bit behind but can be configured to catch up, which is fast compared with a full reindex. The content backup will be a bit ahead. Old content is deleted periodically, not immediately, so your backup can go back to the versions of the docs that existed when the DB was backed up. When old content is cleaned up is configurable; the delay should be sensible compared with the DB backup schedule, or you should copy the state of the content store so that you have an immutable copy.

If you have nodes in the DB and not in the content store, you have lost content somehow. Removing the nodes from the DB will resolve this, but you have still lost something. At least you have the metadata to determine what has been lost.
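
To work out what has been lost, one option is to compare the content URLs recorded in the DB against the files actually present in the content store. The following is only a rough sketch, not an Alfresco tool: it assumes the URLs have already been exported from the DB into a list, and that a URL of the form store://... maps to the same relative path under the content store root. The function names and the URL-to-path mapping are assumptions for illustration and should be checked against your own installation.

```python
from pathlib import Path

# Assumed URL scheme for file-based content; verify against your own data.
STORE_PREFIX = "store://"


def url_to_path(content_url, store_root):
    """Map an assumed 'store://...' content URL to a path under store_root."""
    if not content_url.startswith(STORE_PREFIX):
        raise ValueError(f"unexpected content URL: {content_url!r}")
    return Path(store_root) / content_url[len(STORE_PREFIX):]


def find_missing_content(content_urls, store_root):
    """Return the content URLs whose backing file is absent from the store."""
    return [
        url for url in content_urls
        if not url_to_path(url, store_root).is_file()
    ]
```

The returned list only reports the mismatch. Deciding what to do with the affected nodes is a separate step.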

I hope this clarifies what is going on.

So the question is, how did you lose your content?
Maybe your backup cycle is too long compared with how long content is held in the content store. You can copy the state of the content store (not the indexes) as a backup whenever you like; that copy will not have older content removed periodically.
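
The timing argument can be written down as a small check. This is a sketch of the reasoning only: protect_days is a hypothetical name for the configurable delay before old content is physically removed, and the function does not read any Alfresco setting. Content deleted from the repo just after the DB backup is the worst case, so the content store backup needs to be taken within protect_days of the DB backup, and not before it.

```python
from datetime import datetime, timedelta


def content_still_present(db_backup_time, store_backup_time, protect_days):
    """Check whether every file referenced by the DB backup should still be
    in the content store backup, given a deletion delay of protect_days.

    protect_days is a hypothetical parameter standing in for the
    configurable clean-up delay; it is not an Alfresco property name.
    """
    gap = store_backup_time - db_backup_time
    # A store backup taken before the DB backup can miss newly added content.
    if gap < timedelta(0):
        return False
    # Content orphaned right after the DB backup survives for protect_days.
    return gap < timedelta(days=protect_days)
```

For example, with a 14-day delay, a content store backup taken 3 days after the DB backup passes the check, while one taken 20 days later does not.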

Andy

johnwalsh
Champ in-the-making
Hi Andy,

Your assertion that content isn't deleted until a configurable time period has elapsed addresses my issue.

The problem I was seeing (content listed in the DB but not present on the file system) would be caused by content being deleted after the database backup but before the file system backup, assuming that the file is removed from the file system at the moment the content is deleted from the repo. Given your assertion, this won't be a problem.

Thanks!
John

lachmac
Champ in-the-making
If you have nodes in the DB and not in the content store you have lost content somehow. Removing the nodes from the DB will resolve this - but you have still lost something. At least you have the meta data to determine what has been lost.

How would I go about removing the nodes from the DB? What should I look for and remove?