How to mirror a document library to an external drive?

upforsin
Star Collaborator

Hello

I would like to mirror the document library's files and folder structure to a backup drive every night, so downloading the whole document library each time is not an option. How can I do it?

Alfresco stores files on disk in a different structure and under different names, so it is not possible to just copy the files.

howkymike
Alfresco Developer
1 ACCEPTED ANSWER

jpotts
World-Class Innovator

The best backup approach is to (1) dump your database and then (2) back up your entire content store directory.

However, if you want to mirror your document library to an external drive, there are several ways to do it...

1. Your document library is accessible via WebDAV. You could mount it as a drive, then use rsync to copy it to a backup volume. Obviously you will not get the metadata or the version history if you go this route.

2. You could write a script that uses the REST API or CMIS to crawl the document library and write new and updated files to the backup volume.

3. You could use behaviors to track when files are new or updated and then put a message on a queue. In a separate process, have Java code that subscribes to the queue and, when it sees an event, fetches the file via the REST API or CMIS and writes it to the backup volume.
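
Options 1 and 2 both reduce to the same step: copy a file only when it is new or has changed. Here is a minimal Python sketch of that step, assuming the document library is already mounted (for example via WebDAV) at a local path. The function names `needs_copy` and `mirror_tree` are illustrative, and this is a simple stand-in for rsync, not a reproduction of it:

```python
import os
import shutil


def needs_copy(src_path, dst_path):
    """Return True if dst is missing, differs in size, or is older than src."""
    if not os.path.exists(dst_path):
        return True
    src, dst = os.stat(src_path), os.stat(dst_path)
    return src.st_size != dst.st_size or src.st_mtime > dst.st_mtime


def mirror_tree(src_root, dst_root):
    """Copy new and updated files from src_root to dst_root.

    Returns the sorted list of copied paths, relative to dst_root.
    Files deleted from the source are NOT removed from the destination.
    """
    copied = []
    for folder, _subdirs, files in os.walk(src_root):
        rel = os.path.relpath(folder, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src_path = os.path.join(folder, name)
            dst_path = os.path.join(target_dir, name)
            if needs_copy(src_path, dst_path):
                # copy2 also preserves the modification time, so an
                # unchanged file is skipped on the next run.
                shutil.copy2(src_path, dst_path)
                copied.append(os.path.relpath(dst_path, dst_root))
    return sorted(copied)


# Hypothetical usage (both paths are placeholders):
# mirror_tree("/mnt/alfresco/documentLibrary", "/mnt/backup/documentLibrary")
```

As with the rsync approach, this only carries over the current file content, not the metadata or version history.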

I've successfully used option 3 for a client. We added UI actions to Share that allow authorized end-users to "flag" a document for backup. It works as described above, but instead of watching for any create/update, the behavior watches for the presence of a "marker" aspect that indicates the file should be archived.
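
As a rough illustration of the option 3 flow, here is a sketch in Python rather than Java, with an in-process `queue.Queue` standing in for a real message broker. The aspect name `ex:archivable` and the `fetch`/`store` callables are hypothetical placeholders for the REST/CMIS download and the write to the backup volume:

```python
import queue
import threading

MARKER_ASPECT = "ex:archivable"  # hypothetical marker aspect name
STOP = object()  # sentinel that tells the worker to shut down


def on_node_updated(node, events):
    """Stand-in for the behavior: enqueue only nodes that carry the marker."""
    if MARKER_ASPECT in node["aspects"]:
        events.put(node["id"])


def backup_worker(events, fetch, store):
    """Consume node ids from the queue, fetch each file, and store it."""
    while True:
        node_id = events.get()
        if node_id is STOP:
            break
        store(node_id, fetch(node_id))


# Hypothetical wiring: fetch would download via REST/CMIS,
# store would write the bytes to the backup volume.
# worker = threading.Thread(target=backup_worker, args=(events, fetch, store))
```

Keeping the consumer in a separate process (here, a separate thread) means a slow or unavailable backup volume does not block users who are saving documents.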

4 REPLIES

upforsin
Star Collaborator

Thank you for such an extensive answer. I tried to implement option 2 as a Python 3 script using your cmislib3 (unfortunately, I can only find documentation for cmislib 😕).

It is easy to traverse the directories, but I can't download files.

This is the code I use:

content = repo.getObjectByPath("/Sites/test/test.png")
o = open(content.getName(), 'wb')
result = content.getContentStream()
o.write(result.read())
result.close()
o.close()

The downloaded file has a size of 0 bytes. The result is a str, not binary data, so writing it throws an error. I can change 'wb' to 'w', but then binary files come out invalid.

EDIT: It looks like Python 3.8 is incompatible.

EDIT 2: In binding.py, you have to replace lines 1938-1940 with 'return io.BytesIO(result.content)'.
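
For anyone hitting the same problem: the key point is that the content stream has to stay binary from end to end. Here is a minimal sketch, assuming the stream is a file-like object that yields bytes. The helper names are made up for illustration and are not part of cmislib:

```python
import io
import shutil


def as_binary_stream(raw_bytes):
    """Wrap raw response bytes in a file-like object, as the patch above does."""
    return io.BytesIO(raw_bytes)


def save_stream(stream, path, chunk_size=64 * 1024):
    """Write a binary file-like object to path without decoding it as text."""
    with open(path, "wb") as out:
        # copyfileobj reads in chunks, so large files are not held in memory.
        shutil.copyfileobj(stream, out, chunk_size)
```

If the stream yields str instead of bytes, writing to a file opened with 'wb' fails with a TypeError, and switching to 'w' only hides the problem, because binary data such as a PNG does not survive being decoded as text.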

howkymike
Alfresco Developer

EddieMay
World-Class Innovator

Hi @upforsin,

Just to clarify - if you make the change in EDIT2 can you now download successfully? If so, is this problem now solved?

Thanks, 

Digital Community Manager, Alfresco Software.

upforsin
Star Collaborator

Yes, thanks to @jpotts I managed to copy the document structure to FTP.

First, I tried option 2. After 3 days of writing Python code, I managed to download files (after modifying the CMIS library), but then I couldn't upload some files to FTP because binary files had characters incompatible with Latin-1 encoding.*

Then, in 10 minutes, I implemented option 1 using WebDAV and lftp. Thank you once again, Jeff!

* The unresolved Python error: “UnicodeEncodeError: 'latin-1' codec can't encode characters in position xxx: ordinal not in range(256)”
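
A note on that error for anyone who runs into it: Python raises it when a str containing characters above U+00FF is encoded as Latin-1. Data passed as bytes is never encoded, so one possible cause (an assumption, since the thread does not confirm it) is a file name or file content being handled as str somewhere along the way. Python's ftplib used Latin-1 as its default encoding before Python 3.9. The small check below is only an illustration, and `can_encode` is a made-up helper name:

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Example file names (made up): the first fits in Latin-1, the second does not.
fits = can_encode("résumé.pdf", "latin-1")
fails = can_encode("raport_żółć.pdf", "latin-1")
```

Reading files in 'rb' mode and uploading the resulting bytes avoids the encoding step for content. For file names, the connection's encoding has to be able to represent every character in the name.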

Here is the script in case anyone is interested: https://pastebin.com/srfbtvJT

howkymike
Alfresco Developer