resplin · ‎06-05-2015

The official documentation is at: http://docs.alfresco.com

Introduction

The Transfer Service came into existence in version 3.3 of Alfresco. Its purpose is to provide a means of pushing information out of an Alfresco core repository ('DM') to configured targets. The transfer service is accessible as a bean named 'TransferService' that is defined, along with other related beans, in the transfer-service-context.xml Spring context file.

Overview

In version 3.4, the Transfer Service is a subsystem with an API that offers the following features:

Register a new transfer target
Transfer a new node for the first time (no corresponding node on the target)
Transfer a node that has a path-based corresponding node on the target
Transfer an update to a previously deployed node
Transfer any number of nodes
Transfer a restored node
Specify which association types are to be traversed for a given transfer
Discover what transfer targets are available
Configure a transfer target to use either HTTP or HTTPS
Cancel a transfer
Verify the details of a transfer target
View a record of a previous transfer attempt
Indicate that a transfer should run asynchronously with callback
Edit the attributes of a transfer target
Get the status of a given transfer
Restrict a transfer by class of content
Retrieve transfer records for a given transfer target
Unregister an existing transfer target
Delete a node by transferring the node's Archive node ref.
Read-only option on transfer to 'lock' transferred nodes on the destination repository.
A new mode of transfer, 'Sync', in which nodes on the target can be deleted if they are absent from the source repository.
Prevent a transfer from deleting folders that contain 'alien' content.
Logic to handle merging together transfers from several repositories.
Destination transfer reports are pulled back at the end of transfer.
Only content that has changed is transmitted.
Transfer of permissions.
Transfer of cm:modified and cm:modifiedBy properties.

Design

As one might expect, the Transfer Service comprises two major parts: the part that is responsible for sending information from the source repository and the part that is responsible for receiving information in the target repository. The source repository pushes information to the target repository over a network transport. In 3.4 there is support for the use of HTTP and HTTPS across the network. Connections needed for a transfer always originate from the source.

Through the Transfer Service it is possible to create and persist information about any number of Transfer Targets. A Transfer Target records sufficient information about the target system to enable the service to establish an authenticated connection to it. Each transfer target record in the source repository is placed in a transfer target group. Currently there is just one transfer target group defined (called 'Default') and there is no means of creating new ones. The service is likely to be extended in the future to allow the management of transfer target groups.

Each transfer target is named, and the name must be unique within the transfer target group that contains it. Some operations on the TransferService interface allow for a transfer target name to be supplied but not the name of a transfer target group. In these cases the default transfer target group is assumed.

In order to transfer information to one of the configured transfer targets you simply create a Transfer Definition and pass it to the Transfer Service along with the name of the transfer target that you want to transfer to. A transfer definition identifies what should be transferred and has the potential to include some directives about how it should be transferred. In 3.3 a transfer definition comprises simply a collection of NodeRef objects. Note that it is acceptable for this collection to include NodeRefs of nodes that are in the archive store. When the target repository receives such a NodeRef during a transfer the corresponding node will be deleted.

When the transfer service receives a request for a transfer to be made, the first thing it does is export a snapshot of the nodes that are included in the transfer. This snapshot contains all the nodes' properties, but not the content of any properties of type d:content - instead the relevant content URLs for the content files are included in the snapshot. This makes the snapshot relatively lightweight and quick to generate. Once the snapshot has been created, the transfer service makes contact with the specified target and initiates a transfer. Any given target can receive just one transfer at a time currently, as this ensures that conflicts can't occur. The target puts in place a lock (a node named '.lock' beneath the Data Dictionary/Transfers folder) and returns a unique identifier for the transfer, and the source then starts transmitting the snapshot.
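The 'one transfer at a time' rule can be pictured with a minimal sketch. This is not the receiver's implementation: the real target uses a '.lock' node in the repository, whereas the hypothetical SingleTransferLock class below keeps the lock in memory purely to show the begin/end handshake and the unique transfer identifier.

```java
import java.util.UUID;

// Hypothetical in-memory stand-in for the target's lock. The real receiver
// records the lock as a '.lock' node rather than in a Java field.
class SingleTransferLock {
    private String activeTransferId; // null when no transfer is running

    // Starts a transfer and returns its identifier, or throws if one is already running.
    public synchronized String begin() {
        if (activeTransferId != null) {
            throw new IllegalStateException("Another transfer is already in progress");
        }
        activeTransferId = UUID.randomUUID().toString();
        return activeTransferId;
    }

    // Releases the lock, but only when called with the identifier that holds it.
    public synchronized boolean end(String transferId) {
        if (activeTransferId != null && activeTransferId.equals(transferId)) {
            activeTransferId = null;
            return true;
        }
        return false;
    }
}
```

In this sketch a second call to begin() fails until end() has been called with the identifier returned by the first call.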

As the target system receives the snapshot it streams it into a local staging area on disk. After having sent the snapshot, the transfer service then works out which content items are required and sends the necessary content files over. These are batched up in groups - one of the goals of the design was to minimize the 'chattiness' of the transfer protocol - and staged on the target system's local disk.
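To illustrate why batching reduces chattiness, here is a minimal sketch that groups content URLs into batches so that each batch could travel in a single request. The ContentBatcher class and the count-based batch size are assumptions made for illustration; how the transfer service actually decides the size of a group is not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: splits a list of content URLs into
// fixed-size groups, one group per network round trip.
class ContentBatcher {
    public static List<List<String>> batch(List<String> contentUrls, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < contentUrls.size(); i += batchSize) {
            int end = Math.min(i + batchSize, contentUrls.size());
            // Copy the sub-list so each batch is independent of the input list.
            batches.add(new ArrayList<>(contentUrls.subList(i, end)));
        }
        return batches;
    }
}
```

With five content URLs and a batch size of two, the sketch produces three batches instead of five separate sends.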

Once the snapshot and associated content files have been transmitted, the transfer service asks the target system to commit the data to its repository. At this point, the receiver on the target system parses the received snapshot and reproduces the contained information in its local repository. This is done in three stages by default - the first writing the nodes and their properties, the second dealing with associations and the third dealing with sync mode delete. It is possible to add new stages into this process if desired. When receiving a node, the receiver tries to resolve a corresponding node in the target repository based first on the node ref and then on the node's path. When a node is transferred that does not have a corresponding node in the target repository (either by path or by node ref) then a new node is created that has the same node ref as the transferred node.

Throughout the commit process, a record is written to a node in the target repository that lists what is being done. This node, stored in the 'Inbound Transfer Records' folder beneath 'Data Dictionary/Transfers', also has a few properties on it that record the transfer status. The name of this transfer record node is the date/time stamp of when the transfer started. After the transfer has completed, this transfer report is pulled back to the source system and written as the 'destination transfer report', which is a sibling of the 'client transfer report' placed below the transfer target.

On the source end of the transfer, the caller may choose whether the transfer should be carried out synchronously (transfer) or asynchronously (transferAsync). Whichever version is used, the caller may optionally provide one or more callback objects (implementing the TransferCallback interface). As the transfer proceeds these objects are notified of progress by events being passed to their processEvent operation. One of these events (TransferEventBegin) contains the transfer identifier, and, once received, this can be used by the caller to cancel an 'in-flight' transfer.

As well as the interfaces and mechanisms needed to actually carry out the transfer, there are also a few classes intended to help build the set of nodes that the caller wants to transfer. The relevant interfaces are NodeCrawlerFactory, NodeCrawler, NodeFinder, and NodeFilter (all in the package org.alfresco.service.cmr.transfer). There is one implementation of each of the NodeCrawlerFactory and NodeCrawler interfaces (the standard NodeCrawlerFactory is a bean named 'NodeCrawlerFactory'). There are a couple of NodeFinder implementations that enable associations to be traversed (child and peer), and one NodeFilter implementation that enables content of given classes (types and aspects) to be included and excluded from the node crawl. It's simple to add new finders and filters to provide custom behaviour that meets a particular need.
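The relationship between a crawler, a finder and a filter can be sketched without the Alfresco classes. In the simplified model below, nodes are plain strings, a 'finder' is a function returning the nodes reachable from a node, and a 'filter' is a predicate that accepts or rejects a node. CrawlSketch and its demo tree are hypothetical, and the choice not to traverse beyond a rejected node is an assumption of this sketch rather than a statement about the real NodeCrawler.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical model of a node crawl, not the Alfresco implementation.
class CrawlSketch {
    static Set<String> crawl(String start,
                             Function<String, List<String>> finder,
                             Predicate<String> filter) {
        Set<String> accepted = new LinkedHashSet<>(); // nodes to transfer, in visit order
        Set<String> seen = new LinkedHashSet<>();     // guards against visiting a node twice
        Deque<String> pending = new ArrayDeque<>();
        pending.add(start);
        seen.add(start);
        while (!pending.isEmpty()) {
            String node = pending.poll();
            if (!filter.test(node)) {
                continue; // rejected nodes are neither returned nor traversed
            }
            accepted.add(node);
            for (String next : finder.apply(node)) {
                if (seen.add(next)) {
                    pending.add(next);
                }
            }
        }
        return accepted;
    }

    // Small demonstration on a made-up folder tree.
    static Set<String> demo() {
        Map<String, List<String>> children = new HashMap<>();
        children.put("site", Arrays.asList("docs", "thumbnails"));
        children.put("docs", Arrays.asList("report.pdf"));
        children.put("thumbnails", Arrays.asList("report.png"));
        return crawl("site",
                node -> children.getOrDefault(node, Collections.<String>emptyList()),
                node -> !node.equals("thumbnails"));
    }
}
```

In the demo the filter rejects the 'thumbnails' folder, so the crawl yields 'site', 'docs' and 'report.pdf' only.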

Note that the interface exposed by the target repository (the receiver) should be considered an internal interface. It is liable to change over time, and no effort will be made to retain backwards compatibility.

Events raised during a transfer

As mentioned above, when requesting a transfer it is possible to supply a collection of TransferCallback objects. The TransferCallback interface defines one operation:


void processEvent(TransferEvent event);

As the transfer proceeds, events are raised and passed to each of the callback objects. The classes of events that can be raised are:

TransferEventBegin is sent when the transfer starts. It contains the identifier of the transfer which can later be used to cancel the transfer if desired
TransferEventEnterState is sent immediately after the transfer moves to a new state. The possible states of a transfer are START, SENDING_SNAPSHOT, SENDING_CONTENT, PREPARING, COMMITTING, SUCCESS, and ERROR. The state of the transfer is always available from any event via its getTransferState operation.
TransferEventEndState is sent immediately prior to the transfer leaving its current state.
TransferEventSendingSnapshot is sent when the snapshot file is being transmitted to the target repository
TransferEventSendingContent is sent when a content file is being transmitted to the target repository
TransferEventSentContent is sent when a content file has been sent to the target repository
TransferEventCommittingStatus is sent to provide an update as to progress while the target repository is processing the transferred data. It has two operations that provide this information: getPosition, which indicates where the process is up to at the moment, and getRange, which indicates where the process has to get to before it is complete. Note that the value of the range can change as the process proceeds.
TransferEventSuccess is sent if the transfer completes successfully
TransferEventCancelled is sent if the transfer is cancelled
TransferEventError is sent if the transfer ends with an error. This event exposes an operation named getException that can be used to help determine the cause of the problem.
TransferEventReport is sent when a transfer report is written. It contains the nodeRef of the report and the type of the report. There are currently two types of report: one from the source and one from the destination.
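The event flow listed above can be sketched as a callback that remembers the transfer identifier and tracks commit progress. To keep the example self-contained it uses simplified stand-ins (SketchEvent and ProgressCallback, both hypothetical) instead of the real classes in org.alfresco.service.cmr.transfer. Only the shape of processEvent and the position/range idea are taken from the description above.

```java
// Simplified stand-in for illustration only. The real event classes carry
// more information and are distinguished by class, not by a string.
class SketchEvent {
    final String type;       // "BEGIN", "COMMITTING_STATUS", "SUCCESS", "ERROR", "CANCELLED", ...
    final String transferId; // set for BEGIN events in this sketch, otherwise null
    final long position;     // progress so far (COMMITTING_STATUS only)
    final long range;        // value to reach (COMMITTING_STATUS only)

    SketchEvent(String type, String transferId, long position, long range) {
        this.type = type;
        this.transferId = transferId;
        this.position = position;
        this.range = range;
    }
}

// Hypothetical callback mirroring the shape of TransferCallback.processEvent.
class ProgressCallback {
    private String transferId;
    private int percentComplete;
    private boolean finished;

    public void processEvent(SketchEvent event) {
        switch (event.type) {
            case "BEGIN":
                transferId = event.transferId; // keep it so the transfer can be cancelled later
                break;
            case "COMMITTING_STATUS":
                // The range may change between events, so recompute every time.
                percentComplete = event.range <= 0 ? 0
                        : (int) Math.min(100, event.position * 100 / event.range);
                break;
            case "SUCCESS":
                percentComplete = 100;
                finished = true;
                break;
            case "ERROR":
            case "CANCELLED":
                finished = true;
                break;
            default:
                break; // other events are ignored in this sketch
        }
    }

    public String getTransferId() { return transferId; }
    public int getPercentComplete() { return percentComplete; }
    public boolean isFinished() { return finished; }
}
```

A caller holding such a callback could read the stored identifier at any point after the begin event in order to cancel an in-flight transfer.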

The transferred aspect

Alfresco 3.4 contains an aspect, trx:transferred, that indicates that a node has been transferred via the transfer subsystem.

It contains two fields: the repository id of the 'originating' system, which is the repository in which the node was first created, and the 'from' repository id, which is the repository id of the system that transferred the node to the local repository.

The basic property sheet for this aspect is included with the configuration of Alfresco Explorer.

A UI feature of Share presents the option to edit a transferred node on the originating instance of Share rather than the local repository.

The alien aspect

This is an implementation detail that may change in future versions of Alfresco.

Alfresco 3.4 contains an aspect, trx:alien, with a multi-valued property that records which repositories have 'invaded' a node. See below for more information.

Sync Mode Transfer

Alfresco 3.4 adds a new 'mode' of transfer called 'sync mode'. There is a boolean flag on the transfer definition to specify whether transfer is sync mode or not.

Sync mode adds extra processing that infers, from the absence of an association between a parent node and a child node, that the child node should be deleted.

Sync Mode Transfer Slide 1.GIF

In the example of the screenshot above, when node A1 is transferred there is an association between A1 and A2, so A2 remains. However, there is no association between A1 and A3, so A3 is deleted.

Although the requirement above sounds simple, what happens if there are associations to content that was not transferred, or that was transferred from a different repository? For example, an 'images' folder is transferred and then content is added to it from the local repository. If the transfer is not careful, sync mode will incorrectly delete content simply because it does not exist on the transferring repository.

The first part of the solution is to mark all transferred nodes with an aspect (trx:transferred) that says which repository the transferred node is from. Transfer can then determine whether the nodes it is about to delete are from the sending system. Transfer will not delete nodes that are not from the transferring repository.

Transfer Service Alien 1.gif

In the example of the screenshot above the node B3 is a local node. So transfer of A1 must not delete B3.
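The rule described so far (delete a child only if it is absent from the incoming transfer and it came from the sending repository) can be sketched as follows. SyncDeleteSketch, the repository id 'repo-A' and the combined A2/A3/B3 child list are hypothetical, and the real receiver works on nodes and aspects rather than maps of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the sync mode delete decision.
class SyncDeleteSketch {
    // existingChildren maps each child name on the target to the repository id
    // it was transferred from, or to null when the child is a local node.
    static List<String> childrenToDelete(Map<String, String> existingChildren,
                                         Set<String> childrenInTransfer,
                                         String sendingRepoId) {
        List<String> toDelete = new ArrayList<>();
        for (Map.Entry<String, String> child : existingChildren.entrySet()) {
            boolean absentFromSource = !childrenInTransfer.contains(child.getKey());
            boolean ownedBySender = sendingRepoId.equals(child.getValue());
            if (absentFromSource && ownedBySender) {
                toDelete.add(child.getKey());
            }
        }
        return toDelete;
    }

    // A1 on the target has children A2 and A3 (from repository A) and B3 (local).
    // The incoming transfer of A1 only has an association to A2.
    static List<String> demo() {
        Map<String, String> underA1 = new LinkedHashMap<>();
        underA1.put("A2", "repo-A");
        underA1.put("A3", "repo-A");
        underA1.put("B3", null);
        Set<String> sent = new HashSet<>(Arrays.asList("A2"));
        return childrenToDelete(underA1, sent, "repo-A");
    }
}
```

In the demo only A3 is selected for deletion: A2 is still associated with A1 on the source, and B3 is local so it is never a candidate.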

The implementation of sync mode introduces the concept of 'Alien' nodes which have been 'invaded' by another repository. In general, Alien nodes cannot be deleted by the transfer service. There is an aspect trx:alien that tracks which repositories have invaded a node. In the screenshot above nodes A1 and B3 are marked as aliens since B3 is an 'invader' even though it is a local node.

With multiple repositories transferring content in a hub and spoke system you can end up with more complex scenarios.

Transfer Service Multi invasion.gif

In the example above, B1 and B6 are local nodes. However, B6 is an invader since it is a child of a transferred node, C2, that has come from repository C. This example is also complicated by the fact that C2 has a node transferred from repository A. So node C2 is invaded by both repository A and repository B.

Sync mode pruning

If sync mode transfer determines that a folder should be deleted but can't delete the folder since it contains alien content then what happens?

The behaviour is that transfer service has to leave this folder in place but 'prune' all content that is 'from' the transferring system. The other content is left alone.

Transfer Service Prune.gif

So in the example above, if node A1 is transferred after A2 has been deleted on the source, then node A2 should be deleted and all of its children (A4, A5, A6, A7, A8) cascade deleted. However, the presence of alien node B10 means that A2 must remain, since it is the parent of B10, which must not be deleted. The children A4, A5, A6, A7 and A8 are pruned instead.
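The pruning behaviour can be sketched as a recursive walk: a node from the sending repository is deleted only when everything beneath it can also be deleted, otherwise it stays and only its removable children go. PruneSketch is a hypothetical model. It assumes a flat layout in which A4 to A8 and B10 are all direct children of A2, which may differ from the figure.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of sync mode pruning, not the Alfresco implementation.
class PruneSketch {
    static class Node {
        final String name;
        final String fromRepo; // null marks a local node
        final List<Node> children = new ArrayList<>();

        Node(String name, String fromRepo) {
            this.name = name;
            this.fromRepo = fromRepo;
        }
    }

    // Returns true if the node itself was deleted. Names of deleted nodes are
    // appended to 'deleted'. A node survives when it is not from the sending
    // repository or when something beneath it has to survive.
    static boolean deleteOrPrune(Node node, String sendingRepoId, List<String> deleted) {
        if (!sendingRepoId.equals(node.fromRepo)) {
            return false; // not from the sender, so it must stay
        }
        boolean allChildrenGone = true;
        Iterator<Node> it = node.children.iterator();
        while (it.hasNext()) {
            if (deleteOrPrune(it.next(), sendingRepoId, deleted)) {
                it.remove();
            } else {
                allChildrenGone = false;
            }
        }
        if (allChildrenGone) {
            deleted.add(node.name);
            return true;
        }
        return false; // kept as a parent of content that must not be deleted
    }

    static List<String> demo() {
        Node a2 = new Node("A2", "repo-A");
        for (String name : new String[] {"A4", "A5", "A6", "A7", "A8"}) {
            a2.children.add(new Node(name, "repo-A"));
        }
        a2.children.add(new Node("B10", null)); // local content inside A2
        List<String> deleted = new ArrayList<>();
        deleteOrPrune(a2, "repo-A", deleted);
        return deleted;
    }
}
```

In the demo A4 to A8 are removed while A2 remains with B10 as its only child.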

Location of classes relevant to the Transfer Service

The classes and interfaces that comprise the public API to the Transfer Service are located in the org.alfresco.service.cmr.transfer package. The core of the implementation is in the org.alfresco.repo.transfer package and its sub-packages manifest, report, and script. Log levels can be adjusted for these packages if more or less log information is desired from the transfer mechanisms.

Location of spaces relevant to the Transfer Service

The transfer service stores files that control and monitor the operation of the transfer service in the 'Transfers' space in the Data Dictionary.

Transfer Target Groups

Contains the transfer target definitions that specify where transfers go to. There is a 'group' level below the Transfer Target Groups folder that is intended for classifying different sets of transfer targets.

At the moment (3.4) there is only a single group called 'default group'. Add your transfer targets to the 'default group' either through the TransferService API or by creating a 'folder' using Alfresco Explorer or Alfresco Share. There is a rule defined on the transfer groups folder to specialize the type of any folder created within it.

Transfer Lock File

Temp

Space used during processing of a transfer.

Transfer Reports

On the client side of transfer, transfer reports are created as children of the transfer target. Use Alfresco Share or Alfresco Explorer to view them.

Inbound Transfer Records

Stores the transfer reports for transfers that have come into this system.

Outbound Transfer Records

Not used, and removed from future versions of Alfresco.

Usage Examples

Creating a new Transfer Target


// Create the target, fill in its connection details, then save it.
TransferTarget target = transferService.create("The Other Repo");
target.setEndpointProtocol("https");
target.setEndpointHost("other.repo.example.com");
target.setEndpointPath("/alfresco/service/api/transfer");
target.setUsername("remoteperson");
target.setPassword("password".toCharArray());
transferService.saveTransferTarget(target);

Note that a transfer target must be committed into the repository before it can be used for a transfer.

Creating a new Transfer Target Through Alfresco Share or Alfresco Explorer

You can also create a transfer target through Alfresco Explorer or Alfresco Share. Simply create a folder in Company Home/Data Dictionary/Transfers/Transfer Target Groups/Default Group. A rule will run to specialize the node type to trx:transferTarget. The new node contains the properties you can fill in through the user interface to set up your target.

Building a set of nodes to transfer


//This example walks a tree of nodes starting at a given root node (assumed to be known already). It traverses
//only associations of type 'cm:contains' (therefore, presumably, the root node is of type cm:folder (or subtype))

NodeCrawler crawler = nodeCrawlerFactory.getNodeCrawler();
crawler.setNodeFinders(new ChildAssociatedNodeFinder(ContentModel.ASSOC_CONTAINS));
Set<NodeRef> nodesInTree = crawler.crawl(rootNode);

Transferring a set of nodes synchronously


//This snippet uses the target name and set of nodes used in the previous examples.
TransferDefinition transferDef = new TransferDefinition();
transferDef.setNodes(nodesInTree);
NodeRef transferReportNode = transferService.transfer("The Other Repo", transferDef);

Where transferred nodes are placed in the target repository

When a node is transferred, a package of information about it is sent from the source repository to the target repository. Among other things, that information includes:

the node ref (store ref + UUID) of the node
the node ref of the node's primary parent
the qualified path of the node's primary parent
the qualified name of the node's primary parent association
the qualified type of the node's primary parent association

This information is used by the transfer receiver in the target repository to work out where the transferred node should be placed and whether a 'corresponding node' already exists in that location. This is done in the following way:

if a node exists with the same node ref (store + UUID) then this is considered to be the corresponding node, and the transfer is handled as an update to that node
if a node exists with the same node ref as the transferred node's primary parent then this is considered to be the corresponding parent node
if the store of the transferred node's primary parent does not exist in the target repository then the transfer fails
if the corresponding parent node has not yet been found then try to resolve it by path in the store identified by the node's primary parent node ref
if the corresponding parent node has still not been found then the incoming node is currently an 'orphan' in the receiving repository - its corresponding parent node is mapped to a temporary location and the transfer is handled as a creation of the node
if a node exists in the target repository that is associated with the corresponding parent node with the same name as the transferred node is associated with its parent node, then this is considered to be the corresponding node, and the transfer is handled as an update to that node
if a corresponding node has still not been found then the following additional checks are made:
1. if the store of the transferred node is the archive store then an attempt is made to find its corresponding node in the store corresponding to its original store (by node ref and by path). If this successfully finds a corresponding node then the transfer is handled as a deletion of the corresponding node
2. if the store of the transferred node is not the archive store then an attempt is made to find its corresponding node in the archive store of the receiving repository (by node ref only). If this successfully finds a corresponding node then the transfer is handled as a restore of the corresponding node (followed by an update of the restored node)
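The first steps of the resolution order above can be sketched as follows. This is a simplified model rather than the receiver's code: ResolveSketch is hypothetical, the target repository is reduced to a map from node ref to path, and the store checks and archive store handling (deletion and restore) are left out.

```java
import java.util.Map;

// Hypothetical model of how a corresponding node might be resolved.
class ResolveSketch {
    // 'target' maps node refs in the target repository to their paths,
    // for example "t-folder" -> "/cm:images".
    static String resolve(Map<String, String> target,
                          String nodeRef, String parentRef,
                          String parentPath, String assocName) {
        if (target.containsKey(nodeRef)) {
            return "UPDATE " + nodeRef; // matched by node ref
        }
        // Find the corresponding parent: by node ref first, then by path.
        String parentPathInTarget = target.get(parentRef);
        if (parentPathInTarget == null && target.containsValue(parentPath)) {
            parentPathInTarget = parentPath;
        }
        if (parentPathInTarget == null) {
            // The parent has to turn up later in the same transfer.
            return "CREATE (orphan) " + nodeRef;
        }
        // Look for a child of that parent with the same association name.
        String candidatePath = parentPathInTarget + "/" + assocName;
        for (Map.Entry<String, String> entry : target.entrySet()) {
            if (entry.getValue().equals(candidatePath)) {
                return "UPDATE " + entry.getKey(); // matched by path
            }
        }
        return "CREATE " + nodeRef; // the new node keeps the source node ref
    }
}
```

For example, a node arriving with an unknown node ref but with parent path '/cm:images' and association name 'cm:logo.png' resolves to an update of whatever node already sits at '/cm:images/cm:logo.png' in the target.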

In the case where the inbound node is initially determined to be an 'orphan', this status is continuously checked during the course of that transfer. If its parent node appears later on in the same transfer then the orphan is re-parented. Note that orphans are not permitted to remain following a transfer. If an orphan's parent node does not appear during the same transfer then the transfer will fail.

Wish List

This section identifies features that aren't in the transfer service yet, but are known about as potential future enhancements. If you have suggestions then please do add them here.

Queue transfer requests on the source repo
Alfresco to file system transfer
Transfer model elements as necessary to support transferred nodes
Map ACLs between source repo and destination repo so that the ACLs can differ. For example, it may make no sense for the editors to have the same rights to content on the live system.
Filter Aspects and Properties. For example allow discussions to be optionally filtered out of the transferred content.
Timeout a stagnant transfer to remove the lock on the target
Optionally split large transfers into smaller batches or be able to control the transaction boundary (for example, if you transfer 100,000 nodes and the transfer fails on node 99,999, you have the option not to fail the entire transaction).
Track the create, update and delete of nodes to implement incremental time based replication.
UDP based out of band transfer of content.
Pull back content as well as push.