Hyland Connect

bnordgren · ‎04-08-2010

I'm responsible for technical planning and implementation of unexpected web needs. My environment is a research lab where the individual scientists may at any time propose to develop a model which has a web delivery component. These may be anything from static files containing prose to a full web app written in Language X. Our goal is not to develop a giant monolithic system, but to support rapid prototyping using the most appropriate tools for the task at hand.

To that end, I like Alfresco's WCM for the following reasons:

Openness: Alfresco can deploy to a filesystem target, allowing a totally decoupled system to manipulate, interpret, and/or execute the managed code.

Centralization: Alfresco centralizes a reference copy of the code/files, centralizes the records of which versions were deployed where, and offers one central place to control the distribution of code to a handful of well defined target machines.

Workflow: Alfresco supports the imposition of workflows on the update of websites.

Delegation: Alfresco supports the delegation of responsibility for web projects to a web project owner, so my fingers aren't in everyone's pies.

Security: Alfresco supports secure methods of deploying content to our public website outside the firewall. I am not required to hand out shell access to our server in order to allow people to control their little web space.

Clearly, the things I like about Alfresco WCM have nothing to do with content management and everything to do with configuration management. Now the "bad news": the one thing I really don't like about Alfresco WCM is the assumption that all changes must happen within Alfresco. There is no means to change files in a deployment target and commit those changes back to the repository. (Bear with me, I'm not trying to make Alfresco WCM into Subversion; I want to keep all the good features listed above, which are nearly diametrically opposed to Subversion and other SCMs.) I would actually prefer that the authoring server be forced to query a deployment target for changes.

Reasons to want to update a web project with changes in a deployment target all have to do with being able to rapidly redeploy a web application exactly as it was at the time of a catastrophic event:

Running the install script in an external system (e.g., Joomla!) changes the file set, and this change needs to be recorded in the web project.

Installing an "extension", "module", "plugin", or whatever else in the external system changes the fileset, and the change needs to be brought into the web project.

The deployed web application supports user-generated content which affects the file set (uploading files/attachments).

I know what you're thinking, but you cannot solve this problem by running the web app off of a CIFS mounted filesystem if your public webserver is outside the firewall and your authoring server is inside. You could, however, create a "deployment target sandbox" on the authoring server which periodically queries the deployment target for updates. The deployment target, of course, needs to support sending "differences" since the last query. Note carefully that all queries originate from within an organizational firewall and connect to an external endpoint.
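The "differences since the last query" idea can be sketched roughly as follows. This is only an illustration, not anything Alfresco provides: `build_manifest` and `diff_manifests` are hypothetical names, and the sketch assumes the deployment target can hash its own files and that the sandbox keeps the manifest returned by the previous query.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to a SHA-256 digest of its contents."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(previous, current):
    """Report which paths were added, removed, or modified between two manifests."""
    previous_paths = set(previous)
    current_paths = set(current)
    return {
        "added": sorted(current_paths - previous_paths),
        "removed": sorted(previous_paths - current_paths),
        "modified": sorted(
            path
            for path in previous_paths & current_paths
            if previous[path] != current[path]
        ),
    }
```

In this arrangement the deployment target would only ever answer requests (return a manifest, return the listed files), so every connection still originates inside the firewall.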

Now on to my next observation: many of the apps I run, including Alfresco itself, consist of more than files. There is an associated database for Alfresco, Joomla!, Trac, Redmine, and the "custom special purpose" apps. Preserving the state of the web application consists of a common pattern of preserving the files and preserving the associated database. In fact, the database is where the content typically lives. It is not necessary to interpret the contents of the associated database; merely necessary to preserve it. Thus, web projects could contain the entire state of the web application if they had an optional "database dumpfile" property/field/association which was not part of the files deployed to the website. If the deployment target could execute simple preconfigured commands (possibly even defined in a property file) on the target system, the authoring server could ask it for a current database dump during an update, or could instruct it to deploy the given database dump to the database during a deployment.
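To make the "preconfigured commands in a property file" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the property keys `dump.command` and `restore.command`, the `{dumpfile}` and `{database}` placeholders, and the choice of PostgreSQL's `pg_dump` and `psql` as the example tools. The target only ever runs commands that an administrator has already written into the file; the authoring server supplies nothing but the parameters.

```python
import shlex

# Hypothetical property file; keys and placeholders are invented for this sketch.
EXAMPLE_PROPERTIES = """
# Commands the deployment target is allowed to run
dump.command = pg_dump --file={dumpfile} {database}
restore.command = psql --file={dumpfile} {database}
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def build_command(props, action, **params):
    """Build an argv list for a preconfigured action such as 'dump' or 'restore'.

    The template is split into tokens first and filled in afterwards, so a
    parameter containing spaces stays a single argument.
    Raises KeyError if the action has no configured command.
    """
    template = props[f"{action}.command"]
    return [token.format(**params) for token in shlex.split(template)]
```

The resulting list could be handed to `subprocess.run(..., check=True)` on the target; an action with no entry in the property file simply fails with a `KeyError` rather than running anything.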

So to summarize:

Alfresco's current WCM infrastructure works well when the flow of information is always authoring server to deployment target.

For web applications which run in an external environment, there are many valid reasons information may flow from deployment target to authoring server. It would be nice to have the authoring server query a deployment target for changes in its file set.

Modern web apps (including Alfresco Explorer/Share/etc) are almost invariably comprised of a set of files (typically the web application) and an associated database. It would be nice to have a way to associate a database dumpfile with a web project.

Last but not least, I think there would be great utility in making the deployment target capable of using database command line tools to dump and restore the database associated with its fileset.

I think this functionality should be named WAM for Web App Management.

Any comments?

mrogers · ‎04-08-2010

I hope some of my colleagues come in on this thread since they will be able to put forward slightly different perspectives.

Here are some points in response:
* User Generated Content - being able to pull back UGC from a runtime environment is something we know would be useful and discuss from time to time. It's one of the feature candidates for the next version of the transfer service in 3.4. Let's see how priorities shape up.
* I agree that the typical pattern is for all web content to be assembled within Alfresco and then deployed out to the web runtimes. It's a good strong pattern. But we do already have scenarios where only part of the content is controlled via Alfresco. And you can also have automatic information feeds into Alfresco which are then deployed out. That's another strong pattern.
* In addition to flat files being deployed, Alfresco also allows deployment of content to support a web site. This could be part of what you were getting at by talking about the "associated database". With 3.3 we are expecting people to start using Alfresco as a CMIS runtime to support their dynamic websites.
* And finally, deployment targets can already execute "predefined commands", so you can already deploy a dumpfile to your database.

bnordgren · ‎04-08-2010

I had envisioned "deploying a dumpfile" to mean deploying a dumpfile that was associated with a snapshot in the authoring environment. Hence the predefined command is parameterized by a dumpfile which is not deployed to the filesystem target with the rest of the web app.

However, if the dumpfile were deployed (and always deployed to a file of the same name), I can see adding the database restore as the postcommit hook.

Meanwhile, I spent the day trying to draw a system diagram (and assemble a toolset) which could accomplish my objectives using off the shelf tools. I think I found something that works. Briefly, the authoring server is in charge of versioning and deploying to test/live targets. Each web project in the authoring server assembles all of the aspects of the target web app into one place using a directory tree to keep things tidy: program files, user generated data/content, database dump, and configuration directories.

I have rsync pulling changes into a "web app mirror" on a nightly basis. The web app mirror has the same form as the web project. Rsync connections are always outbound thru the firewall, and use ssh as a remote shell.
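A rough Python sketch of that nightly pull is below. The host name and paths in the usage note are placeholders, the function names are hypothetical, and the choice of `--delete` (so that files removed on the live site also disappear from the mirror) is an assumption about what the mirror should do, not something required by rsync.

```python
import subprocess


def build_pull_command(remote, remote_dir, mirror_dir, delete=True):
    """Return the rsync argv that pulls remote_dir from remote into mirror_dir.

    The connection is opened from the mirror side, using ssh as the remote shell.
    """
    command = ["rsync", "--archive", "--compress", "--rsh=ssh"]
    if delete:
        # Remove files from the mirror that no longer exist on the remote side.
        command.append("--delete")
    # A trailing slash on the source copies the directory's contents rather
    # than the directory itself.
    source = f"{remote}:{remote_dir.rstrip('/')}/"
    destination = mirror_dir.rstrip("/") + "/"
    return command + [source, destination]


def pull(remote, remote_dir, mirror_dir):
    """Run the pull; raises CalledProcessError if rsync exits non-zero."""
    return subprocess.run(build_pull_command(remote, remote_dir, mirror_dir), check=True)
```

For example, `pull("backup@www.example.org", "/srv/webapps/joomla", "/mirror/joomla")` could be scheduled from cron or the Windows task scheduler on the machine inside the firewall.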

Our web app mirror (on a local machine) is then mirrored to our data center in Kansas City using rsync.

Finally, to get the changes into the Alfresco web project, I've come up with rsync-ing from the web app mirror to a CIFS mounted sandbox. I think I'm going to have to create a dummy user called "mirror" for this, because I don't think Alfresco can use our corporate active directory for CIFS authentication.

Of course, one of the Apps I'm going to treat in this fashion is Alfresco itself, so each app is going to have a "current mirror" in our data center as well as a versioned history in the Alfresco app mirror. Call me paranoid.

I can draw a picture (two pics, actually: one with the authoring server inside the firewall and one with the authoring server outside) if there's any interest. I'm going to have to document this for our website technical people anyway.

In any case, having gone thru this mental exercise and located an rsync for Windows with both the rsync command line binaries and a GUI front end, I think it may be best to take a light-handed approach to gathering user generated data into a web project. By that I mean delegating the change detection and transport security to external apps which already do that well, and having Alfresco push the buttons on those apps. (Much like the approach taken to connect to the OpenOffice server.)

Sorry I've been so longwinded recently. I get that way when my head's in the middle of a puzzle.

bremmington · ‎04-09-2010

You're doing some interesting things.

Your solution to the two-way sync seems perfectly reasonable. I assume that you're not expecting there to ever be conflicting changes on the authoring and live servers, otherwise it would become more complicated. It is this that would cause the greatest difficulty with regard to the "database dump" idea that you mention in your initial post. How would a two-way sync work in that case? Would the database on the live server be considered the "master"? This would seem to make most sense in the case of Redmine and Joomla!, but perhaps not for Alfresco. I'm not sure how managing and deploying database dump files with Alfresco would fit into most use cases - perhaps you would clarify your thoughts?

As Mark mentioned earlier, the question of UGC is one that is under consideration at Alfresco. It will be something that we will be addressing in the product at some point, but not in a timeframe that would help any current projects.

bnordgren · ‎04-16-2010

Sorry for the delay.

Was implementing a generic cross platform web app replicator using rsync and ssh in Python. Still have a bit of work to do on it. Haven't closed the loop back to Alfresco WCM yet.

My thought on "conflicting changes" is that changes are likely to be separable. To illustrate, my "alfresco" web app directory contains:

app/ : Alfresco program files as they exist under the tomcat directory

db/ : Contains a tarball of the dump of the "alfresco" database.

content/ : All the contents of "alf_data"

config/ : Alfresco configuration files under ${catalina.base}/shared/classes

Clearly, one cannot use an Alfresco instance to store a mirror of itself without getting the infinite mirror effect. But an Alfresco WCM instance that is responsible for maintaining the Share deployment, that's doable.

Let's say I create a user (and sandbox) called "mirror" which receives all the changes from the live site. These changes should be localized to the "db" and "content" directories. Now let's say that I'm upgrading to Community 3.3. I use my own account (and sandbox) to update the app and config directories and deploy to a test server. Once I have a working merged copy of "current" db/content with "new" app/config, a final shutdown/update of data/deploy/restart ought to bring the new system online with a minimum of fuss (and a fair degree of certainty that I can back out my changes). The record of what I had to change to make it work is stored in the WCM with no additional effort on my part, and it is easy for future me (installing Community 3.4) to look back to see what I had to change last time.
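The "separable changes" assumption can be checked mechanically before merging. The sketch below is hypothetical: it assumes each sandbox's changes are available as a list of paths relative to the web project root, and it uses the directory names from the layout above (`db` and `content` for the "mirror" user, `app` and `config` for the administrator).

```python
# Top-level directories each party is expected to touch (from the layout above).
MIRROR_DIRS = {"db", "content"}
ADMIN_DIRS = {"app", "config"}


def top_level(path):
    """Return the first component of a project-relative path."""
    return path.strip("/").split("/", 1)[0]


def check_separable(mirror_changes, admin_changes):
    """Report anything that breaks the assumption that the two sandboxes
    change disjoint parts of the web project."""
    return {
        # The same file changed in both sandboxes: a genuine conflict.
        "overlapping": sorted(set(mirror_changes) & set(admin_changes)),
        # The live site changed something outside db/ and content/.
        "stray_mirror": sorted(
            p for p in mirror_changes if top_level(p) not in MIRROR_DIRS
        ),
        # The administrator changed something outside app/ and config/.
        "stray_admin": sorted(
            p for p in admin_changes if top_level(p) not in ADMIN_DIRS
        ),
    }
```

If all three lists come back empty, the merge is just a matter of taking `db` and `content` from the mirror sandbox and `app` and `config` from the administrator's.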

In this case, I'd use WCM instead of Subversion because the vast bulk of changes will be to zipfiles (which means the entire file will be erased and replaced with every commit). Additionally, I am only interested in keeping backups for a week or so.

Of course, if there are two administrators changing the configs in conflicting ways, they can solve their conflict using the traditional cage match to the death ritual.

Musing about Alfresco WCM for web app management