Hyland Connect

pmonks2 · ‎08-06-2009

Julian Wraith recently started a discussion entitled 'The future of content management' that has kicked off quite a few interesting responses.

Of those, the one that really grabbed my attention was Justin Cormack's great response entitled 'CMS technology choices'. By strange coincidence it closely echoes (far more eloquently and in a lot more detail!) a conversation Kevin Cochrane and I had on Twitter at about the same time. While I agree with almost everything Justin has written, the Twitter conversation does highlight my one fundamental disagreement with the post. Here's the transcript of my side of that conversation:

Managing web content is about more than simply supporting the technical constructs the web uses (REST, stateless etc.).

e.g. the graph of relationships between the content items making up a site can be an important source of information for authors.

But the web itself has no direct support for graph data structures (beyond humble 'pointers': <a href> tags and the like).

And perhaps as a consequence many (most?) Web CMSes don't have support for that either. 😉

IMNSHO the future is: schemaless (à la CouchDB, MongoDB, et al.), graph based (à la Neo4j), distributed version control (à la Git).

(In hindsight I should also have mentioned 'queryable (à la RDBMS, MongoDB, etc.)'.)

To better describe my divergence from Justin's vision of the future: I believe that management of, and visibility into, the 'content graph' (the set of links / relationships / associations / dependencies / call-them-what-you-will) is one of the more important features a CMS can provide. This is particularly true for web content management, where the link structure (including, but not limited to, the site's navigation model) is so integral to the consumer's final experience of the content.

So what 'content graph' features, specifically, should a hypothetical CMS provide?

In my opinion a CMS needs to support at least the following operations on the content graph:

Track all links between assets that are under management, in such a way that the content graph can be:
- bi-directionally traversed, i.e. the CMS can quickly and efficiently answer questions such as 'which assets does asset X refer to?' and 'which assets refer to asset X?'
- used within queries, i.e. the CMS can quickly and efficiently answer questions such as 'show me all content items that are within 3 degrees of separation from asset X, are of type "press release", and were published in the last month by "Peter Monks"'
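To make those two requirements a little more concrete, here is a minimal in-memory sketch. Everything in it (the `Asset` fields, the `ContentGraph` class and its method names) is a hypothetical illustration rather than the API of any real CMS, and a real repository would persist and index these structures rather than hold them in dictionaries. The idea is simply that keeping a forward and a reverse index side by side makes both traversal directions cheap, and that a breadth-first walk over both indexes gives 'degrees of separation'.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Asset:
    # Hypothetical asset record; the field names are illustrative only.
    id: str
    type: str
    author: str
    published: date


class ContentGraph:
    def __init__(self):
        self.assets = {}
        self.outgoing = defaultdict(set)  # source id -> ids it refers to
        self.incoming = defaultdict(set)  # target id -> ids that refer to it

    def add_asset(self, asset):
        self.assets[asset.id] = asset

    def add_link(self, source_id, target_id):
        # Maintain both indexes so either direction is a single lookup.
        self.outgoing[source_id].add(target_id)
        self.incoming[target_id].add(source_id)

    def refers_to(self, asset_id):
        return set(self.outgoing.get(asset_id, ()))

    def referred_by(self, asset_id):
        return set(self.incoming.get(asset_id, ()))

    def within(self, start_id, degrees):
        """Ids at most `degrees` hops from start_id, ignoring link direction."""
        seen = {start_id}
        frontier = deque([(start_id, 0)])
        while frontier:
            current, depth = frontier.popleft()
            if depth == degrees:
                continue
            neighbours = self.refers_to(current) | self.referred_by(current)
            for neighbour in neighbours - seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
        seen.discard(start_id)
        return seen

    def query(self, start_id, degrees, type, author, since):
        """Combine a graph constraint with ordinary metadata predicates."""
        nearby = (self.assets[i] for i in self.within(start_id, degrees) if i in self.assets)
        return sorted(
            a.id for a in nearby
            if a.type == type and a.author == author and a.published >= since
        )


graph = ContentGraph()
graph.add_asset(Asset("home", "page", "Someone Else", date(2009, 1, 5)))
graph.add_asset(Asset("about", "page", "Someone Else", date(2009, 1, 5)))
for n, day in enumerate((date(2009, 7, 20), date(2009, 7, 27), date(2009, 5, 1), date(2009, 8, 1)), start=1):
    graph.add_asset(Asset(f"pr{n}", "press release", "Peter Monks", day))

# home -> pr1 -> pr2 -> pr3 -> pr4, plus about -> pr1
for source, target in [("home", "pr1"), ("about", "pr1"), ("pr1", "pr2"), ("pr2", "pr3"), ("pr3", "pr4")]:
    graph.add_link(source, target)

recent = graph.query("home", 3, "press release", "Peter Monks", date(2009, 7, 6))
```

In this toy data `pr4` is four hops from `home` and `pr3` is older than the cut-off date, so neither ends up in `recent`, even though both are press releases by the right author.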

Flag any content modifications that 'break' the content graph, e.g. deletion of an asset that is the target of one or more references
- from a usability perspective our hypothetical CMS would give the user requesting the breaking change a way to automatically 'fix' the breakages, e.g. by correcting the soon-to-be invalid (dangling) links in the source item(s)
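As a rough sketch of what that could look like, assume the links are held as a plain dictionary mapping each asset id to the set of ids it refers to. The names `BrokenLinkError`, `referrers` and `delete_asset` are made up for this example. A deletion is refused while anything still points at the target, unless the caller supplies a replacement that the dangling links should be retargeted to.

```python
class BrokenLinkError(Exception):
    """Raised when a deletion would leave dangling references behind."""

    def __init__(self, target, sources):
        super().__init__(f"{target!r} is still referenced by {sorted(sources)}")
        self.target = target
        self.sources = sources


def referrers(links, target):
    """Ids of assets (other than target itself) that link to target."""
    return {src for src, targets in links.items() if target in targets and src != target}


def delete_asset(links, target, replacement=None):
    """Delete target; refuse if that breaks links, unless a replacement is given."""
    sources = referrers(links, target)
    if sources and replacement is None:
        # Flag the breakage instead of silently leaving dangling links.
        raise BrokenLinkError(target, sources)
    for src in sources:
        # The automatic 'fix': point each referring asset at the replacement.
        links[src].discard(target)
        links[src].add(replacement)
    links.pop(target, None)
    return sources


links = {
    "home": {"old-pr"},
    "news": {"old-pr", "new-pr"},
    "old-pr": set(),
    "new-pr": set(),
}

try:
    delete_asset(links, "old-pr")
except BrokenLinkError as err:
    flagged = err.sources  # the assets the user needs to be told about

repaired = delete_asset(links, "old-pr", replacement="new-pr")
```

Retargeting is only one possible fix; simply removing the references, or letting the author pick a different replacement per source item, would fit the same shape.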

Support arbitrary metadata on references, preferably using the same metadata modeling language that is used for 'real' content assets

Support basic validity checking of external links, i.e. links that point to assets that are not under management (e.g. URIs that point to other web sites)
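A sketch of the first half of that job, using only the Python standard library: pull the `<a href>` targets out of a piece of HTML and keep the ones that point outside the managed site. The function names and the `managed_hosts` parameter are assumptions made for this example. The `is_reachable` helper shows one naive way to probe a link with a HEAD request; it is deliberately not called here, since it needs network access, and some servers reject HEAD requests, so treat a failure as 'needs a closer look' rather than 'definitely broken'.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(html, base_url, managed_hosts):
    """Absolute http(s) URLs in html whose host is not under management."""
    parser = LinkExtractor()
    parser.feed(html)
    found = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links first
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.hostname and parts.hostname not in managed_hosts:
            found.append(absolute)
    return found


def is_reachable(url, timeout=5.0):
    """Naive liveness probe (assumption: a successful HEAD means the link is fine)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # Covers URLError/HTTPError (both OSError subclasses) and malformed URLs.
        return False


page = (
    '<a href="/about">About us</a> '
    '<a href="https://example.org/page">Elsewhere</a> '
    '<a href="mailto:someone@example.com">Mail</a>'
)
to_check = external_links(page, "https://www.example.com/index.html", {"www.example.com"})
```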

Other than linking, I think Justin's post pretty much nails it. I'm a big fan of schemaless repositories, having worked extensively with several 'schemaed' CMSes that made seemingly simple steps (such as adding or removing a single property from a content type that happened to have instances in existence) a lengthy exercise in frustration.

I'm also a big fan of 'structural' versioning (à la SVN, Git, Mercurial, etc.), as it's the only way to properly support rollback in the presence of deletions. Trying to explain to an irate user that they just deleted not only an asset but also its entire revision history is not something I particularly relish!
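The difference from per-asset versioning is easiest to see in a toy model. In the sketch below (a hypothetical `Repository` class, nothing like how SVN or Git actually store data) every commit records a snapshot of the whole tree, so a deletion is just another revision in which the path is absent, and the earlier revisions of that path remain available for rollback.

```python
class Repository:
    """Toy structural versioning: each revision is a snapshot of the whole tree."""

    def __init__(self):
        self._snapshots = [{}]  # revision 0 is the empty tree

    @property
    def head(self):
        return len(self._snapshots) - 1

    def tree(self, revision=None):
        """A copy of the path -> content mapping at the given revision (default: head)."""
        rev = self.head if revision is None else revision
        return dict(self._snapshots[rev])

    def commit(self, writes=None, deletes=()):
        tree = self.tree()
        tree.update(writes or {})
        for path in deletes:
            tree.pop(path, None)  # a deletion is recorded, not destructive
        self._snapshots.append(tree)
        return self.head

    def rollback(self, revision):
        """Restore an old tree as a new revision, keeping all history intact."""
        self._snapshots.append(self.tree(revision))
        return self.head

    def history(self, path):
        """(revision, content) pairs where path changed; None means 'absent'."""
        changes, previous = [], None
        for rev, tree in enumerate(self._snapshots):
            current = tree.get(path)
            if current != previous:
                changes.append((rev, current))
            previous = current
        return changes


repo = Repository()
r1 = repo.commit({"press/launch.html": "draft"})
r2 = repo.commit({"press/launch.html": "final"})
r3 = repo.commit(deletes=["press/launch.html"])  # the 'oops' moment
r4 = repo.rollback(r2)                           # and the recovery
```

Copying the full tree on every commit is obviously wasteful; real systems share unchanged content between revisions. The point here is only that the deletion lives in the history rather than destroying it.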

Rich query and search facilities are a given - it's one thing to put content into a CMS, but if you can't query and search that content, it's little better than a filesystem.

Replication (as in CouchDB, Git, etc.) is also an inevitable requirement for CMSes - I regularly see requirements for a CMS that can provide efficient access to documents across locations that are widely geographically distributed (including cases where connectivity to some of those locations is low bandwidth and/or intermittent). Replication (with automatic conflict detection and sophisticated features to assist with the inevitably manual process of conflict resolution) is the only mechanism I'm aware of that can handle these cases.
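For the 'automatic conflict detection' part, one well-known approach is the version vector: each replica keeps a counter per location and increments its own counter on every local edit. I'm using it here purely as an illustration; it is not a description of how CouchDB or Git detect conflicts internally, and the location names below are made up. Two copies conflict exactly when each has seen an edit the other has not.

```python
def compare(a, b):
    """Compare two version vectors (dicts of replica id -> edit counter)."""
    replicas = set(a) | set(b)
    a_ahead = any(a.get(r, 0) > b.get(r, 0) for r in replicas)
    b_ahead = any(b.get(r, 0) > a.get(r, 0) for r in replicas)
    if a_ahead and b_ahead:
        return "conflict"  # concurrent edits: needs (probably manual) resolution
    if a_ahead:
        return "a_newer"   # b can simply be fast-forwarded to a
    if b_ahead:
        return "b_newer"
    return "equal"


def merged(a, b):
    """Version vector to record once a conflict has been resolved."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}


london = {"london": 2, "sydney": 1}   # London edited after the last sync
sydney = {"london": 1, "sydney": 2}   # ...and so did Sydney, while offline
status = compare(london, sydney)
resolved = merged(london, sydney)
```

Detection is the easy, automatable half. Deciding which of the two conflicting documents wins, or how to merge them, is where the tooling to assist a human comes in.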

And in closing, a big thank you to Julian Wraith for initiating this discussion - it's extremely refreshing to discover other folks who are as passionate and (if I may say) as opinionated about CMS technology as I am!


The Future of CMS Technologies