Hyland Connect

andreasevers · ‎12-17-2011

Hey everyone,

I'm looking into possible solutions for an offering regarding document management.
I'd like to suggest Alfresco as the underlying framework, but I'm not too sure about its current state. I've used Alfresco in the past and was very enthusiastic about it, but that was only version 1.x.

It would help me a lot if I could wrap my mind around certain aspects of Alfresco that are important for the customer.

The pointers I need to find out are:

- How can I upload documents from Java? Using REST/SOAP/etc. ?
- How and where are documents archived? What technology is being used for this?
- Is there an automatic cleanup of "older" documents? (scheduling service or something)
- What are the possibilities regarding security? How are access rights managed and so on?
- How is clustering and scalability tackled in Alfresco? Can we run multiple instances etc?
- How does the version control mechanism work and what's the underlying technology?

Of course I realize this is a lot to ask, but even just some references to wikis or similar information would suffice.

Thanks a lot in advance

Andreas

mrogers · ‎12-18-2011

- How can I upload documents from Java? Using REST/SOAP/etc. ?
Yes.   Lots of options.
- How and where are documents archived? What technology is being used for this?
Depends upon what you mean by "archived". For example there's DOD5015 support and XAM support. Or the facilities to "do your own" archiving.

- Is there an automatic cleanup of "older" documents? (scheduling service or something)
No, but there's workflow

- What are the possibilities regarding security? How are access rights managed and so on?
There's a customisable authorities and permissions model.
- How is clustering and scalability tackled in Alfresco? Can we run multiple instances etc?
Yes. There are various patterns for clustering Alfresco. The main one is to cluster together several instances of Alfresco behind a load balancer.
- How does the version control mechanism work and what's the underlying technology?
Alfresco implements its own "version control".

andreasevers · ‎12-18-2011

Thanks mrogers,

I compiled the following paper (might be handy for anyone else who is interested). Can you check if there are any errors inside? Thanks a lot!

[size=150]Uploading documents to the Alfresco Repository programmatically[/size]
Alfresco offers the possibility to upload files to its repository using Java in multiple ways.
By REST
The “UploadContentServlet” is responsible for streaming content directly from servers into the repository using the HTTP PUT command. It’s also possible to implement the REST call manually, or you can use one of the numerous Web Script implementations Alfresco provides out of the box.
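As a rough illustration of the PUT approach, here is a minimal Java sketch that only builds the request. The `/upload/<filename>` path, the `ticket` query parameter, and the class and method names are my assumptions and not taken from the Alfresco documentation, so please check them against your Alfresco version before relying on them.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds an HTTP PUT request that streams a file's bytes
// to the upload servlet. URL layout and ticket parameter are assumptions.
final class AlfrescoUploadSketch {

    static HttpRequest buildUpload(String baseUrl, String ticket,
                                   String fileName, byte[] content) {
        // URLEncoder targets form encoding, so turn "+" back into "%20" for a URL path.
        String encodedName = URLEncoder.encode(fileName, StandardCharsets.UTF_8)
                .replace("+", "%20");
        String encodedTicket = URLEncoder.encode(ticket, StandardCharsets.UTF_8);

        URI uri = URI.create(baseUrl + "/upload/" + encodedName + "?ticket=" + encodedTicket);

        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/octet-stream")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` (Java 11 or later); what the response looks like depends on the server.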
By SOAP
The second option is to use web services. Alfresco includes the “ContentServiceSoapBindingStub”, an Apache Axis-generated stub that can be used to perform the SOAP call.
Of course you can still use any of the other services Alfresco offers, such as drag-and-drop file transfer with CIFS mounted drives. The Alfresco content application server supports many folder and document-based protocols to access and manage content held within the content repository using familiar client tools. All the protocol bindings expose folders and documents held in the Alfresco content repository. This means a client tool accessing the repository using the protocol can navigate through folders, examine properties, and read content. Most protocols also permit updates, allowing a client tool to modify the folder structure, create and update documents, and write content. Some protocols also allow interaction with capabilities such as version histories, search, and tasks.
Supported protocols include:
CIFS (Common Internet File System)
CIFS allows the projection of Alfresco as a native shared file drive. Any client that can read and write to file drives can read and write to Alfresco, allowing the commonly used shared file drive to be replaced with an ECM system without users even knowing.
WebDAV (Web-based Distributed Authoring and Versioning)
WebDAV provides a set of extensions to HTTP for managing files collaboratively on web servers. It has strong support for authoring scenarios such as locking, metadata, and versioning. Many content production tools, such as the Microsoft Office suite, support WebDAV. Additionally, there are tools for mounting a WebDAV server as a network drive.
FTP (File Transfer Protocol)
FTP is a standard network protocol for exchanging and manipulating files over a network. This protocol is particularly useful for bulk loading folders and files into the Alfresco content repository.
IMAP (Internet Message Access Protocol)
IMAP is a prevalent standard for allowing email access on a remote mail server. Alfresco presents itself as a mail server, allowing clients such as Microsoft Outlook, Apple Mail, and Thunderbird to connect to and interact with folders and files held within the Alfresco content repository. IMAP supports three modes of operation:
1.   Archive: allows email storage in the Alfresco content repository by using drag/drop and copy/paste from the IMAP client
2.   Virtual: folders and files held in the Alfresco content repository are exposed as emails within the IMAP client with the ability to view metadata and trigger actions using links embedded in the email body
3.   Mixed: a combination of the above
Microsoft SharePoint Protocol
Microsoft SharePoint protocol enables Alfresco to act as a SharePoint server, creating tight integration with the Microsoft Office suite. This allows a user who is familiar with the Microsoft task pane to view and act upon documents held within the Alfresco content repository. The collaborative features of Microsoft SharePoint, such as Shared Workspace, are all mapped to Alfresco Share site capabilities.

[size=150]Retention and archival policies[/size]
Disposition schedules are a key function of the records management system. The disposition schedule defines the procedures required for maintaining records in the records management system until their eventual destruction or transfer to another location.
A disposition schedule contains one or more steps that define a particular function to be carried out at a date or after an event has occurred. An example disposition schedule is:
Cutoff 30 days after filing, transfer to offline storage two years after cutoff and destroy seven years after transfer.
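To make the date arithmetic of that example schedule concrete, here is a small Java sketch. It is only a plain-Java calculation with hypothetical class and method names, not the Alfresco Records Management API.

```java
import java.time.LocalDate;

// Hypothetical model of the example schedule: each step is computed
// relative to the step before it.
final class DispositionScheduleSketch {

    // Cutoff: 30 days after filing.
    static LocalDate cutoff(LocalDate filed) {
        return filed.plusDays(30);
    }

    // Transfer to offline storage: two years after cutoff.
    static LocalDate transfer(LocalDate filed) {
        return cutoff(filed).plusYears(2);
    }

    // Destroy: seven years after transfer.
    static LocalDate destroy(LocalDate filed) {
        return transfer(filed).plusYears(7);
    }
}
```

For a record filed on 2011-12-18 this gives a cutoff on 2012-01-17, a transfer on 2014-01-17, and destruction on 2021-01-17.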
It’s possible to store the documents in an archive, a repository for the long-term storage and control of information that must be retained for operational or regulatory reasons.
Typical uses for an archive are to store reports, scanned documents, or electronic documents that are no longer used, but which the organization wishes to retain for possible future use. Records are a special case of Archive in which the documents and content stored in the archive are managed according to official rules of retention, lifecycle, and/or review process.
Such an archive is usually a section of an Alfresco repository that has a separate folder/space structure specifically for the documents stored in the archive. An archive will specify permission controls on who can add, modify, or delete documents. An archive also provides lifecycles on change of state and location.
Alfresco also provides (among others) DOD5015 support for applying legal regulations and XAM support by exposing connector modules.

[size=150]Automatic cleanup of older documents[/size]
Alfresco provides a workflow engine to help automate the processing of documents. Based on jBPM, Alfresco workflows can be built to support simple review and approval processes or can be configured to support more complex business processes such as automatically moving or deleting documents after a certain amount of time. They are easy to configure and very extensible.

[size=150]Security policies and Access Rights Management[/size]
Alfresco security comprises a combination of authentication and authorization.
Authentication is about validating that a user or principal is who or what they claim to be. Alfresco normally refers to users. A user’s credentials can take many forms and can be validated in a number of ways. For example, a password validated against an LDAP directory, or a Kerberos ticket validated against a Microsoft Active Directory Server.
Alfresco includes:
•   An internal, password-based, authentication implementation
•   Support to integrate with many external authentication environments
•   The option to write your own authentication integration and to use several of these options simultaneously
Alfresco can integrate with LDAP, Microsoft Active Directory Server, the Java Authentication and Authorization Service (JAAS), Kerberos, and NTLM. A user ID can also be presented as an HTML attribute over HTTPS to integrate with web-based single-sign-on solutions.
Authorization determines what operations an authenticated user is allowed to perform. There are many authorization models. Popular ones include: Role Based Access Control (RBAC), UNIX-style Access Control Lists (ACLs) and extended ACLs, Windows-style ACLs, and many more. Authorization requirements for the management of records are more detailed and include additional requirements, for example, enforcing access based on security clearance or record state.
Alfresco authorization is based on UNIX-extended ACLs. Each node in the repository has an ACL that is used to assign permissions to users and groups. Operations, such as creating a new node, describe what permissions are required to carry out the operation. ACLs are then used to determine if a given user may execute the operation based on the permissions that have been assigned directly to the user or indirectly through a group. An operation in Alfresco is invoking a method on a public service bean. For example, creating a user’s home folder requires invoking methods on several public services; to create the folder, set permissions, disable permission inheritance, and so on. Each public service method invocation will check that the user is allowed to execute the method.
Permissions are given to a user or a group of users in the administrator UI.
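The ACL check described above (a permission granted directly to the user or indirectly through a group) can be sketched in plain Java as follows. This is a simplified, hypothetical model, not Alfresco's PermissionService: the class name, the authority names, and the permission strings are made up for illustration, and the sketch leaves out things a real ACL implementation handles, such as inheritance from parent nodes.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical ACL for a single node: maps an authority (user or group name)
// to the set of permissions granted to it on that node.
final class AclSketch {

    private final Map<String, Set<String>> entries;

    AclSketch(Map<String, Set<String>> entries) {
        this.entries = entries;
    }

    // True if the permission is granted directly to the user
    // or to at least one of the groups the user belongs to.
    boolean isAllowed(String user, Set<String> groups, String permission) {
        if (entries.getOrDefault(user, Set.of()).contains(permission)) {
            return true;
        }
        for (String group : groups) {
            if (entries.getOrDefault(group, Set.of()).contains(permission)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, with an ACL that grants "Write" to user "alice" and "Read" to group "GROUP_readers", a user "bob" who is a member of "GROUP_readers" may read but not write.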

[size=150]Clustering and Scalability[/size]
High Availability (HA) clusters are implemented in Alfresco to improve the availability of services and to improve performance of these services. Availability is enhanced through redundant nodes that provide services when other nodes fail. When integrated with a load balancer, performance is enhanced by distributing, or balancing, server workload across a collection of nodes.
A cluster represents a collection of nodes. For example, a setup of two Tomcat nodes on two separate machines, talking to a shared content store and a shared database, but each with its own indexes. This is the simplest cluster to set up; it gives redundancy through the two machines, and can either load-balance for performance or use the second node as a "hot spare" for failover.
To provide a flexible cluster discovery process, JGroups is integrated into the repository. JGroups is a toolkit for multicast communication between servers. It allows inter-server communication using a highly configurable transport stack, which includes UDP and TCP protocols. Additionally, JGroups manages the underlying communication channels, and cluster entry and exit.

[size=150]Version Control[/size]
The Alfresco repository currently supports two store implementations. The “core”, or Document Management based store, and the Alternative Versioning Model (AVM). The AVM is an alternative store implementation designed to support the version control requirements of managing websites and web applications. The “core” implementation applies to documents and content entered into the Alfresco repository.
By default, content that is created in the repository is not versionable. When creating content, users must specify versionable on a case-by-case basis.
When content is versionable, the version history is started. The first version of the content is the content that exists at the time of versioning. If you want all content to be versionable at the instant of creation, you can modify the definition of that content type in the data dictionary. The definition must include the mandatory aspect versionable.
By default, all versionable content has auto-version on. As a result, when content is updated, the version number is updated. The auto-version capability can be turned off on a content-by-content basis in the user interface. If you want auto-versioning to be off for all content, you can modify the definition of that content type in the data dictionary.
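The versioning rules above can be summarised in a small Java sketch. It is a toy model with hypothetical names, not Alfresco's VersionService, and it only counts versions instead of reproducing Alfresco's version labels.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the behaviour described above: content is not versionable by
// default, and auto-version is on once it becomes versionable.
final class VersionableContentSketch {

    private String content;
    private boolean versionable = false;
    private boolean autoVersion = true;
    private final List<String> history = new ArrayList<>();

    VersionableContentSketch(String content) {
        this.content = content;
    }

    // Starting the version history: the first version is the content
    // that exists at the time of versioning.
    void makeVersionable() {
        if (!versionable) {
            versionable = true;
            history.add(content);
        }
    }

    void setAutoVersion(boolean on) {
        autoVersion = on;
    }

    // An update creates a new version only for versionable content
    // with auto-version switched on.
    void update(String newContent) {
        content = newContent;
        if (versionable && autoVersion) {
            history.add(newContent);
        }
    }

    int versionCount() {
        return history.size();
    }
}
```

In this model, updating content before `makeVersionable()` is called leaves the history empty, and switching auto-version off stops later updates from adding versions.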

Alfresco technical features questions