Alfresco + EMC Centera

munwar
Champ in-the-making
Earlier there were a few requests to extend Alfresco's storage support to CAS (content-addressable storage) systems like EMC Centera. I am not sure whether anyone has tried this in the past.

We are looking at storing millions of documents in EMC Centera and keeping the metadata in Alfresco for security, search and retrieval purposes.

Has anyone integrated EMC Centera storage with Alfresco?

Has anyone tried mounting EMC Centera storage as file system storage for Alfresco CMS? There are various third-party products on the market that provide such a solution. A few of them are:

Gateway products such as “Storage Switch” (Centera FS Gateway - http://www.storageswitch.com/public/pdf/CenteraGW.pdf)
Storage Switch is a robust gateway for EMC’s Centera. Using the product you can take advantage of standard file system protocols such as NFS, CIFS, HTTP, FTP to store, retrieve and manage your data.

Another such solution is CAStor FSG (from Caringo - http://www.caringo.com/pdfs/CAStorFSG_datasheet_080131.pdf). Applications that use the Common Internet File System (CIFS) or Network File System (NFS) protocols can leverage the advantages of CAStor without application or business process modifications.

What is the best approach: connecting to Centera through its APIs directly from Alfresco, or using third-party products like the ones above to mount it as a file system?

Any suggestions?
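
To make the split I have in mind concrete, here is a minimal sketch of the idea: content goes into a content-addressed store and only the returned address plus metadata is kept on the repository side. Everything in it is an assumption for illustration. `ContentAddressedStore` and `MetadataRepository` are hypothetical names, the store is an in-memory toy rather than the Centera API, and SHA-256 is just a stand-in for whatever digest a real CAS device uses.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory stand-in for a CAS device (hypothetical, not the Centera API)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, so identical
        # content always maps to the same address and is stored once.
        address = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]


class MetadataRepository:
    """Toy stand-in for the repository side: properties plus a content address."""

    def __init__(self, store: ContentAddressedStore):
        self._store = store
        self._nodes = {}

    def add_document(self, name: str, data: bytes, **properties) -> str:
        address = self._store.put(data)
        self._nodes[name] = {"address": address, **properties}
        return address

    def find(self, **criteria):
        # Search runs against metadata only; the content store is not touched.
        return sorted(
            name
            for name, node in self._nodes.items()
            if all(node.get(key) == value for key, value in criteria.items())
        )

    def read(self, name: str) -> bytes:
        return self._store.get(self._nodes[name]["address"])
```

In this sketch, search and security decisions would be made against the metadata, and the CAS device is only contacted when the bytes are actually needed.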
11 REPLIES

paulrmc
Champ in-the-making
munwar,

I'm happy to report that an open source, fully functional native integration between Alfresco CMS and Caringo's CAStor is available (free of charge, of course) from Caringo, upon simple request. This integration does not require the use of our optional FSG product. As a matter of fact, when using this integration, the Alfresco CIFS interface serves as a standard mountable volume that lets users store file content in CAStor, much like FSG does.

This high-performance, native, open source connector was originally developed by one of our customers - a large multinational pharmaceutical company - to allow Alfresco to manage huge medical imaging files stored in replicated CAStor clusters on both sides of the Atlantic. The solution works like a charm and because the content is stored in CAStor, it lets you work with those monster files as easily as you would with run of the mill MS Word documents.

Please feel free to contact me for more info (or for the connector code 😉 )

Paul Carpentier
Cofounder and CTO
Caringo, Inc.
( paul dot carpentier at caringo dot com )

munwar
Champ in-the-making
Paul,

Thanks for the information. Your solution is very useful for customers who are building highly scalable solutions using Alfresco. Could you provide a one-page document on Alfresco's wiki with all the necessary pointers? I will also contact you to request your expert advice.

What would be the best solution if the customer has already invested heavily in Centera?

lateral
Champ in-the-making
Hi,
We are in discussions currently with EMC to build an Alfresco/Centera integration for one of our clients that supports both the Centera API and the forthcoming XAM standards initiative. It is early days yet but we anticipate prototyping this in the next couple of months, subject to customer funding approval. EMC are currently investigating whether it would be possible to "Open Source" such a thing if it connects through to their APIs…interestingly enough…they are keen to do so if they can from a licensing/legal perspective.

regards,
Alex Lee
CEO Lateral Minds
alex@lateralminds.com.au

paolomoz
Champ in-the-making
Hi Alex,
As we are evaluating existing and implementable approaches to an Alfresco/Centera integration, I would be interested in any update on the integration you mentioned.
Did you implement a prototype for your customer?
Did you go further in building such an integration?
Is there any open source code available for this?

I hope this is still of interest to you,


Paolo Mottadelli - Sourcesense
p.mottadelli@sourcesense.com

thadguidry
Champ in-the-making
If a distributed file system or cluster that can scale to petabytes is what you're looking for, then perhaps a Lustre system might be of use. Version 2.0+ is due to support the ZFS file system in 2010 and looks promising. As it is now, even Lawrence Livermore and NASA seem to like it.

Speaking of ZFS, you may also look into the open source Nexenta, which combines GNU C/Linux/OpenSolaris and ZFS.

Remember that a file system that will be running a database should be tuned appropriately if scaling is a concern. As is the case with Alfresco and MySQL, you'd want to tune ZFS to use the same block size as the database. Hardware RAID solutions just don't give you that option the way ZFS can. Here are a few articles that will help you learn more about ZFS and MySQL tuning.

http://forge.mysql.com/wiki/MySQL_and_ZFS - Video presentation
http://dev.mysql.com/tech-resources/interviews/neelakanth-nadgir.html - Interview with Neelakanth Nadgir

Regards,
Thad Guidry
DAI SYSTEMS LLC
214-556-8040

landgar
Champ in-the-making
Hi!
I'm also integrating Alfresco with Centera. I would like to know what the licensing conditions are for using Alfresco as an EMC partner.
Concerning the integration, is integrating Centera through XAM much more difficult than using the Centera API? Will it work if we only use Centera systems?
Has anybody else tried integrating the Centera API or XAM with Alfresco?


jawz
Champ in-the-making
Our company is relatively new to Alfresco, but we have determined that it makes sense to use it as an Enterprise Content Management system for much of the content that flows through our environment. I've come across this thread, which seems closely related to our question, as we use EMC for our storage (EMC NSX).

Our struggle is in defining an enterprise implementation model that supports high-volume document access from mission-critical Java processes running on Unix boxes. Our concerns about making API calls to get hundreds or thousands of documents in real time center on availability as well as performance. Because of the proprietary way Alfresco stores and indexes content, the client will either have to make API calls or mount via NFS. Our systems administration teams have raised red flags (show-stoppers), indicating that the way Alfresco implements NFS introduces risk to any machine that mounts Alfresco, should that service go down: if the Alfresco NFS mount fails or goes down, any connected machine will need to be restarted.

As we architect the way in which Alfresco will be leveraged in our infrastructure, there is a significant dependency on the ability to access content, meaning the content has to be highly available, and the performance has to be optimal.

The questions we have for the group are:
1) How are other folks doing this? Companies must have similar requirements to ours, where access to the content must always be on and the latency of repetitive API calls may not be acceptable.
2) Regarding NFS, are the concerns of our SysAdmins valid?  If so, what are others doing to mitigate this risk?

At the end of the day, it comes down to ensuring the content is available, preferably as if it were locally resident to the process that consumes it, as would appear to be the case with NFS. Hundreds to thousands of API calls to return content doesn't feel like an appropriate solution, and neither do a few API calls with very large responses.

We are very interested in hearing some responses from individuals/organizations who are leveraging Alfresco similarly to how we intend to, and understanding some best practices.

mrogers
Star Contributor
Could you clarify what the concern is with NFS? Is it accessing Alfresco via its NFS server, or Alfresco's use of NFS?

There are various caching strategies that can be used to reduce latency.  And Alfresco nodes can be clustered for high availability.
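
As one illustration of the caching idea, here is a minimal sketch of a read-through cache with least-recently-used eviction placed in front of a slow content fetch. It is not something Alfresco ships: `ReadThroughCache` and the `fetch` callable are hypothetical names, and the fetch function stands in for whatever remote call returns a document's bytes.

```python
from collections import OrderedDict


class ReadThroughCache:
    """LRU read-through cache in front of a slow fetch function (illustrative only)."""

    def __init__(self, fetch, capacity=1000):
        self._fetch = fetch          # hypothetical callable: key -> bytes
        self._capacity = capacity    # maximum number of cached entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            # Mark the entry as most recently used and serve it locally.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]

        # Cache miss: pay for one remote call, then remember the result.
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value
```

With something like this in the consuming process, repeated reads of the same document cost one remote call instead of many. A real deployment would also need an invalidation or expiry policy for content that changes, which this sketch leaves out.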

jawz
Champ in-the-making
Our SysAdmins' concerns are that the NFS implementation is, to put it simply, Alfresco-specific. As I understand it, Alfresco implements NFS through JLAN, and any time I wish to get a file that is shown to be in a directory, the file must be retrieved through the Alfresco implementation.

So if a file is shown to be at "Company Home > Company A > ProductionFiles" in Alfresco Explorer, I can't simply go to that location on disk and get all the files in "ProductionFiles."  Instead, I must make a call to Alfresco's NFS and request Alfresco to give me the files in the "ProductionFiles" folder.  Is this a correct understanding?

One concern, then, is that if we have hundreds of thousands of files that need to be used in our production run, we would essentially be making hundreds of thousands of API calls to Alfresco, which could bog down the system, be highly inefficient, and seriously delay the time it takes to process the production runs.

Another issue is that if Alfresco is running on ServerA and I have an NFS mount on ServerB pointing to Alfresco on ServerA, then any issue with the Alfresco server or Alfresco's NFS mount (whether the Alfresco server itself is down or bogged down by numerous API calls) would leave ServerB bogged down in repeated attempts to re-establish the mount. The preferred method, instead, is to point to a location on the storage system (EMC) with proven connections and redundancy built in.

Furthermore, in the 3.4.x versions of Alfresco, NFS is not a supported feature in a clustered environment. I see it may have become available via Hazelcast in the most recent version of Alfresco (currently 4.0.2), but we are not yet ready to upgrade to the 4.x series.

So back to the original questions:
1) How are other folks doing this? Companies must have similar requirements to ours, where access to the content must always be on and the latency of repetitive API calls may not be acceptable.
2) Regarding NFS, are the concerns of our SysAdmins valid? If so, what are others doing to mitigate this risk?