Max Nodes in a store?

docderivative
Champ in-the-making
I have returned to a project that I initiated many years ago and found that the implementation is not quite what I expected!  In particular, custom code has been written to distribute the content and its associated metadata across numerous stores.  This additional code has significantly increased the complexity of the implementation (and caused a few issues, since updates to documents have ended up spanning stores due to implementation errors).  The justification for this approach was that Alfresco (this is 3.x) 'did not work' with many documents per store, and a limit of 1,000,000 documents was imposed to 'stop it running out of memory'.

The implementation is extremely problematic: in particular, finding documents requires queries that potentially span all stores (since a document's ultimate location depends on its update history).
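(For illustration, the per-store query-and-merge ends up looking roughly like the sketch below, using the Java foundation API. The class name, store list and query string are placeholders, and, at least in our version, a SearchParameters query targets a single store, which is why each store has to be searched in turn and the hits aggregated by hand.)

import java.util.ArrayList;
import java.util.List;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchParameters;
import org.alfresco.service.cmr.search.SearchService;

public class CrossStoreQuery
{
    private SearchService searchService; // injected via Spring in the real module

    // Run the same metadata query against every store and merge the node refs.
    public List<NodeRef> findAcrossStores(List<StoreRef> stores, String luceneQuery)
    {
        List<NodeRef> results = new ArrayList<NodeRef>();
        for (StoreRef store : stores)
        {
            SearchParameters sp = new SearchParameters();
            sp.addStore(store);                              // one store per query
            sp.setLanguage(SearchService.LANGUAGE_LUCENE);
            sp.setQuery(luceneQuery);
            ResultSet rs = searchService.query(sp);
            try
            {
                results.addAll(rs.getNodeRefs());            // aggregate across stores
            }
            finally
            {
                rs.close();
            }
        }
        return results;
    }
}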

As we move to Alfresco 4.x, I am considering simplifying the approach and merging everything back into a single store.

We have 50+ million documents and expect to grow to 100+ million within 5 years. Will it fit in one store?

(PS: content is not indexed; there are only approximately 16 attributes per document, and they are all simple strings, dates or numbers.)

Thanks in advance
DrD
5 REPLIES

mrogers
Star Contributor
Is there a particular JIRA reference for this "running out of memory" bug?  Certainly it's not unheard of for a million docs to be used as a fairly modest test set these days.

If you are an Alfresco customer, it may be worth talking to Alfresco Support, who will have advice and experience working with larger volumes of data.  But in general it makes sense to partition your data along business lines if that's possible.

lw7415
Champ in-the-making
In our experience, the amount of memory needed for SOLR is what really needs attention in installations with a large number of nodes. We implemented a records management system with somewhere around 300,000 nodes and had to add a lot of RAM to get the Alfresco system performing well. We implemented this under version 3.4 but are now at 4.1.

The official Alfresco product documentation has a calculator to assist in estimating the amount of memory needed for SOLR.

We are looking to separate SOLR from the repository (i.e. run them on separate CPUs) in the future as we keep adding more nodes to our repository.
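(If it helps, moving Solr onto its own box is largely a matter of pointing the repository at the remote instance in alfresco-global.properties; the host name below is just a placeholder:)

index.subsystem.name=solr
solr.host=solr01.example.internal
solr.port.ssl=8443

On the Solr side, the core configuration (solrcore.properties) then points back at the repository via the alfresco.host and alfresco.port settings, and the usual SSL keystores need to be shared between the two machines.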

Hope this helps,

Lw

docderivative
Champ in-the-making
Interesting.  We have approximately 15 stores with 1M nodes in each.  Searching across stores is a pain since results need to be aggregated, but maybe the 1M limit wasn't such a bad idea anyway.  Might have to have a think about the partitioning.

If 1M is a test set-up, how large are the stores that customers are actually using successfully?

Partitioning is tricky; we already do some along business lines (and in chronological order).

mrogers
Star Contributor
Perhaps other forum users could chime in with their figures…