Massive import (1 000 000 docs) of documents without ACLs with Nuxeo 5.6 and REST

jbouche_
Champ in-the-making

Environment

  • OS: Windows Server 2008 R2 Service Pack 1
  • Java: JDK 1.7, 64-bit
  • Nuxeo server: heap memory = 999 MB
  • Nuxeo repository
    • contents: 100 000 documents
    • full-text search index disabled
    • documents created without ACLs

Scenario

a) After creating 20 000 more documents

  • the log shows '2014-02-07 15:36:59,241 WARN [org.nuxeo.ecm.core.event.tx.PostCommitSynchronousRunner] PostCommitListeners are too slow'

  • document creation speed drops to 12 100 docs/hour, down from 25 000 docs/hour at the beginning.

b) Then, after creating 40 000 more documents

  • document creation speed drops to 3 500 docs/hour
  • a "java heap space out of memory" error occurs

Questions

  1. How can we avoid the 'PostCommitListeners are too slow' warning and sustain at least 20 000 docs/hour, so that 1 000 000 docs can be imported in a reasonable time?

  2. How can we speed up document creation, given that our software can supply Nuxeo with 500 000 docs/hour?

  3. How can we avoid the "java heap space out of memory" error?

Why does Nuxeo use so much memory when

  • we only create documents one by one
  • and check whether each document exists before creating it?
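
For scale, the rates quoted above can be turned into total import times for 1 000 000 documents. This is plain arithmetic on the figures in this post; the labels and the function name `hours_to_import` are only illustrative.

```python
TOTAL_DOCS = 1_000_000

# Rates quoted in the post, in documents per hour.
rates = {
    "initial rate": 25_000,
    "target rate": 20_000,
    "after 20 000 more docs": 12_100,
    "after 40 000 more docs": 3_500,
    "client supply rate": 500_000,
}


def hours_to_import(total_docs, docs_per_hour):
    """Hours needed to import total_docs at a constant rate."""
    return total_docs / docs_per_hour


for label, rate in rates.items():
    print(f"{label}: {hours_to_import(TOTAL_DOCS, rate):.1f} h")
```

At the target rate the import takes 50 hours; at the degraded 3 500 docs/hour it would take roughly 286 hours, assuming the rate did not fall further.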
1 REPLY

bruce_Grant
Elite Collaborator

You might want to take a look at Nuxeo's bulk importer to achieve your desired result. Or write your own custom importer for even more control over the import process, transactions, etc.
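
To illustrate what "control over transactions" could look like in a custom importer, here is a minimal sketch that groups document creations into fixed-size batches and commits once per batch. It is not Nuxeo's bulk importer and uses no Nuxeo API: `create_document` and `commit` are hypothetical callables standing in for whatever your client or server-side code actually does, and the batch size of 50 is an arbitrary assumption. Whether batching alone resolves the heap problem described above is not something this thread establishes.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def import_in_batches(docs, create_document, commit, batch_size=50):
    """Create documents batch by batch, committing once per batch.

    create_document and commit are hypothetical callables supplied by
    the caller; they are placeholders, not Nuxeo functions.
    """
    created = 0
    for batch in batched(docs, batch_size):
        for doc in batch:
            create_document(doc)
        commit()  # one transaction boundary per batch, not per document
        created += len(batch)
    return created
```

For example, `import_in_batches(range(7), print, lambda: print("commit"), batch_size=3)` creates seven items and commits three times (after items 2, 5 and 6).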
