03-16-2012 10:47 AM
Based on the nuxeo-platform-importer package we have developed our own importer tool.
Up to one million documents, the average import speed is about 250 docs/sec. Beyond that, importing further documents takes progressively longer, and the average speed drops to 50 docs/sec or less.
We followed all the performance-related instructions for the PostgreSQL database described here. Does anyone know of further measures to speed up the import? We have to import several million documents.
03-16-2012 11:22 AM
Hi
Yes, importing a few million documents is a matter of days.
A much faster way is to generate an ad hoc SQL dump and populate the database with the PostgreSQL COPY command. This is possible if the layout of the data to import is simple.
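To illustrate the dump approach, here is a minimal sketch in Python that writes rows in the text format PostgreSQL COPY reads (tab-separated fields, \N for NULL, backslash escapes for tabs and newlines). The table name, column names and helper names are hypothetical placeholders, not the actual Nuxeo schema; you would have to map your documents onto the real tables yourself.

```python
import io


def copy_escape(value):
    """Encode one field for the PostgreSQL COPY text format."""
    if value is None:
        return r"\N"  # COPY's default NULL marker
    text = str(value)
    # Escape the backslash first so later escapes are not doubled.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def write_copy_dump(rows, table, columns, out):
    """Write a COPY ... FROM stdin block that psql can replay.

    `table` and `columns` are placeholders chosen by the caller.
    """
    out.write(f"COPY {table} ({', '.join(columns)}) FROM stdin;\n")
    for row in rows:
        out.write("\t".join(copy_escape(v) for v in row) + "\n")
    out.write("\\.\n")  # end-of-data marker
```

The resulting file can be loaded with psql (for example `psql -f dump.sql`), which sends the whole block to the server as a single COPY instead of one INSERT per document.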
ben
03-16-2012 06:14 PM
The big thing I found to help with mass import tuning was batch size, that is, the number of documents created before a commit (save) is performed. Too small, and the per-transaction overhead is large. Too big, and Postgres complains (not to mention that a hiccup risks losing all the documents in the commit).
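A rough sketch of the batching idea, in Python for brevity. The `create` and `commit` callbacks are hypothetical stand-ins for whatever your importer does to create one document and to save the session; the default of 500 is just a placeholder to tune, not a recommendation.

```python
def import_in_batches(documents, create, commit, batch_size=500):
    """Create documents and commit once per `batch_size` documents.

    `create` and `commit` are caller-supplied callbacks (hypothetical
    names); returns the total number of documents created.
    """
    pending = 0
    total = 0
    for doc in documents:
        create(doc)
        pending += 1
        total += 1
        if pending >= batch_size:
            commit()  # one transaction per full batch
            pending = 0
    if pending:
        commit()  # flush the final, partially filled batch
    return total
```

Running the same import with a few different batch sizes and comparing docs/sec is probably the simplest way to find the sweet spot for a given setup.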