DIP File

Mike_Kroner1
Star Contributor

Hello

I have a DIP file that is 200 MB in size with 2 million rows. Has anyone ever split the index file into smaller ones to make the DIP process more manageable? If so, how did you handle this process?

Thanks,
Mike

6 REPLIES

Ryan_Wakefield
World-Class Innovator

The easiest way I found to accomplish this was with a PowerShell script. If you need some help with it, I don't mind helping.
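Something along these lines is the general idea (a minimal sketch only; it assumes the index file is plain text with one row per document and no header, and the paths, file names, and 5,000-row chunk size are examples you would adjust for your environment):

```powershell
# Split a large DIP index file into fixed-size chunks without loading it all into memory.
# Assumption: one document per line, no header row. Paths and chunk size are examples only.
$source    = 'C:\DIP\index_full.txt'
$outFolder = 'C:\DIP\chunks'
$linesPer  = 5000

New-Item -ItemType Directory -Path $outFolder -Force | Out-Null

$reader = [System.IO.StreamReader]::new($source)
$writer = $null
$chunk  = 0
$count  = 0
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($count -eq 0) {
            # Close the previous chunk (if any) and start a new numbered file.
            if ($writer) { $writer.Close() }
            $chunk++
            $writer = [System.IO.StreamWriter]::new((Join-Path $outFolder ("index_{0:D4}.txt" -f $chunk)))
        }
        $writer.WriteLine($line)
        $count = ($count + 1) % $linesPer
    }
}
finally {
    if ($writer) { $writer.Close() }
    $reader.Close()
}
```

Streaming with StreamReader/StreamWriter keeps memory flat even on a 200 MB file, whereas reading the whole file into a variable with Get-Content would pull all 2 million rows into memory at once.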

Eric_Beavers
Employee

While DIP can handle a file that size, I have had to split up the work before. It all depends on your project constraints.

 

Many years ago I was working on a FileNet backfile conversion, migrating more than 8 million records. This project was further complicated because we also had an Epic EMR integration that needed to be updated with these new records via HL7 Importer. Finally, I was not allowed to import during business hours (6 am-6 pm).

 

We started in a test environment, building out our process. Then we iteratively tested growing batches, e.g. 100, then 1,000, then 10,000 documents. Along the way we found small issues that had to be fixed (which is way easier with 100 docs).

 

Based on the sample processing speeds we discovered, I ended up setting up 3 dedicated DIP servers to import my batches in parallel. I found that batches of 100k-500k documents worked well, based on my testing in the test environment.

 

Initially, I split the text file using an advanced text editor called Notepad++.

Eventually, I batch scripted most of the process to automate as much as possible.
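As a rough illustration, the automation can be as simple as feeding the split files to DIP one at a time. This sketch assumes DIP (or a scheduled DIP process) is configured to poll a pickup folder for index files and removes each file once it has been processed; the folder paths are examples only:

```powershell
# Hand the split index files to DIP one at a time.
# Assumptions: DIP polls $pickupFolder and deletes each index file after processing it.
# All paths below are hypothetical examples.
$chunkFolder  = 'C:\DIP\chunks'
$pickupFolder = '\\dipserver\DIP\pickup'

Get-ChildItem -Path $chunkFolder -Filter 'index_*.txt' | Sort-Object Name | ForEach-Object {
    Copy-Item -Path $_.FullName -Destination $pickupFolder
    $target = Join-Path $pickupFolder $_.Name

    # Wait until DIP has consumed the current file before handing it the next chunk.
    while (Test-Path $target) {
        Start-Sleep -Seconds 60
    }
    Write-Host "Finished $($_.Name) at $(Get-Date)"
}
```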

 

At first, during the available night hours, I was getting about 1.3 million documents imported into production. At the start of the next day the customer's database admin team requested we reduce the volume, as they could not keep up with the increased work related to database backups and maintenance tasks. So we tried only 500k per night (either 2 x 250k batches or 5 x 100k batches) on the following night shift. The next day the DBAs reported the workload was now manageable. It took about 12 days with these new limits in place.

Michelle_Troxel
Elite Collaborator

Generally I won't run a DIP file with more than 5,000 rows. I have had issues with disk group locks both when processing the DIP file and when committing the batches, even when the doc type was on a different disk group. As Ryan said, you can run a PowerShell script to break the files up; that's how I did it as well. Then I load the .csv files one at a time, run the DIP, commit the batch on a different computer, load the next file, etc. It takes about 30-45 minutes for me to run a 5,000-row file on the weekends. Tedious, but fun Saturday night movie activity. I'm sure there is a way to automate it all, but I didn't have that many files.

When we first went live and did a conversion from the legacy system, the batches were 50,000 files. That's proof you can do more, but you may take a performance hit. If you're not 24/7 and can do them during off hours, you can probably get away with larger files.
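If it helps, a quick sanity check after splitting confirms that no chunk went over the 5,000-row limit and that the chunk totals add back up to the original row count (a small sketch; the folder path and limit are examples only):

```powershell
# Verify the split: warn on any chunk over 5,000 rows and report the total row count.
# Path and limit are hypothetical examples.
$chunks = Get-ChildItem -Path 'C:\DIP\chunks' -Filter 'index_*.txt'
$total  = 0
foreach ($file in $chunks) {
    $rows = @(Get-Content $file.FullName).Count
    if ($rows -gt 5000) { Write-Warning "$($file.Name) has $rows rows" }
    $total += $rows
}
Write-Host "Total rows across $($chunks.Count) chunks: $total"
```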

Michelle

Eric_Beavers
Employee

@Michelle Troxell did you use any Process Tuning Parameters? I know I have had to use these for high-volume imports.

 


 

The documentation in the DIP MRG is pretty good at explaining the settings. There is also a really old blog post from 2006: Is there a way to tweak the performance of DIP? (hyland.com)
