Hyland Connect

Nat_Mara · 04-30-2014

We recently upgraded to 13 SP2 and we also have IDOL. The new MRG states that the OCR Format cannot have "Do Not OCR PDF documents" checked and that the Output format should not be changed from ASCII Text. I understand that this applies if you are running OCR on the server and using Data Capture with pagination enabled. We are not enabling pagination until we are done testing, so we installed OCR for the Client on the server. My question is this: can we modify the OCR Format to set the output to PDF image with searchable text until we enable pagination?

thanks in advance,

Nat

Tommy_Hearns · 04-30-2014

Hi Nat,

Although Full Page OCR Server and IDOL use the same OCR engine, based in the Data Capture Server, the OCR processing is separate. As of OnBase 12 SP1, IDOL no longer uses the Full Page OCR product/license to create OCR text renditions of image or PDF files. OCR is now contained within IDOL Full Text Search through the Data Capture Server. Modifying the Output file format of the OCR Format therefore has no effect on IDOL Full Text Search, because IDOL Full Text Search never outputs a text file to disk for storage. The Output file format is only for creating stored text renditions for Full Page OCR.

To be clear, Full Page OCR/Batch OCR is NOT required or used by IDOL Full Text for OnBase 12 SP1 or later. OCR is now built in to IDOL Full Text.

Also, the Client OCR engine is a 32-bit OCR engine that is exclusive to the Client and used for the older Batch OCR, Advanced Capture, and Automated Redaction products. The Data Capture Server uses a 64-bit OCR engine, which is why it requires a separate installer. It is the same OCR engine, just 64-bit capable to allow for processing larger images. IDOL only uses the 64-bit OCR Engine through the Data Capture Server.

I would also like to make you aware that the 18.62 OCR engine is now available. It contains minor fixes from our OCR engine provider, and you should be able to obtain it from your first line of support.

Thanks,

Tommy

Nat_Mara · 04-30-2014

Thanks a lot for the clarification, Tommy. I was not clear in my question. I understand IDOL will not require OCR once Data Capture and pagination are enabled, but that is not the case currently. We need to do some testing before we enable pagination. I was told that if pagination is not being used, we do not need to have the Data Capture Service running. In the meantime, when users scan in TIF images, they need to be added to IDOL catalogues. I tested migrating 60 documents into a new collection and only 9 came over. None of the TIF images came over, even though they were set for multiple renditions. I then installed OCR for the Client on the server, manually ran the OCR process, and successfully migrated the rest of the documents. I noticed when viewing the OCR rendition that it shows as a text report format. If the documents need to be OCRed in the short term, I would prefer them to be PDF searchable instead of text report format. Once we enable pagination, DCS, etc., I can change the OCR Format back, if necessary. Will that cause any possible issues down the line?

thanks in advance,

Nat

Tommy_Hearns · 04-30-2014

Are you still making use of the Batch OCR product in the Client to create OCR renditions in testing? Batch OCR (now called Full Page OCR) was ported from the Client to the Data Capture Server to take advantage of the 64-bit OCR engine in OnBase 13, although it is represented by a different license than Batch OCR due to an updated licensing model. The new license is called Full Page OCR for the Data Capture Server version of that product. The older Batch OCR still exists in OnBase 13 for legacy purposes.

Tommy_Hearns · 04-30-2014

I'll add that changing the Output file format for the OCR Format assigned to that Document Type(s) has no bearing on the OCR Format that will be used with IDOL and pagination. When pagination is enabled, the Data Capture Server uses dedicated OCR Formats for IDOL Full Text, labeled Full Text (View) <Default> and Full Text (Index) <Default>. During pagination for images and PDFs, those OCR Formats will be used regardless of the OCR Format assigned to the Document Type, since those are special cases of OCR, and this will NOT result in a text rendition being stored for the document. All OCR processing for IDOL pagination and highlighting is done in memory, which eliminates the need for Full Page OCR.

So to answer your question, setting the Output Format to PDF Searchable Text in the interim will have no effect on pagination later. However, because indexing work for pagination is done when a document is brought into OnBase, any documents imported into OnBase before pagination is enabled will need to be re-indexed by IDOL to enable image/PDF highlighting for them. An option exists in the IDOL Config tool for forcing the indexing process to execute on documents that already existed in OnBase before pagination was enabled. It may result in some redundant data being stored in the IDOL database, but hit highlights can then work in IDOL Full Text Search for older documents.
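For anyone who finds it easier to read the rules above as logic, here is a rough Python sketch of them. It is only an illustration and not an OnBase or IDOL API: the names `plan_ocr`, `OcrPlan`, `needs_idol_reindex`, and the process labels `"idol_pagination"` and `"full_page_ocr"` are hypothetical. Only the two OCR Format labels come from the post above.

```python
from dataclasses import dataclass
from typing import Tuple

# OCR Format labels named in the post; everything else here is hypothetical.
IDOL_PAGINATION_FORMATS = (
    "Full Text (View) <Default>",
    "Full Text (Index) <Default>",
)


@dataclass(frozen=True)
class OcrPlan:
    ocr_formats: Tuple[str, ...]   # OCR Format(s) the process would use
    stores_text_rendition: bool    # whether a rendition is written to disk


def plan_ocr(process: str, doc_type_ocr_format: str) -> OcrPlan:
    """Model which OCR Format applies for a given (hypothetical) process label."""
    if process == "idol_pagination":
        # Dedicated formats are used regardless of the Document Type's
        # OCR Format; processing is in memory, so nothing is stored.
        return OcrPlan(IDOL_PAGINATION_FORMATS, stores_text_rendition=False)
    if process == "full_page_ocr":
        # The Document Type's OCR Format (and its Output file format)
        # only matters here, where a rendition is stored.
        return OcrPlan((doc_type_ocr_format,), stores_text_rendition=True)
    raise ValueError(f"unknown process: {process!r}")


def needs_idol_reindex(imported_before_pagination: bool) -> bool:
    """Documents imported before pagination was enabled need re-indexing
    by IDOL before image/PDF hit highlighting can work for them."""
    return imported_before_pagination
```

In this model, `plan_ocr("idol_pagination", "PDF Searchable Text")` and `plan_ocr("idol_pagination", "ASCII Text")` return the same plan, which is the point: the interim Output Format choice does not carry over into pagination.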
