07-06-2015 02:37 PM
I have a Unity script that takes a text document in OnBase and writes it to a PDF using the PDFDataProvider. The resulting PDF appears to be image-based, or at least not text-searchable. It is also quite a bit larger than the original text file imported into OnBase. Can the PDFDataProvider write out a text-based PDF? I can't seem to find a method in the SDK to do so.
My current code is:
Dim pdfProvider As PDFDataProvider = app.Core.Retrieval.PDF
' Get the rendition as PDF page data, then write its stream to the target path
Using pageData As PageData = pdfProvider.GetDocument(rendition)
    Using stream As Stream = pageData.Stream
        Utility.WriteStreamToFile(stream, path)
    End Using
End Using
I didn't see anything in PDFGetDocumentProperties that seemed to do this.
07-07-2015 05:13 AM
Natively, the PDFDataProvider produces image-only PDF files when converting supported file types. There is no API setting to modify this.
As James mentioned, if you're licensed for full-page OCR (batch or ad-hoc), you can create a text-searchable PDF rendition that way, but not directly through the API. You can, however, create a scan batch through the API and push the document into a scan queue configured for OCR to automate part of the process. Likewise, you could use workflow (with the "Queue Document for OCR" action) to place the document in the "Awaiting Ad Hoc OCR" queue.
07-07-2015 06:44 AM
I'm marking Scott's message from the comment to my post as the answer: No, the PDFDataProvider does not write text PDFs. Full-page OCR can be used through a scan queue to produce a text-searchable PDF.