Is it possible to sweep in PDFs through a scan queue and keep them as PDFs?

Dwight_Leidholm
Champ in-the-making

I am working with two scan queues. The first scan queue brings in PDFs and converts them to TIFF images without any image processing. It uses Advanced Capture to capture the document types and a bar code process to capture keywords. How is it converting these to TIFF?

My second question is about another scan queue. This one sweeps in TIFFs with no issue, but a PDF sent through it ends up assigned as an unindexed document and the process doesn't do anything to it. This scan queue is configured the opposite way from the first: bar code processing captures the document types while Advanced Capture retrieves the keywords. Is this setup what is causing the PDFs not to index? The bar code processor doesn't seem to want to work or assign a document type when the documents are PDFs. Thank you.

1 ACCEPTED ANSWER

Steve_Reed
Employee

Hi Dwight,

Advanced Capture will function on either image file formats (such as TIFF) or PDF documents for the purposes of reading OCR zones or reading bar codes. However, if it is configured to break documents up into smaller documents (which depends on the configuration of the form templates), then the resulting split documents will always be converted into images if they were originally PDFs. If no splitting is configured, documents will remain in their original format.

Assuming that the splitting you are performing is intentional, you have two options for converting the documents back to PDF before they leave the scan queue process, if you need to. One is to use the PDF Conversion queue option for your scan queue, which converts image documents into PDF documents. The caveat is that these are not text-searchable PDFs, but a full page OCR license is not required. The other option is to use the Full Page OCR queue (if you are so licensed) to create text-searchable PDFs from your documents after they have been processed by Advanced Capture.
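To make the format rule above easier to check against your own configuration, here is a small Python sketch. It only models the behavior described in this post and is not OnBase code; the function name and the "pdf"/"image" labels are made up for illustration.

```python
def format_after_advanced_capture(input_format: str, splits_documents: bool) -> str:
    """Model the format of a document after Advanced Capture.

    Hypothetical helper: it encodes only the rule described in this post,
    using the illustrative labels "pdf" and "image".
    """
    fmt = input_format.lower()
    if fmt not in ("pdf", "image"):
        raise ValueError(f"format not covered by this sketch: {input_format}")
    # Splitting a PDF into smaller documents always yields image documents.
    if fmt == "pdf" and splits_documents:
        return "image"
    # Without splitting, the document keeps its original format.
    return fmt
```

Under this model, a PDF only changes format when the form templates split it, which would explain a scan queue that turns PDFs into TIFFs with no image processing configured.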

Regarding the second part of your question: the bar code recognition server is only capable of reading bar codes from image format documents, which is why it is not performing bar code indexing on your swept PDF documents. In this case, you would have to use the Image Processing queue in your scan queue to have the PDF documents converted to image format prior to the bar code server queue. There are several ways to do this, using either traditional image processing functions in the service mode OnBase client or the Data Capture Server. After the bar code processing has indexed the documents, you could then use one of the two options mentioned in the previous paragraph to convert the documents back to PDF, if needed.

Alternatively, depending on the nature of the PDFs you are importing and where the required metadata is stored within them, you may be able to use the PDF Input Filter module to import these documents and have their metadata extracted from them in their native PDF format, without any format conversions. Refer to the Module Reference Guide on the PDF Input Filter for more information on this topic.

Information on the Image Processing and PDF Conversion queues is available in the Document Imaging Module Reference Guide. Information on the Full Page OCR queue is available in the OCR Module Reference Guide and the Data Capture Server deployment MRG.
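The queue ordering for the bar code scan queue can be summarized in a short Python sketch. This is only a planning aid based on the steps described in this post, not an OnBase API; the function and parameter names are hypothetical, and the strings are simply the queue names mentioned above. Where Advanced Capture sits in the sequence depends on your own configuration, so it is left out here.

```python
def plan_bar_code_scan_queue(input_format: str,
                             want_pdf_output: bool,
                             has_full_page_ocr_license: bool = False) -> list[str]:
    """Return an ordered list of queues for the bar code scenario in this post.

    Hypothetical helper for illustration; it does not call OnBase.
    """
    queues = []
    if input_format.lower() == "pdf":
        # The bar code server reads image documents only, so convert first.
        queues.append("Image Processing")
    queues.append("Bar Code Processing")
    if want_pdf_output:
        # Full Page OCR yields text-searchable PDFs but needs a license;
        # PDF Conversion yields PDFs that are not text-searchable.
        if has_full_page_ocr_license:
            queues.append("Full Page OCR")
        else:
            queues.append("PDF Conversion")
    return queues


print(plan_bar_code_scan_queue("pdf", want_pdf_output=True))
```

The point of the sketch is the order: the conversion to image comes before bar code indexing, and the conversion back to PDF comes after it.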

 
