Text Match Form Definition not working with PDF documents in Advanced Capture

Larry_Henry
Confirmed Champ

When I create a Form Identity zone on a PDF document for Advanced Capture, the zone will not perform a Text Match. It defaults to an Image Match, even though the zone contains only text, not an image. As a result, the document cannot be indexed into OnBase.

For testing purposes, I converted a PDF to TIFF, and when I create a Text Match Form Identity zone on the TIFF, it works as expected. I am hoping to avoid a PDF-to-TIFF conversion, so any solutions or advice are greatly appreciated.

We are on EP3

3 REPLIES

Hi Larry,

Can you force the zone configuration to be a text match in Advanced Capture configuration? I understand that it is defaulting to an image match type, but you should be able to manually change the tab selection back to Text Match. If necessary, delete and recreate the zone, and change the tab back to text during the initial setup. If you do this, does the OCR work properly on the zone at runtime?

Larry_Henry
Confirmed Champ

Hi Steve,

Since your post, I have done the following steps:

 

1. Delete the current zone.
2. Save the configuration.
3. Create a new form identification zone.
          It defaults to the 'Image Match' tab.
4. Change to the 'Text Match' tab.
5. The 'Match value:' box is empty.
          The zone is not seeing any text.
6. I can type in the text.
          Use 'Contains text'.
          Check 'Uppercase' and 'Lowercase'.
          'Use registration point' is not checked. It does not find it.
7. Process the current document: the form is not identified.

 

The particular document types that I am having issues with are coming in from DocuSign as PDFs, so I don't know whether that makes a difference.

Perhaps DocuSign is creating the PDF as an image and encrypting the internal PDF data (because it is digitally signed), so that the OCR engine is unable to read the data itself? If you do a test process run on the document in AC config, does the diagnostics window show any text at all being read from the zone, or does it show no text being decoded?
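
One rough way to test this theory outside OnBase is to look at the PDF's raw bytes. The Python sketch below is only a heuristic and is not an OnBase or DocuSign feature: it assumes that an `/Encrypt` entry indicates an encrypted file, that `/Font` entries indicate a real text layer, and that `/Subtype /Image` entries indicate embedded page images. The function names `inspect_pdf_bytes` and `likely_image_only` are made up for illustration. It also tries to inflate Flate-compressed streams, since newer PDFs can hide font dictionaries inside them, but a negative result on an encrypted file is inconclusive.

```python
import re
import sys
import zlib

# Matches the body of every PDF stream object (heuristic, not a real parser).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def _inflated_streams(data: bytes) -> bytes:
    """Concatenate the contents of every stream that inflates with zlib."""
    chunks = []
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the trailing newline before 'endstream'.
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate-compressed, or encrypted: skip it
    return b"\n".join(chunks)


def inspect_pdf_bytes(data: bytes) -> dict:
    """Report coarse markers found in the raw and inflated PDF bytes."""
    corpus = data + b"\n" + _inflated_streams(data)
    return {
        "encrypted": re.search(rb"/Encrypt\b", data) is not None,
        "has_fonts": re.search(rb"/Font\b", corpus) is not None,
        "has_images": re.search(rb"/Subtype\s*/Image\b", corpus) is not None,
    }


def likely_image_only(report: dict) -> bool:
    """Images but no fonts suggests scanned pages with no text layer."""
    return report["has_images"] and not report["has_fonts"]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        result = inspect_pdf_bytes(handle.read())
    print(result, "image-only:", likely_image_only(result))
```

Saved under a name of your choice and run with the path of one of the DocuSign PDFs as its only argument, it prints the three markers. If fonts are present and the file is not encrypted, the PDF probably does have a text layer, and the problem is more likely in the zone configuration than in the document itself.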