Alfresco Bulk import for reading the scanned files

vicky123
Champ in-the-making

Hi,

I need some help. How can I extract data (i.e. name and date of birth) from scanned PDF files and combine the files into a single one based on name and date of birth using Alfresco bulk import? I am using Alfresco 5.0.

1 ACCEPTED ANSWER

jpotts
World-Class Innovator

You cannot do this with Alfresco out of the box; you need additional third-party tools. First, you need to OCR the scanned image so that it is converted into text. Then you need to use one or more "zone" features of the OCR software to read the text from a standard area in the document into the name and date-of-birth properties.

If your PDF has already been converted into text and your OCR software does not have the ability to read from a zone, then I suppose you could write your own code that would parse the text to extract the name and date-of-birth.
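
To give a rough idea of what that parsing code might look like, here is a minimal Python sketch. It is only an illustration under several assumptions that are not from your question: the OCR output is plain text, the values sit on lines labelled "Name" and "Date of Birth" (or "DOB"), and dates are written day/month/year. The names `Person`, `extract_person`, and `group_by_person` are hypothetical helpers, not part of Alfresco or any OCR product.

```python
import re
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, NamedTuple, Optional, Tuple


class Person(NamedTuple):
    name: str
    date_of_birth: date


# Assumed label wording; adjust these patterns to match your own forms.
NAME_RE = re.compile(
    r"^[ \t]*Name[ \t]*[:\-][ \t]*(?P<name>.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)
DOB_RE = re.compile(
    r"^[ \t]*(?:Date[ \t]+of[ \t]+Birth|DOB)[ \t]*[:\-][ \t]*"
    r"(?P<dob>\d{1,2}[/.\-]\d{1,2}[/.\-]\d{4})[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_person(text: str) -> Optional[Person]:
    """Return the name and date of birth found in OCR text, or None."""
    name_match = NAME_RE.search(text)
    dob_match = DOB_RE.search(text)
    if name_match is None or dob_match is None:
        return None
    # Collapse runs of whitespace that OCR often leaves inside a name.
    name = " ".join(name_match.group("name").split())
    # Normalise the separators, then parse assuming day/month/year order.
    raw_dob = re.sub(r"[.\-]", "/", dob_match.group("dob"))
    try:
        dob = datetime.strptime(raw_dob, "%d/%m/%Y").date()
    except ValueError:
        return None  # not a real calendar date, e.g. 31/02/2000
    return Person(name, dob)


def group_by_person(texts: Dict[str, str]) -> Dict[Tuple[str, date], List[str]]:
    """Group file names by (case-insensitive name, date of birth)."""
    groups: Dict[Tuple[str, date], List[str]] = defaultdict(list)
    for filename, text in texts.items():
        person = extract_person(text)
        if person is not None:
            groups[(person.name.casefold(), person.date_of_birth)].append(filename)
    return dict(groups)
```

You would call `extract_person` on the text of each scanned file and use the result to set the name and date-of-birth properties, while `group_by_person` shows one way to decide which files belong to the same person. Real OCR output tends to be noisier than this, so expect to tune the patterns and to review the files for which nothing was found.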

There are several people in the community who have integrated various OCR solutions with Alfresco, so you should be able to find something that will work for you.
