03-02-2017 11:12 PM
Hi,
I need some help. How can I extract data (i.e. name and date of birth) from scanned PDF files and convert them into a single one based on name as well as date of birth, using Alfresco bulk import? I am using Alfresco 5.0.
03-03-2017 11:21 AM
You cannot do this with Alfresco out-of-the-box without additional third-party tools. You need to OCR the scanned image so that it will be converted into text and you need to use one or more "zone" features to read the text from a standard area in the document into the name and date-of-birth properties.
If your PDF has already been converted into text and your OCR software does not have the ability to read from a zone, then I suppose you could write your own code that would parse the text to extract the name and date-of-birth.
There are several people in the community who have integrated various OCR solutions with Alfresco, so you should be able to find something that will work for you.
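For the "write your own code" route mentioned above, here is a minimal Python sketch of the parsing step only. It assumes the OCR output is already available as plain text, that each field sits on its own line with a label such as "Name:" and "Date of Birth:" (or "DOB:"), and that dates are written day/month/year. The function name `extract_name_and_dob` and the label patterns are made up for illustration, so adjust them to whatever your scanned forms actually look like.

```python
import re
from datetime import datetime

# Assumed label formats (hypothetical): "Name: ..." and "Date of Birth: ..." or "DOB: ...",
# each on its own line. Change these patterns to match your real documents.
NAME_RE = re.compile(
    r"^[ \t]*Name[ \t]*[:\-][ \t]*(?P<name>[^\r\n]+)",
    re.IGNORECASE | re.MULTILINE,
)
DOB_RE = re.compile(
    r"^[ \t]*(?:Date[ \t]+of[ \t]+Birth|DOB)[ \t]*[:\-][ \t]*"
    r"(?P<dob>\d{1,2}[/-]\d{1,2}[/-]\d{4})",
    re.IGNORECASE | re.MULTILINE,
)


def extract_name_and_dob(ocr_text):
    """Return {'name': ..., 'dob': ...} from OCR text; a value is None if not found."""
    name_match = NAME_RE.search(ocr_text)
    dob_match = DOB_RE.search(ocr_text)

    name = name_match.group("name").strip() if name_match else None

    dob = None
    if dob_match:
        # Normalise the separator, then validate the date (assumed day/month/year).
        raw = dob_match.group("dob").replace("-", "/")
        try:
            dob = datetime.strptime(raw, "%d/%m/%Y").date().isoformat()
        except ValueError:
            dob = None  # OCR produced something that is not a real date

    return {"name": name, "dob": dob}


if __name__ == "__main__":
    sample = "Patient Record\nName: John Smith\nDate of Birth: 03/02/1985\n"
    print(extract_name_and_dob(sample))
```

OCR output is often noisy, so in practice you would probably want to log the documents where either value comes back as None and review those by hand rather than import them with empty properties.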