Automatically extract information from a document

matignon
Champ in-the-making

Hello,

I'm looking for a program that can automatically extract predefined information and save it in fields (those fields would be used by another piece of software through an interface).

For example, say I get a bill from Apple. Thanks to predefined areas on the page (one area contains the company's address, another contains words that clearly identify the document as a bill), the program should recognize what kind of document it is. Then it should extract the defined information (like the bill number, article numbers, quantity of articles, etc.) into different fields that I would have defined.
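The workflow described above (identify the document type from keywords found in predefined areas, then copy specific values into named fields) can be sketched in a few lines of Python. This is a minimal illustration, not how any DMS or capture product actually does it: it assumes the text of each area has already been obtained (for example by OCR), and the template name `apple_invoice`, the area names, keywords, regular expressions, and sample values are all hypothetical.

```python
import re

# Hypothetical templates: a document type is recognised when every listed
# area contains at least one of its keywords. "fields" holds illustrative
# regexes whose first group is the value to extract.
TEMPLATES = {
    "apple_invoice": {
        "identify": {
            "header": ["Apple"],
            "title": ["Invoice", "Bill"],
        },
        "fields": {
            "invoice_number": r"Invoice\s+(?:No\.?|Number)[:\s]+(\S+)",
            "total": r"Total[:\s]+([\d.,]+)",
        },
    },
}


def classify(areas):
    """Return the first template whose every area contains one of its keywords."""
    for doc_type, template in TEMPLATES.items():
        if all(
            any(kw.lower() in areas.get(area, "").lower() for kw in keywords)
            for area, keywords in template["identify"].items()
        ):
            return doc_type
    return None


def extract_fields(doc_type, full_text):
    """Apply the template's regexes; fields that are not found come back as None."""
    fields = {}
    for name, pattern in TEMPLATES[doc_type]["fields"].items():
        match = re.search(pattern, full_text, re.IGNORECASE)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    # Made-up sample text for the two areas and the page body.
    areas = {"header": "Apple Distribution International", "title": "Invoice"}
    text = "Invoice Number: MA12345\nTotal: 1,299.00 EUR"
    doc_type = classify(areas)
    print(doc_type)                        # apple_invoice
    print(extract_fields(doc_type, text))  # {'invoice_number': 'MA12345', 'total': '1,299.00'}
```

Keyword-and-regex templates like these break easily when a layout changes, so treat this as an illustration of the idea only.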

I know that some DMS programs do that, but unfortunately I don't know the technical term for it. Is it possible to build this with Alfresco? If yes, where can I find out how to build it?

Thanks in advance for your help, and have a good weekend!

Cheers,

Matt


angelborroy
Community Manager

Alfresco cannot extract this information out of the box. You need external software such as Ephesoft or Kofax to perform these operations before uploading your documents to Alfresco.

Hyland Developer Evangelist

mitpatoliya
Star Collaborator

The technical term for what you are looking for is "OCR" --> Optical Character Recognition.

Alfresco does not have this ability. You can use tools like Ephesoft or Kofax to achieve that.

There are many plugins available that will help you integrate Alfresco with those tools.

What Alfresco does is called metadata extraction --> it can read some predefined metadata, such as author, name, and description, from well-known document types and map it to the Alfresco content model.
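For contrast with the original request, metadata extraction only copies properties that the file format already stores (such as a PDF's author or title) onto content-model properties. Below is a minimal Python sketch of that mapping step. The `cm:` property names are standard Alfresco content-model properties, but the `MAPPING` table and `map_metadata` function are hypothetical; Alfresco's own extractors are Java components with their own per-format configuration.

```python
# Hypothetical mapping from metadata keys embedded in a file (e.g. a PDF's
# document information) to Alfresco content-model property names.
# This only mirrors the idea; it is not Alfresco's actual implementation.
MAPPING = {
    "author": "cm:author",
    "title": "cm:title",
    "subject": "cm:description",
}


def map_metadata(embedded):
    """Keep only the known keys, renamed to content-model property names."""
    return {MAPPING[key]: value for key, value in embedded.items() if key in MAPPING}


if __name__ == "__main__":
    # Made-up embedded metadata; "producer" has no mapping and is dropped.
    pdf_info = {"author": "Apple Inc.", "title": "Invoice", "producer": "SomePDFLib"}
    print(map_metadata(pdf_info))  # {'cm:author': 'Apple Inc.', 'cm:title': 'Invoice'}
```

Nothing in this step reads the printed content of the page, which is why a bill number or an article list is not picked up.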

douglascrp
World-Class Innovator

Mittal Patoliya, actually, OCR is only part of the problem.

OCR will extract the text content from the image, but what matignon needs is something more advanced.

In Kofax, they call it Page/Form recognition; in Ephesoft, Intelligent Document Capture; and in Captiva, Intelligent Document Recognition (IDR).

matignon, I believe those are the terms you should be searching for.
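To illustrate the difference between OCR and Page/Form recognition: OCR engines such as Tesseract can report each recognised word together with its bounding box (for example in their TSV or hOCR output), and form recognition builds on that by reading only the words that fall inside areas defined on a template. Below is a minimal Python sketch of that zone step, assuming the word boxes have already been produced by some OCR engine; the `Word` and `Zone` classes, the `read_zones` function, and the coordinates are hypothetical, and real capture products do considerably more than this.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR-recognised word with its bounding box in page pixels."""
    text: str
    left: int
    top: int
    width: int
    height: int


@dataclass
class Zone:
    """Hypothetical rectangular area defined on a form template."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, word):
        # A word belongs to the zone if the centre of its box lies inside it.
        cx = word.left + word.width / 2
        cy = word.top + word.height / 2
        return self.left <= cx <= self.right and self.top <= cy <= self.bottom


def read_zones(words, zones):
    """Join the words inside each zone in rough reading order (top, then left)."""
    result = {}
    for zone in zones:
        inside = [w for w in words if zone.contains(w)]
        inside.sort(key=lambda w: (w.top, w.left))
        result[zone.name] = " ".join(w.text for w in inside)
    return result


if __name__ == "__main__":
    # Made-up OCR output for part of a page.
    words = [
        Word("Apple", 50, 40, 60, 20),
        Word("Invoice", 400, 40, 90, 20),
        Word("Number:", 50, 300, 80, 20),
        Word("MA12345", 140, 300, 90, 20),
    ]
    zones = [Zone("header", 0, 0, 300, 100), Zone("title", 350, 0, 600, 100)]
    print(read_zones(words, zones))  # {'header': 'Apple', 'title': 'Invoice'}
```

The text collected per zone is what a classification or field-extraction step would then work on.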

EDIT:

Some links:

https://www.bluefishgroup.com/insights/ecm/whats-so-intelligent-about-intelligent-document-capture/

EMC Captiva - The Power of Intelligent Document Recognition 

ftp://ftp.kofax.com/pub/support/capture/obsolete/ac/7/product_documentation/application_notes/docsep...

matignon
Champ in-the-making

Thanks very much for your quick answers. I'll have a look at the two products you mentioned!