OCR

p4w3l
Champ in-the-making
I'm trying to find something to read about how indexing works in Alfresco when you have an image, such as a scanned PDF, and want to keep the original scan while still being able to find the document through search.

Could somebody please elaborate on the Alfresco architecture and the two approaches that (to my mind) are possible:

1. If a transformation is not necessary: the document only needs to be OCRed for the short time it takes to index it. Later, when a user searches, the search finds the original document (the scan). There is no need to keep the document as text; it only needs to be findable. Is this approach possible?

2. Say we have a document scan and want an OCRed version of it as text or as a searchable PDF. Will a transformer replace the original document? How can we keep both the original (the PDF scan) and the OCRed version (the searchable PDF) so that they look like one item?

I am trying to learn Alfresco and map its abstractions to real user needs.
1 REPLY

openpj
Elite Collaborator
Alfresco extracts the text from files to create search indexes.

If you need a different scenario for OCR and searchable formats, I suggest you install and configure Ephesoft.
Ephesoft can be configured for these kinds of scenarios, and it supports any CMIS repository as a destination for scanned documents. That way you can set up your scenario as you wish. You can also create an association in Alfresco between the original scanned file and the searchable version of the file.
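
To make the two approaches from the question a bit more concrete, here is a rough sketch in plain Python. It is only a toy in-memory model, not Alfresco's or Ephesoft's API: `Node`, `ToyRepository`, the `"original"` association name, the file names, and the sample OCR text are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a repository node (not an Alfresco class)."""
    name: str
    associations: dict = field(default_factory=dict)


class ToyRepository:
    """Toy repository with a word index; illustrative only."""

    def __init__(self):
        self.nodes = {}   # node name -> Node
        self.index = {}   # lowercase word -> set of node names

    def add(self, node, index_text=""):
        # Store the node and index whatever text is supplied for it.
        self.nodes[node.name] = node
        for word in index_text.lower().split():
            self.index.setdefault(word, set()).add(node.name)

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


ocr_text = "Invoice 2024 Acme Ltd"  # pretend output of an OCR engine

# Approach 1: keep only the scan, but index the OCR text against it.
repo1 = ToyRepository()
repo1.add(Node("invoice-scan.pdf"), index_text=ocr_text)
hits_approach_1 = repo1.search("acme")

# Approach 2: keep both files and link the searchable copy to the scan.
repo2 = ToyRepository()
scan = Node("invoice-scan.pdf")
searchable = Node("invoice-searchable.pdf",
                  associations={"original": scan.name})
repo2.add(scan)                              # the scan has no text to index
repo2.add(searchable, index_text=ocr_text)
hits_approach_2 = repo2.search("acme")
# Follow the association from each hit back to the original scan.
originals = [repo2.nodes[h].associations["original"] for h in hits_approach_2]
```

In the first approach the search hit is the scan itself; in the second the hit is the searchable copy, and the association is what lets you get back to the original scan.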

Hope this helps.