Can Alfresco be used to manage scanned paper documents?

evolve2k
Champ in-the-making
We are a small accounting firm looking at a document management solution to move towards the 'less paper' office.

Alfresco looks great as a broader CMS. Could you outline why it would be suitable for managing scanned-in documents, along the usual lines required for any office's general administration?

I.e. all the mail arrives in the morning, the receptionist feeds it into the fancy feeder scanner, then indexes/files it online in the document management software. Mail includes client correspondence, tax office letters, seminar invites, bills to pay, etc.

Users can view new documents 'marked for their attention', and everything is essentially pre-filed and available through search as well as under a standard category/folder-like retrieval structure.

Could anyone outline if and how we could achieve something along these lines using Alfresco? Any references to tutorials/resources on how this is done, i.e. from scanner through to Alfresco, would also be much appreciated.

Thanks in advance.

Evolve2k
9 REPLIES

seanh
Champ in-the-making
I, too, need this functionality. Is this currently possible?

Thanks,
Sean

davidc
Star Contributor
Hi,

Alfresco integrates with Kofax and eCopy, two leading scanning and capture solutions. This means that scanned documents can be added to Alfresco automatically.

Alfresco can then categorise and file those documents according to user defined "Rules".  These are like Inbox rules in MS Office.
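To make the "Inbox rules" comparison concrete, here is a minimal Python sketch of rule-based filing. It is not Alfresco's API (in Alfresco, rules are configured on folders through the UI); the `Rule` class, the `file_document` function, the metadata keys and the folder paths are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A hypothetical filing rule: if the condition matches, file to target_folder."""
    name: str
    condition: Callable[[Dict[str, str]], bool]
    target_folder: str


def file_document(metadata: Dict[str, str], rules: List[Rule],
                  default_folder: str = "/Unfiled") -> str:
    """Return the folder of the first rule whose condition matches the metadata."""
    for rule in rules:
        if rule.condition(metadata):
            return rule.target_folder
    return default_folder


# Example rules for a morning mail run (folder names are made up).
RULES = [
    Rule("tax office letters",
         lambda m: m.get("sender") == "Tax Office",
         "/Correspondence/Tax Office"),
    Rule("bills to pay",
         lambda m: m.get("doc_type") == "bill",
         "/Accounts/Bills to Pay"),
]

print(file_document({"sender": "Tax Office", "reference": "12456"}, RULES))
# prints: /Correspondence/Tax Office
```

The first matching rule wins, which is how most inbox-style rule lists behave; anything that matches no rule lands in a default folder for manual filing.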

Search can be performed against the content, content meta-data, folder location and category.  The scanning solutions can extract important values from scanned documents which may be used as content meta-data for advanced Alfresco searches e.g. find tax office letter with reference number 12456.

Workflows or notifications (such as an e-mail) may be triggered when new content is added, or users may subscribe to an RSS feed.

All of this is available by configuring the scanning integration and rules. No coding is required.

I suggest you e-mail info@alfresco.com with your requirements; they can provide more information on how to get the scanning integration.

jharrop
Champ in-the-making
Hi David

Are you aware of anything similar to the Kofax release script planned or available for Nuance's OmniPage Professional 15?  I see that Nuance are not on your partners list.

thanks

Jason

othni
Champ in-the-making
Can anybody tell us how a Kofax Ascent Capture integration would affect the cost of an Alfresco implementation?

Is it licensed by scanning workstation?

************************

Is Alfresco going to consider implementing capture features in the near future?

rbelisle
Champ in-the-making
I work for a Kofax partner. Kofax licenses its software by a combination of the number of workstations required and the number of pages scanned, and the software cost is determined by those variables. For installation and professional services, it really depends on how much configuration is required. You could be looking at 5K+ for an install. The release script is normally supplied by Kofax, if it is an officially supported Kofax release script. I think that the 2.0 version of Alfresco includes the release scripts as part of the download.

That should give you some idea of how the cost would be affected.

bsawler
Champ in-the-making
Hi,

I recently purchased the "Alfresco book" by Munwar Shariff, and Chapter 13 is dedicated to Implementing Imaging and Forms Processing. It runs through an example showing how a French bank scans and processes 20,000 documents per hour… nice.

I recommend you have a read of this book. I cannot comment any further because I haven't implemented any scanning yet and I do not want to infringe on the book's copyright.

Cheers,
Bradley

rscheele
Champ in-the-making
But wouldn't it be nice to integrate open source OCR software into Alfresco? That would be completely in line with the Alfresco philosophy and would save a lot of money for medium-sized businesses without large-volume production scanners. I can imagine flatbed scanners in every medium-sized department.

eg:
http://code.google.com/p/tesseract-ocr/
http://code.google.com/p/ocropus/

pav5088
Champ in-the-making
But wouldn't it be nice to integrate open source OCR software into Alfresco? That would be completely in line with the Alfresco philosophy and would save a lot of money for medium-sized businesses without large-volume production scanners. I can imagine flatbed scanners in every medium-sized department.

eg:
http://code.google.com/p/tesseract-ocr/
http://code.google.com/p/ocropus/

I don't think there's yet a Tesseract-based package to create a searchable PDF, or at least not one that's free. (I have my suspicions that ScanWiz may be Tesseract-based.)

I guess Alfresco could use Tesseract to index PDFs without making them searchable, in the way that DocMGR does: http://docmgr.sourceforge.net/install.php . Still, the real solution is to make the PDFs searchable in the first place, and then Alfresco would index them quite happily.

As an aside, it would be nice if Alfresco could pass search words to Acrobat Reader so that PDFs open with the search words already highlighted. This can be done through Acrobat Reader's "Open Parameters" via a URL.
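As a rough illustration of that idea, the sketch below builds a link using the `search` open parameter, which Adobe documents in the form `#search="word1 word2"`. The function name `pdf_search_url` and the example URL are hypothetical, and whether the words actually get highlighted depends on the PDF viewer or browser plug-in that opens the link.

```python
from typing import List


def pdf_search_url(pdf_url: str, words: List[str]) -> str:
    """Append an Acrobat 'search' open parameter to a PDF URL.

    Double quotes are stripped from the words because the word list
    itself has to be wrapped in double quotes.
    """
    word_list = " ".join(w.replace('"', "") for w in words)
    return f'{pdf_url}#search="{word_list}"'


# Hypothetical document URL, for illustration only.
print(pdf_search_url("http://example.com/docs/letter.pdf", ["tax", "office"]))
# prints: http://example.com/docs/letter.pdf#search="tax office"
```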

nicolasraoul
Star Contributor
But wouldn't it be nice to integrate open source OCR software into Alfresco? That would be completely in line with the Alfresco philosophy and would save a lot of money for medium-sized businesses without large-volume production scanners. I can imagine flatbed scanners in every medium-sized department.

Sorry to resurrect an old thread, but I am trying to achieve this, and it does not look very difficult.
I wrote a few lines of code to add invisible text to an existing PDF (using the Open Source Java PDFBox library).

So now I guess I have all of the pieces, and it becomes an Alfresco question: how best to architect this?
Maybe an Alfresco action that calls Tesseract via the command line and then inserts the OCR'd text into the PDF?
Or the same as a transformer?
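
For the command-line step, something along these lines might be a starting point. It is a minimal Python sketch rather than an Alfresco action or transformer, and it assumes a Tesseract build that ships the `pdf` output config (available in later Tesseract releases, which write a searchable PDF directly from an image). The function names are hypothetical, and it takes a scanned image as input rather than adding text to an existing PDF as described above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def tesseract_pdf_command(image_path: str, language: str = "eng") -> List[str]:
    """Build the command: tesseract <image> <output base> -l <lang> pdf."""
    image = Path(image_path)
    # Tesseract appends ".pdf" to the output base by itself.
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", language, "pdf"]


def ocr_to_searchable_pdf(image_path: str, language: str = "eng") -> Path:
    """Run Tesseract on a scanned image and return the path of the PDF it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(tesseract_pdf_command(image_path, language), check=True)
    return Path(image_path).with_suffix(".pdf")


print(tesseract_pdf_command("scan.tif"))
```

An action or a transformer could wrap the same call; the main difference would be whether it runs when a rule fires on upload or when the repository asks for a text rendition.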

Thanks for any feedback!
Nicolas Raoul