I have been looking for a way to index a series of searchable TIFF files (made with the Microsoft Document Imaging package, MDI).
MDI does the OCR over the files with remarkable accuracy (and I already have a license for it on my PC… so it is much cheaper than buying more software!)
The goal is:
1/ Scan from printer to TIFF
2/ Printer FTPs the file to the file store
3/ edocfile's "TIFF to Searchable TIFF" converts it, saving to the Alfresco CIFS share
Has anyone written a transform (open source much preferred) for searchable TIFFs?
My guess is that OpenOffice/ImageMagick (whichever is supposed to do the TIFF transform) doesn't know how to deal with a SEARCHABLE TIFF and only sees the image.
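In case it helps anyone test that guess: as far as I understand it (an assumption on my part, not something I have confirmed), MDI keeps its OCR text in a private TIFF tag, 37679, which ExifTool lists as MSDocumentText. Whether edocfile's output uses the same tag, and what encoding the text is in, I don't know. Below is a rough Python sketch, standard library only, that walks the pages of a classic TIFF and pulls out whatever is stored in that tag. The function name page_texts and the latin-1 decoding are my own choices for illustration.

```python
import struct

# Assumption: MDI is reported to store OCR text in private TIFF tag 37679
# (ExifTool calls it MSDocumentText). Not verified against edocfile output.
MODI_TEXT_TAG = 37679

# Size in bytes of one value for each TIFF field type (1..12).
TYPE_SIZE = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1,
             7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def page_texts(data, tag_id=MODI_TEXT_TAG):
    """Return one entry per page (IFD): the tag's text, or None if absent."""
    if data[:2] == b"II":
        endian = "<"   # little-endian file
    elif data[:2] == b"MM":
        endian = ">"   # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")

    pages = []
    seen = set()  # guards against a corrupt file whose IFD chain loops
    while ifd_offset and ifd_offset not in seen:
        seen.add(ifd_offset)
        (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
        text = None
        for i in range(count):
            entry = ifd_offset + 2 + 12 * i
            tag, ftype, n = struct.unpack_from(endian + "HHI", data, entry)
            if tag != tag_id:
                continue
            size = TYPE_SIZE.get(ftype, 1) * n
            if size <= 4:
                start = entry + 8  # small values sit inside the entry itself
            else:
                (start,) = struct.unpack_from(endian + "I", data, entry + 8)
            raw = data[start:start + size]
            # Decoding as latin-1 is a guess; it never fails on any byte.
            text = raw.rstrip(b"\0").decode("latin-1")
        pages.append(text)
        # The offset of the next IFD follows the last 12-byte entry.
        (ifd_offset,) = struct.unpack_from(
            endian + "I", data, ifd_offset + 2 + 12 * count)
    return pages
```

Calling page_texts(open("scan.tif", "rb").read()) on one of my files (scan.tif is just a placeholder name) should show quickly whether the text is there: if it comes back with text, a custom transform could presumably hand that to the indexer; if every page is None, the text lives somewhere else.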
Am I pushing Alfresco too hard to be able to do this? I believe that Microsoft licensed the ScanSoft OCR engine, so maybe searchable TIFF is proprietary to Microsoft… but considering how many people are likely to have MDI on their systems, it can't be an unusual config?!
It would be a much cheaper solution to have a Windows machine running edocfile's "TIFF to Searchable TIFF" application than to use the other third-party platforms… and I can do it manually while I do the proof of concept.
Searchable PDF indexing does work… Searchable TIFF indexing does not.
BTW: I have had varied success with OCRopus, Tesseract, ABBYY FineReader, etc… I'm just interested as we already make heavy use of searchable TIFF.