I have installed Alfresco 3.1 and have it running smoothly on Debian Lenny using Apache/Tomcat.
I'm now looking at OCR and have installed Ocropus and Tesseract. Both run perfectly on their own, but I have had no luck implementing an OCR transformation XML file.
Has anyone successfully integrated Ocropus/Tesseract with Alfresco? If so, could you post your XML and any other specific modifications you needed to make?
I understand Tesseract can't convert PDF, but for now TIFF to text is OK. I'm hoping Tesseract comes along in leaps and bounds now that it is a Google-funded project, as there seems to be a big discrepancy between the quality of the Windows and Linux OCR options.
Any help is appreciated, and I will share whatever I work out about getting OCR working well with Alfresco on Linux.