I just got approach 1 working for OCR with AnyDoc 3.2 on Alfresco 2.0. I use it to import medical claims images. Here's how it works:
1. A script watches the AnyDoc output directory. When it finds a TXT output file, it reads the file, counts the records, and checks them against the number and names of the images in the image output directory.
2. If these match, I read in an XML template (based on the ACP XML schema for my custom claims content type) and replace its values with the values from the AnyDoc output (once for each image).
3. Then I write out the XML, copy over the image files, zip the whole thing up into an ACP, and move the ACP into a CIFS directory that has an action to import an ACP into my claims space.
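A rough Python sketch of those three steps, in case it helps picture the flow. This is not the actual script. Everything specific here is an assumption made for illustration: the TXT layout (one pipe-delimited record per line: image name, claim number, patient), the `<claim>` fragment standing in for the real ACP XML template, the `claims.xml` and `claims/` names inside the archive, and all function names.

```python
import csv
import shutil
import zipfile
from pathlib import Path
from string import Template
from xml.sax.saxutils import quoteattr

# Placeholder fragment only; the real template follows the ACP XML schema
# for the custom claims content type, which this sketch does not reproduce.
ITEM_TEMPLATE = Template("  <claim image=$image number=$number patient=$patient/>")


def read_records(txt_path):
    """Read one record per line (assumed layout: image|number|patient)."""
    with open(txt_path, newline="") as f:
        return [
            {"image": row[0], "number": row[1], "patient": row[2]}
            for row in csv.reader(f, delimiter="|")
            if row  # skip blank lines
        ]


def records_match_images(records, image_dir):
    """Step 1: the record count and image names must agree with the files on disk."""
    image_names = sorted(p.name for p in Path(image_dir).iterdir() if p.is_file())
    return sorted(r["image"] for r in records) == image_names


def build_acp(records, image_dir, acp_path):
    """Steps 2 and 3: fill the template once per image, then zip XML and images."""
    items = "\n".join(
        # quoteattr escapes each value and wraps it in quotes for use as an attribute
        ITEM_TEMPLATE.substitute({k: quoteattr(v) for k, v in r.items()})
        for r in records
    )
    xml = "<claims>\n" + items + "\n</claims>\n"
    with zipfile.ZipFile(acp_path, "w", zipfile.ZIP_DEFLATED) as acp:
        acp.writestr("claims.xml", xml)
        for r in records:
            acp.write(Path(image_dir) / r["image"], arcname="claims/" + r["image"])


def process_batch(txt_path, image_dir, drop_dir):
    """Run all steps; return the final ACP path, or None if the check fails."""
    records = read_records(txt_path)
    if not records or not records_match_images(records, image_dir):
        return None
    staged = Path(txt_path).with_suffix(".acp")
    build_acp(records, image_dir, staged)
    # Build the archive first, then move it, so the import action on the
    # drop directory never sees a half-written ACP.
    final = Path(drop_dir) / staged.name
    shutil.move(str(staged), str(final))
    return final
```

Calling `process_batch("out.txt", "images", "/mnt/alfresco/claims-import")` (hypothetical paths) would either drop `out.acp` into the import directory or return `None` when the records and images don't line up, which is the natural place to hook in error reporting later.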
Right now the whole thing is just a proof-of-concept. I'd like to move the whole process into the Alfresco environment and then figure out a robust way to handle and report errors.