Transform PDF, return new version

abruzzi
Champ on-the-rise
We have an OCR server (ABBYY Recognition Server) that can do its work over a SOAP connection. I've built a transformation for PDF to TXT. Basically, the script called by the transformation first checks for a text layer; if one exists, it extracts and returns that text with pdftotext, skipping the OCR server. If there is no text layer, it sends the PDF to the OCR server over SOAP and returns the recognized text to Alfresco for indexing.
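The "text layer first, OCR as fallback" decision described above could look roughly like this in Java. This is a minimal sketch, not the poster's actual script: `PdfTextExtractor` and the `OcrClient` interface are hypothetical names, and `OcrClient` stands in for whatever SOAP call reaches the OCR server. The only real external tool used is poppler's `pdftotext`, invoked with `-` as the output file so the text goes to stdout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.function.Function;

/**
 * Sketch of "use the existing text layer if there is one, otherwise OCR".
 * OcrClient is a HYPOTHETICAL stand-in for the SOAP call to the OCR server.
 */
public class PdfTextExtractor {

    /** Hypothetical: implementations should wrap I/O failures in UncheckedIOException. */
    public interface OcrClient {
        String ocrToText(Path pdf);
    }

    private final Function<Path, String> textLayerReader;
    private final OcrClient ocrClient;

    public PdfTextExtractor(Function<Path, String> textLayerReader, OcrClient ocrClient) {
        this.textLayerReader = textLayerReader;
        this.ocrClient = ocrClient;
    }

    /** Returns the existing text layer if it has any content, otherwise the OCR output. */
    public String extract(Path pdf) {
        String text = textLayerReader.apply(pdf);
        // A scanned PDF usually yields only form feeds (one per page);
        // isBlank() treats those as whitespace, so they count as "no text layer".
        if (text != null && !text.isBlank()) {
            return text;
        }
        return ocrClient.ocrToText(pdf);
    }

    /** Runs "pdftotext <file> -", which writes the text to stdout; returns "" on failure. */
    public static String runPdftotext(Path pdf) {
        try {
            Process p = new ProcessBuilder("pdftotext", pdf.toString(), "-")
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            byte[] out = p.getInputStream().readAllBytes();
            return p.waitFor() == 0 ? new String(out, StandardCharsets.UTF_8) : "";
        } catch (IOException e) {
            return ""; // pdftotext missing or file unreadable: fall through to OCR
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        }
    }
}
```

In use this would be wired as `new PdfTextExtractor(PdfTextExtractor::runPdftotext, soapClient)`. Injecting the text-layer reader keeps the decision logic testable without pdftotext or the OCR server being present.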

That all works great. However, the OCR server can also return a PDF with the OCR text embedded as a text layer. What I'm hoping for is a PDF-to-PDF transformation that could be set up as a rule or run manually (perhaps with a web script) that sends the PDF out, gets the OCR'd PDF back, and then stores it as a new minor version of the original document.

I've done a lot with Alfresco Explorer in 3.1.2, but Share is very new to me, and our new setup is going into Share, so I'm not sure where to begin. I can see that the JavaScript API has some versioning capability:

http://docs.alfresco.com/4.2/topic/com.alfresco.enterprise.doc/references/API-JS-Versions.html

But I'm not sure where to start with this.  Any suggestions?

romschn
Star Collaborator
If I understand your requirement correctly, you want to set up a business rule in Alfresco so that when a PDF is uploaded to a space, it is sent to the OCR server for processing. The OCR server adds the OCR text to that PDF as a text layer and sends the updated PDF back.
Once the processed PDF is returned from the OCR server, it should be added as a minor version of the original PDF.

Is this your requirement?

abruzzi
Champ on-the-rise
Exactly.  The OCR side I know how to do, and the SOAP is fairly easy (if I have to do SOAP in JavaScript, that's a bit trickier). I'm just not sure where to start with Alfresco.
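Doing the SOAP call from Java rather than JavaScript can be as simple as posting an envelope over HTTP. The sketch below assumes Java 11+ (`java.net.http.HttpClient`) and a SOAP 1.1 endpoint. The namespace `urn:example:ocr`, the operation `ProcessPdf`, the `document` element, and the `SOAPAction` value are all HYPOTHETICAL placeholders: the real names come from the OCR server's WSDL, and a client generated from that WSDL is often the more robust route.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Minimal hand-rolled SOAP 1.1 client; element names are HYPOTHETICAL placeholders. */
public class OcrSoapClient {
    // HYPOTHETICAL: replace with the namespace and operation from the OCR server's WSDL.
    static final String NS = "urn:example:ocr";
    static final String OPERATION = "ProcessPdf";

    private final HttpClient http = HttpClient.newHttpClient();
    private final URI endpoint;

    public OcrSoapClient(URI endpoint) {
        this.endpoint = endpoint;
    }

    /** Builds a SOAP 1.1 envelope carrying the PDF as base64 (ASCII, so no XML escaping needed). */
    static String buildEnvelope(byte[] pdf) {
        String b64 = Base64.getEncoder().encodeToString(pdf);
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\""
                + " xmlns:ocr=\"" + NS + "\">"
                + "<soap:Body><ocr:" + OPERATION + ">"
                + "<ocr:document>" + b64 + "</ocr:document>"
                + "</ocr:" + OPERATION + "></soap:Body></soap:Envelope>";
    }

    /** POSTs the envelope and returns the raw SOAP response; parsing it is left to the caller. */
    public String send(byte[] pdf) throws IOException, InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "text/xml; charset=utf-8")
                .header("SOAPAction", "\"" + NS + "/" + OPERATION + "\"")
                .POST(HttpRequest.BodyPublishers.ofString(buildEnvelope(pdf), StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> resp = http.send(req, HttpResponse.BodyHandlers.ofString());
        // SOAP 1.1 faults normally come back as HTTP 500 with a <soap:Fault> body.
        if (resp.statusCode() != 200) {
            throw new IOException("OCR server returned HTTP " + resp.statusCode());
        }
        return resp.body();
    }
}
```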

romschn
Star Collaborator
Okay. You will need to create a custom Java-based action to do the processing. In the action, make a remote call to the OCR server and get the processed PDF back. Once the PDF is returned, use the VersionService to add it as a new minor version of the original PDF. Also register the new action so that it is available when defining a business rule.
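The control flow of such an action can be sketched in plain Java. In a real Alfresco module the class would extend `ActionExecuterAbstractBase` and put this logic in `executeImpl(Action, NodeRef)`, read and write the content through `ContentService`, call `VersionService.createVersion(nodeRef, props)` with `VersionModel.PROP_VERSION_TYPE` set to `VersionType.MINOR`, and be registered as a Spring bean (typically with `parent="action-executer"`). To keep the sketch runnable without the Alfresco SDK, the `OcrServer` interface and the in-memory `Document` class below are HYPOTHETICAL stand-ins for the SOAP call and the repository node.

```java
import java.util.ArrayList;
import java.util.List;

/** Runnable sketch of the action's flow; OcrServer and Document are HYPOTHETICAL stand-ins. */
public class OcrPdfActionSketch {

    /** Hypothetical stand-in for the SOAP call that returns a PDF with a text layer. */
    public interface OcrServer {
        byte[] toSearchablePdf(byte[] pdf);
    }

    /** Hypothetical in-memory stand-in for a versioned repository node. */
    public static class Document {
        private byte[] content;
        private int major = 1;
        private int minor = 0;
        private final List<String> history = new ArrayList<>();

        Document(byte[] initialContent) {
            content = initialContent;
            history.add(label());
        }

        String label() { return major + "." + minor; }
        byte[] content() { return content; }
        List<String> history() { return history; }

        /** Mirrors Alfresco's default labelling, where a minor version bumps 1.0 to 1.1. */
        void writeAsMinorVersion(byte[] newContent) {
            content = newContent;
            minor++;
            history.add(label());
        }
    }

    private final OcrServer ocr;

    OcrPdfActionSketch(OcrServer ocr) {
        this.ocr = ocr;
    }

    /** Action body: send the PDF out, store what comes back as a new minor version. */
    public void execute(Document doc) {
        byte[] searchable = ocr.toSearchablePdf(doc.content());
        if (searchable == null || searchable.length == 0) {
            // Leave the original untouched rather than versioning an empty file.
            throw new IllegalStateException("OCR server returned no PDF");
        }
        doc.writeAsMinorVersion(searchable);
    }
}
```

When attaching the action to a rule, triggering it on items being created or entering the folder (rather than on updates) is probably safer, since writing the OCR'd PDF back is itself a content update and could otherwise fire the rule again.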