<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: OCR Integration with Alfresco in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301584#M254714</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;One idea would be to write a custom transformer that transforms from your source mimetype to text by leveraging your existing OCR web service. The reason is that the full-text indexing mechanism will leverage the transformer when it tries to find text it can ingest into the index. So if you have written a transformer that does that, your content will be indexed.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Jeff&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 13 Feb 2014 23:50:33 GMT</pubDate>
    <dc:creator>jpotts</dc:creator>
    <dc:date>2014-02-13T23:50:33Z</dc:date>
    <item>
      <title>OCR Integration with Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301583#M254713</link>
      <description>Currently Alfresco can provide search capability for textual content. We would also like to search on text inside images and PDFs with text inside the images. We have an existing web service which will provide OCR functionality (i.e. read a document and return the text data). I would like to kn</description>
      <pubDate>Thu, 13 Feb 2014 22:02:24 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301583#M254713</guid>
      <dc:creator>ashpal19</dc:creator>
      <dc:date>2014-02-13T22:02:24Z</dc:date>
    </item>
    <item>
      <title>Re: OCR Integration with Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301584#M254714</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;One idea would be to write a custom transformer that transforms from your source mimetype to text by leveraging your existing OCR web service. The reason is that the full-text indexing mechanism will leverage the transformer when it tries to find text it can ingest into the index. So if you have written a transformer that does that, your content will be indexed.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Jeff&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
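      <!--
      Editorial sketch of the suggestion above, in plain Java so it runs without the
      Alfresco libraries. It shows only the core of such a transformer: read the source
      bytes, hand them to the OCR service, write the returned text. OcrClient,
      extractText and OcrTextExtractor are hypothetical names that do not appear in the
      thread; the real web service client will look different.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;

/** Hypothetical stand-in for the existing OCR web service client. */
interface OcrClient {
    String extractText(byte[] document) throws IOException;
}

/** Sketch of the transformer body: source bytes in, plain text out. */
final class OcrTextExtractor {
    private final OcrClient client;

    OcrTextExtractor(OcrClient client) {
        this.client = client;
    }

    /** Reads the whole source stream, sends it to OCR, and writes the returned text. */
    void transform(InputStream source, Writer target) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        // Copy the source into memory; assumes documents are small enough for that.
        while ((read = source.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        String text = client.extractText(buffer.toByteArray());
        // An empty result is written as empty text rather than failing.
        target.write(text == null ? "" : text);
        target.flush();
    }
}
```

      Inside an Alfresco transformer, the same steps would presumably sit in
      transformInternal(), taking the stream from reader.getContentInputStream() and
      handing the text to writer.putContent(String).
      -->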
      <pubDate>Thu, 13 Feb 2014 23:50:33 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301584#M254714</guid>
      <dc:creator>jpotts</dc:creator>
      <dc:date>2014-02-13T23:50:33Z</dc:date>
    </item>
    <item>
      <title>Re: OCR Integration with Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301585#M254715</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi Jeff, &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Thank you for your suggestion. I agree with the custom content transformer approach, because the content transformed by our third-party service will then be indexed automatically by Alfresco. &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I have written a custom transformer class: &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;package org.alfresco.repo.content.transform;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;import java.io.InputStreamReader;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;import java.io.OutputStreamWriter;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;import java.io.Reader;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;import java.io.Writer;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;import org.alfresco.service.cmr.repository.ContentReader;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;import org.alfresco.service.cmr.repository.ContentWriter;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;import org.alfresco.service.cmr.repository.TransformationOptions;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;public class OCRContentTransformer extends AbstractContentTransformer2 {&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;@Override&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;protected void transformInternal(ContentReader reader, ContentWriter writer,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;TransformationOptions options) throws Exception {&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;System.out.println("inside the transform internal method and now the index would be updated with the latest content");&lt;/SPAN&gt;&lt;BR 
/&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; //transformText(reader, writer, options);&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;@Override&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;public boolean isTransformableMimetype(String sourceMimetype,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;String targetMimetype, TransformationOptions options) {&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// TODO Auto-generated method stub&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return super.isTransformableMimetype(sourceMimetype, targetMimetype, options);&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;and I am also referring to this class in a custom context file my-transformers-context.xml (attached) (placed under C:\Alfresco\tomcat\shared\classes\alfresco\extension)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I am getting the attached error. 
&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I also referred to the wiki - &lt;/SPAN&gt;&lt;A href="https://wiki.alfresco.com/wiki/Content_Transformations#Developing_New_Transformations" rel="nofollow noopener noreferrer"&gt;https://wiki.alfresco.com/wiki/Content_Transformations#Developing_New_Transformations&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Note: I have tried this class with and without overriding the isTransformableMimetype()&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Your help is highly appreciated. &lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 18 Feb 2014 18:39:00 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301585#M254715</guid>
      <dc:creator>ashpal19</dc:creator>
      <dc:date>2014-02-18T18:39:00Z</dc:date>
    </item>
    <item>
      <title>Re: OCR Integration with Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301586#M254716</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;I have got this to work, please ignore my previous response. &lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 18 Feb 2014 21:58:09 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301586#M254716</guid>
      <dc:creator>ashpal19</dc:creator>
      <dc:date>2014-02-18T21:58:09Z</dc:date>
    </item>
    <item>
      <title>Re: OCR Integration with Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301587#M254717</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Please look at the attached context file and Java class for the custom content transformer. This transformer is invoked almost every time, even when I click on a particular file to view it. I know this because I added sysouts to the isTransformableMimetype() method. &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Furthermore, the transformInternal() method is not getting called, and I am not sure why. What am I doing wrong? Can you please guide me further: is there additional configuration I am missing, or do I need to add further details to the custom Java class? &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thank you for your help, I really appreciate it. &lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Feb 2014 17:26:38 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301587#M254717</guid>
      <dc:creator>ashpal19</dc:creator>
      <dc:date>2014-02-19T17:26:38Z</dc:date>
    </item>
    <item>
      <title>Re: OCR Integration with Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301588#M254718</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;What are the specific steps you are following to test this?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Jeff&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Feb 2014 22:41:13 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301588#M254718</guid>
      <dc:creator>jpotts</dc:creator>
      <dc:date>2014-02-19T22:41:13Z</dc:date>
    </item>
    <item>
      <title>Re: OCR Integration with Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301589#M254719</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;I was testing this process through the Alfresco Share UI by uploading PDFs or image files. After struggling for a long time, I figured out that "If two transformers perform the same transformation, the most reliable one will always be chosen." (this is also mentioned in the wiki, which I noticed later). Hence my custom transformer's transformInternal() method was never being used, since more reliable transformers already existed for the same transformation. &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Then I defined a new transformation which does not exist in the current Alfresco configuration and tested with it (image/jpeg -&amp;gt; text/plain). Now it invokes the transformInternal() method properly. However, I still have another question: why does this transformation get called even when I click on that particular document type to view it? Will the transformation run even when I view the document? &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thank you for your help, Jeff, and thank you for responding to my query.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
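      <!--
      Editorial sketch of the mimetype check described above, in plain Java so it runs
      without the Alfresco libraries. Instead of delegating to the superclass, the
      check accepts only the one pair the poster tested (image/jpeg to text/plain).
      OcrMimetypes and its constants are hypothetical names; this is one possible way
      to narrow the check, not the poster's actual code.

```java
/** Hypothetical helper that limits the transformer to a single mimetype pair. */
final class OcrMimetypes {
    static final String SOURCE = "image/jpeg";
    static final String TARGET = "text/plain";

    private OcrMimetypes() {
    }

    /** Returns true only for the image/jpeg to text/plain transformation. */
    static boolean isTransformable(String sourceMimetype, String targetMimetype) {
        // Constant-first equals keeps the check safe when either argument is null.
        return SOURCE.equals(sourceMimetype) && TARGET.equals(targetMimetype);
    }
}
```

      A transformer's isTransformableMimetype() override could return the result of
      such a check, so that it claims no transformation already covered by another
      transformer.
      -->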
      <pubDate>Wed, 19 Feb 2014 23:03:23 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/ocr-integration-with-alfresco/m-p/301589#M254719</guid>
      <dc:creator>ashpal19</dc:creator>
      <dc:date>2014-02-19T23:03:23Z</dc:date>
    </item>
  </channel>
</rss>

