<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Auto population from OCR document in to Content Model in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/auto-population-from-ocr-document-in-to-content-model/m-p/262291#M215421</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi all,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;We can use a good OCR-enabled scanner or system to convert our image-based documents into text. However, the problem is that most of the time we have hundreds of distributed users. In that case this is not feasible, because not every user can have an OCR system. So in such a case, a centralized OCR technique available within Alfresco would be very useful.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;SAMU&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 09 Jul 2012 05:08:36 GMT</pubDate>
    <dc:creator>samudaya</dc:creator>
    <dc:date>2012-07-09T05:08:36Z</dc:date>
    <item>
      <title>Auto population from OCR document in to Content Model</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/auto-population-from-ocr-document-in-to-content-model/m-p/262288#M215418</link>
      <description>Hi, We have built a small application using Alfresco to abstract legal documents. To capture data after abstraction (on a page that contains text boxes, text areas, etc.) we have written our own content model and services for the business logic. The model and the services are incorporated into Alfresc</description>
      <pubDate>Tue, 24 Apr 2012 09:56:17 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/auto-population-from-ocr-document-in-to-content-model/m-p/262288#M215418</guid>
      <dc:creator>chaitanya</dc:creator>
      <dc:date>2012-04-24T09:56:17Z</dc:date>
    </item>
    <item>
      <title>Re: Auto population from OCR document in to Content Model</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/auto-population-from-ocr-document-in-to-content-model/m-p/262289#M215419</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;We have done the same kind of thing, but when we designed the client tools, we decided to do it externally, during pre-processing. For your need, you would have to add Tesseract-like software to work with Alfresco (community users share various pieces of code for that) and then use the result to populate your content model.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;So we decided to do it in pre-processing: we extract the data and embed it as metadata in the PDF file, and then write a custom metadata extractor to populate an auto-assigned aspect defined by folder rules.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Hope it helps.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Cédric&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 24 Apr 2012 12:23:55 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/auto-population-from-ocr-document-in-to-content-model/m-p/262289#M215419</guid>
      <dc:creator>cnerger</dc:creator>
      <dc:date>2012-04-24T12:23:55Z</dc:date>
    </item>
    <item>
      <title>Re: Auto population from OCR document in to Content Model</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/auto-population-from-ocr-document-in-to-content-model/m-p/262290#M215420</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Re, &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I didn't read your last question; the answer will depend on:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp; - the accuracy/training of your OCR system&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp; - the way you parse your search and the results (you will need a large number of files to test it, at least 1000, to get significant results).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Honestly, I've been working for a year on open-source software for storing archives (in Alfresco), and you should have a really good scanner or capture device!&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Getting data from an unstructured document is quite hard, because of the amount of garbage.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I think you should look at an external process, which is "trivial" compared to this kind of implementation inside Alfresco.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;My 2 cents&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Cédric&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 24 Apr 2012 13:44:45 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/auto-population-from-ocr-document-in-to-content-model/m-p/262290#M215420</guid>
      <dc:creator>cnerger</dc:creator>
      <dc:date>2012-04-24T13:44:45Z</dc:date>
    </item>
    <item>
      <title>Re: Auto population from OCR document in to Content Model</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/auto-population-from-ocr-document-in-to-content-model/m-p/262291#M215421</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi all,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;We can use a good OCR-enabled scanner or system to convert our image-based documents into text. However, the problem is that most of the time we have hundreds of distributed users. In that case this is not feasible, because not every user can have an OCR system. So in such a case, a centralized OCR technique available within Alfresco would be very useful.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;SAMU&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 09 Jul 2012 05:08:36 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/auto-population-from-ocr-document-in-to-content-model/m-p/262291#M215421</guid>
      <dc:creator>samudaya</dc:creator>
      <dc:date>2012-07-09T05:08:36Z</dc:date>
    </item>
    <item>
      <title>Re: Auto population from OCR document in to Content Model</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/auto-population-from-ocr-document-in-to-content-model/m-p/262292#M215422</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;We have implemented an OCR server integrated with Alfresco, which can be used as a transformer or via JavaScript and Java. It runs on a separate OCR server and supports Abbyy and Google OCR. For more information, see: &lt;/SPAN&gt;&lt;A href="https://forums.alfresco.com/en/viewtopic.php?f=33&amp;amp;t=44739" rel="nofollow noopener noreferrer"&gt;https://forums.alfresco.com/en/viewtopic.php?f=33&amp;amp;t=44739&lt;/A&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 01 Aug 2012 14:26:30 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/auto-population-from-ocr-document-in-to-content-model/m-p/262292#M215422</guid>
      <dc:creator>wmay</dc:creator>
      <dc:date>2012-08-01T14:26:30Z</dc:date>
    </item>
  </channel>
</rss>

