<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to automate OCR prior to uploading to Alfresco in Alfresco Forum</title>
    <link>https://connect.hyland.com/t5/alfresco-forum/how-to-automate-ocr-prior-to-uploading-to-alfresco/m-p/62342#M21604</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In my case, which was similar, documents were scanned, OCR'ed, and saved to a particular folder on a shared filesystem. I had an application monitoring that folder to extract metadata, rename the files, and then upload them to Alfresco using CMIS.&lt;/P&gt;&lt;P&gt;In Alfresco, these documents were classified using content rules and scripts, depending on filename and metadata.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So maybe you can develop an external application that does everything you need before uploading the file to Alfresco using CMIS.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;clv&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 01 Jun 2018 07:52:04 GMT</pubDate>
    <dc:creator>calvo</dc:creator>
    <dc:date>2018-06-01T07:52:04Z</dc:date>
    <item>
      <title>How to automate OCR prior to uploading to Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/how-to-automate-ocr-prior-to-uploading-to-alfresco/m-p/62341#M21603</link>
      <description>We are new to Alfresco Community.&amp;nbsp;When we scan paper documents they are automatically OCR'ed and we are given the option to change the filename to our chosen naming convention.&amp;nbsp; We then drag and drop the file into Alfresco.&amp;nbsp; This works very well! When our users have digital files they want to drag an</description>
      <pubDate>Thu, 31 May 2018 23:18:52 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/how-to-automate-ocr-prior-to-uploading-to-alfresco/m-p/62341#M21603</guid>
      <dc:creator>kellerclark</dc:creator>
      <dc:date>2018-05-31T23:18:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to automate OCR prior to uploading to Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/how-to-automate-ocr-prior-to-uploading-to-alfresco/m-p/62342#M21604</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In my case, which was similar, documents were scanned, OCR'ed, and saved to a particular folder on a shared filesystem. I had an application monitoring that folder to extract metadata, rename the files, and then upload them to Alfresco using CMIS.&lt;/P&gt;&lt;P&gt;In Alfresco, these documents were classified using content rules and scripts, depending on filename and metadata.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So maybe you can develop an external application that does everything you need before uploading the file to Alfresco using CMIS.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;clv&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 01 Jun 2018 07:52:04 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/how-to-automate-ocr-prior-to-uploading-to-alfresco/m-p/62342#M21604</guid>
      <dc:creator>calvo</dc:creator>
      <dc:date>2018-06-01T07:52:04Z</dc:date>
    </item>
    <item>
      <title>Re: How to automate OCR prior to uploading to Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/how-to-automate-ocr-prior-to-uploading-to-alfresco/m-p/62343#M21605</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;...there are so many possibilities.. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Do you use only one scanner or several?&lt;/P&gt;&lt;P&gt;Do you upload to a specific "inBox" or anywhere in Alfresco?&lt;/P&gt;&lt;P&gt;What OCR software/scanner are you using? Maybe it has some kind of "scripting" capability or an API for adding some code?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The simplest approach for me is:&lt;/P&gt;&lt;P&gt;- use a scanner with an OCR facility&lt;BR /&gt;- upload the scanned documents to an "inBox" folder (using a "post-scan" script)&lt;/P&gt;&lt;P&gt;- in Alfresco: check the naming convention and "is there text to extract" via a "created" rule on the "inBox"&lt;/P&gt;&lt;P&gt;- in Alfresco: move the document to a folder, depending on the naming convention (or raise an exception/move it to an error folder in the rule if the naming isn't valid or no text could be extracted)&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 01 Jun 2018 15:08:03 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/how-to-automate-ocr-prior-to-uploading-to-alfresco/m-p/62343#M21605</guid>
      <dc:creator>mehe</dc:creator>
      <dc:date>2018-06-01T15:08:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to automate OCR prior to uploading to Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/how-to-automate-ocr-prior-to-uploading-to-alfresco/m-p/62344#M21606</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you!&amp;nbsp; That is very helpful.&amp;nbsp; I will look into CMIS, scripts and creating rules.&amp;nbsp; That sounds like it may just be the ticket.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Paper that we scan goes smoothly into Alfresco.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It is the files on users' computers, which they drop into the system themselves, that are causing problems.&amp;nbsp; They assume that since a file is a PDF it has been OCR'ed.&amp;nbsp; This may or may not be the case.&amp;nbsp; If we get a bunch of unsearchable documents into our system, users will not be able to find them later and the value of the EDMS breaks down.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Short of threatening them, how can we set it up so that only OCR'ed documents go into the system?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 01 Jun 2018 15:57:32 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/how-to-automate-ocr-prior-to-uploading-to-alfresco/m-p/62344#M21606</guid>
      <dc:creator>kellerclark</dc:creator>
      <dc:date>2018-06-01T15:57:32Z</dc:date>
    </item>
    <item>
      <title>Re: How to automate OCR prior to uploading to Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/how-to-automate-ocr-prior-to-uploading-to-alfresco/m-p/62345#M21607</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If your OCR step can also set a property, that's probably easiest. Then you can have a rule check for the presence of that property.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Alternatively, the rule could do a transform to text. If the result is empty you know it wasn't OCR'd so you move the document to an exception folder or send an email or something.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 01 Jun 2018 17:45:07 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/how-to-automate-ocr-prior-to-uploading-to-alfresco/m-p/62345#M21607</guid>
      <dc:creator>jpotts</dc:creator>
      <dc:date>2018-06-01T17:45:07Z</dc:date>
    </item>
  </channel>
</rss>

