<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Improve the result of OCR with alfresco-simple-ocr in Alfresco Forum</title>
    <link>https://connect.hyland.com/t5/alfresco-forum/improve-the-result-of-ocr-with-alfresco-simple-ocr/m-p/102440#M29283</link>
    <description>&lt;P&gt;I am having the same issue. Did you find a solution?&lt;/P&gt;</description>
    <pubDate>Fri, 23 Oct 2020 10:56:57 GMT</pubDate>
    <dc:creator>anuradha1</dc:creator>
    <dc:date>2020-10-23T10:56:57Z</dc:date>
    <item>
      <title>Improve the result of OCR with alfresco-simple-ocr</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/improve-the-result-of-ocr-with-alfresco-simple-ocr/m-p/102437#M29280</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I'm using the Alfresco add-on&amp;nbsp;&lt;A href="https://github.com/keensoft/alfresco-simple-ocr" target="_self" rel="nofollow noopener noreferrer"&gt;alfresco-simple-ocr&lt;/A&gt; with pdfsandwich to extract data from an invoice. Everything works, but the results are not very accurate.&lt;/P&gt;&lt;P&gt;My invoice has this template:&lt;/P&gt;&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="PrtScr capture_6.jpg" style="width: 591px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://connect.hyland.com/t5/image/serverpage/image-id/16i9C1A3AC239C0DD65/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;And the result that I get after OCR is this:&lt;/P&gt;&lt;PRE&gt;BILLING ADDRESS INVOICE
XXXX XXXX
XXX XXXX XXXX Number 545614513
XXXX XXXX Date May 30, 2019
XXXX XXXX
Delivery No. INV1254
DELIVERY ADDRESS Your Request Date 	May 30, 2019
XXXX XXXX
XXX XXXX XXXX Your Order No. SO655614
XXXX XXXX Contract No. -
XXXX XXXX Quote No. SO655614
Customer No.
Your Contact 	XXXXX
(152)-568-5458
Our Contact XXXXX
admin@yourcompany.example.com

Pos. Prod.No. Description Qty Price/Item (USD) VAT Total (USD)
1 P_21154	XXXX 1 0.20 15% 0.20
Total USD (excl. taxes) 0.20
VAT 0.03
Total Net Price in USD (incl. VAT) 0.23&lt;/PRE&gt;&lt;P&gt;So my questions are:&lt;/P&gt;&lt;P&gt;- How can I improve the accuracy of the results? For example, instead of an 'S' I sometimes get a '5', or '8' instead of '8'...&lt;/P&gt;&lt;P&gt;- How can I get the results in blocks, part by part: part 1, part 2, and then part 3?&lt;/P&gt;&lt;P&gt;I tried cropping the invoice with this command line and it gives me the results I want, but how can I do it from inside Alfresco?&lt;/P&gt;&lt;PRE&gt;convert -density 200 INVOICE.pdf -crop 100x50% +repage \( -clone 0 -crop 50x100% +repage -reverse \) -delete 0 -reverse INVOICE-out.pdf&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;alfresco-global.properties:&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;# OCR #

ocr.command=/usr/bin/pdfsandwich
ocr.output.verbose=true
ocr.output.file.prefix.command=-o

ocr.extra.commands=-verbose -rgb -lang fra+eng -nopreproc 
ocr.server.os=linux&lt;/PRE&gt;</description>
      <pubDate>Fri, 17 Jan 2020 11:10:02 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/improve-the-result-of-ocr-with-alfresco-simple-ocr/m-p/102437#M29280</guid>
      <dc:creator>imanez1</dc:creator>
      <dc:date>2020-01-17T11:10:02Z</dc:date>
    </item>
    <item>
      <title>Re: Improve the result of OCR with alfresco-simple-ocr</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/improve-the-result-of-ocr-with-alfresco-simple-ocr/m-p/102438#M29281</link>
      <description>&lt;P&gt;Try OCRmyPDF, it will give you better results.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://ocrmypdf.readthedocs.io/en/latest/" target="_blank" rel="nofollow noopener noreferrer"&gt;https://ocrmypdf.readthedocs.io/en/latest/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2020 11:26:57 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/improve-the-result-of-ocr-with-alfresco-simple-ocr/m-p/102438#M29281</guid>
      <dc:creator>angelborroy</dc:creator>
      <dc:date>2020-01-17T11:26:57Z</dc:date>
    </item>
    <item>
      <title>Re: Improve the result of OCR with alfresco-simple-ocr</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/improve-the-result-of-ocr-with-alfresco-simple-ocr/m-p/102439#M29282</link>
      <description>&lt;P&gt;OCRmyPDF gives me more or less the same results.&lt;/P&gt;&lt;P&gt;How can I OCR block by block (part 1, part 2, part 3)?&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2020 15:38:35 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/improve-the-result-of-ocr-with-alfresco-simple-ocr/m-p/102439#M29282</guid>
      <dc:creator>imanez1</dc:creator>
      <dc:date>2020-01-17T15:38:35Z</dc:date>
    </item>
    <item>
      <title>Re: Improve the result of OCR with alfresco-simple-ocr</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/improve-the-result-of-ocr-with-alfresco-simple-ocr/m-p/102440#M29283</link>
      <description>&lt;P&gt;I am having the same issue. Did you find a solution?&lt;/P&gt;</description>
      <pubDate>Fri, 23 Oct 2020 10:56:57 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/improve-the-result-of-ocr-with-alfresco-simple-ocr/m-p/102440#M29283</guid>
      <dc:creator>anuradha1</dc:creator>
      <dc:date>2020-10-23T10:56:57Z</dc:date>
    </item>
    <item>
      <title>Re: Improve the result of OCR with alfresco-simple-ocr</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/improve-the-result-of-ocr-with-alfresco-simple-ocr/m-p/102441#M29284</link>
      <description>&lt;P&gt;I'm still facing the same issue.&lt;/P&gt;&lt;P&gt;Does anyone have an idea how to extract the text from the invoice part by part?&lt;/P&gt;</description>
      <pubDate>Tue, 06 Apr 2021 17:11:49 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/improve-the-result-of-ocr-with-alfresco-simple-ocr/m-p/102441#M29284</guid>
      <dc:creator>imanez1</dc:creator>
      <dc:date>2021-04-06T17:11:49Z</dc:date>
    </item>
  </channel>
</rss>

