<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Retrieve content from a document in javascript in Alfresco Forum</title>
    <link>https://connect.hyland.com/t5/alfresco-forum/retrieve-content-from-a-document-in-javascript/m-p/112515#M31369</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;So, if you are considering writing scripts that run inside the Alfresco Repository application, you may want to look into the documentation of &lt;A href="https://docs.alfresco.com/6.1/concepts/API-JS-intro.html" rel="nofollow noopener noreferrer"&gt;that JavaScript API&lt;/A&gt;, especially the part about &lt;A href="https://docs.alfresco.com/6.1/references/API-JS-Content.html" rel="nofollow noopener noreferrer"&gt;accessing content-related attributes&lt;/A&gt;. But with JavaScript you will generally be limited to working with textual content files, and not, for example, PDF files: even with a text layer added on top, a PDF is more or less a binary format.&lt;/P&gt;&lt;P&gt;BUT, if the text layer is added by OCR, Alfresco will be able to index the document's text using SOLR. You can then definitely use JavaScript to execute a search query for that content and find the document to process further via JavaScript. You just may not be able to search the content of the PDF itself, only indirectly via its text as indexed in SOLR.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 30 Aug 2019 07:44:31 GMT</pubDate>
    <dc:creator>afaust</dc:creator>
    <dc:date>2019-08-30T07:44:31Z</dc:date>
    <item>
      <title>Retrieve content from a document in javascript</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/retrieve-content-from-a-document-in-javascript/m-p/112514#M31368</link>
      <description>Hello, I want to retrieve some information from the text of a PDF file (scanned files). I started by using pdfsandwich OCR to extract the text in the images (the text is added to each page invisibly, "behind" the images). What I want to do is search that text for the information I need. How can I do that?</description>
      <pubDate>Thu, 29 Aug 2019 13:29:17 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/retrieve-content-from-a-document-in-javascript/m-p/112514#M31368</guid>
      <dc:creator>imanez1</dc:creator>
      <dc:date>2019-08-29T13:29:17Z</dc:date>
    </item>
    <item>
      <title>Re: Retrieve content from a document in javascript</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/retrieve-content-from-a-document-in-javascript/m-p/112515#M31369</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;So, if you are considering writing scripts that run inside the Alfresco Repository application, you may want to look into the documentation of &lt;A href="https://docs.alfresco.com/6.1/concepts/API-JS-intro.html" rel="nofollow noopener noreferrer"&gt;that JavaScript API&lt;/A&gt;, especially the part about &lt;A href="https://docs.alfresco.com/6.1/references/API-JS-Content.html" rel="nofollow noopener noreferrer"&gt;accessing content-related attributes&lt;/A&gt;. But with JavaScript you will generally be limited to working with textual content files, and not, for example, PDF files: even with a text layer added on top, a PDF is more or less a binary format.&lt;/P&gt;&lt;P&gt;BUT, if the text layer is added by OCR, Alfresco will be able to index the document's text using SOLR. You can then definitely use JavaScript to execute a search query for that content and find the document to process further via JavaScript. You just may not be able to search the content of the PDF itself, only indirectly via its text as indexed in SOLR.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 30 Aug 2019 07:44:31 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/retrieve-content-from-a-document-in-javascript/m-p/112515#M31369</guid>
      <dc:creator>afaust</dc:creator>
      <dc:date>2019-08-30T07:44:31Z</dc:date>
    </item>
  </channel>
</rss>

