Retrieve content from a document in javascript

imanez1 — Thu, 29 Aug 2019 13:29:17 GMT

Hello, I want to retrieve some information from the text of a PDF file (scanned files). I started by using pdfsandwich OCR to extract the text in the images (the text is added to each page invisibly "behind" the images). What I want to do is search that text for the information I need. How can I do this?

Re: Retrieve content from a document in javascript

afaust — Fri, 30 Aug 2019 07:44:31 GMT

So, if you are planning to write scripts that run inside the Alfresco Repository application, you may want to look into the documentation of its JavaScript API, especially the part about accessing content-related attributes. But with JavaScript you will generally be limited to working with textual content files, e.g. not PDF files, which are stored in more or less binary form even when they have a text layer added.
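As a rough illustration of the content-related attributes, here is a minimal sketch assuming a repository-side script where `node` is a ScriptNode (for example the `document` root object). The helper name `readTextContent` is made up for this example; `mimetype` and `content` are the ScriptNode properties I would expect to use.

```javascript
// Hypothetical helper: return the content of a node as a string,
// but only when the node holds a textual format.
function readTextContent(node) {
  var mimetype = String(node.mimetype || "");
  // Binary formats such as application/pdf are skipped on purpose:
  // reading them as a string would not give usable text.
  if (mimetype.indexOf("text/") !== 0) {
    return null;
  }
  return String(node.content);
}
```

Inside Alfresco this would be called as something like `readTextContent(document)`. For a scanned PDF it returns `null`, which is exactly the limitation described above.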

BUT, if the text layer has been added by OCR, Alfresco will be able to index the document using SOLR, and you can definitely use JavaScript to execute a search query against the content and then pick up the matching document to process further via JavaScript. You just may not be able to search in the content of the PDF itself, only indirectly via its indexed text in SOLR.
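A minimal sketch of that search approach, assuming a repository-side script and that the OCR text layer has already been indexed. The function names and the `searchService` parameter are made up for this example; inside Alfresco you would pass the `search` root object, and I am assuming the `fts-alfresco` query language with the `TEXT` field for full-text matching.

```javascript
// Hypothetical helper: escape backslashes and double quotes so the term
// can be placed inside a quoted FTS phrase.
function escapeFtsPhrase(term) {
  return String(term).replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Hypothetical helper: run a full-text query and keep only PDF results.
// `searchService` is expected to be the `search` root object.
function findPdfsContaining(searchService, term) {
  var nodes = searchService.query({
    query: 'TYPE:"cm:content" AND TEXT:"' + escapeFtsPhrase(term) + '"',
    language: "fts-alfresco"
  });
  var hits = [];
  for (var i = 0; i < nodes.length; i++) {
    // The match comes from the indexed text, not from reading the PDF bytes.
    if (String(nodes[i].mimetype) === "application/pdf") {
      hits.push(nodes[i]);
    }
  }
  return hits;
}
```

In a script this could be used as `var pdfs = findPdfsContaining(search, "invoice number");` and the returned nodes processed further (e.g. reading `pdfs[i].name`). Note that this tells you which documents contain the term, not where in the PDF it appears.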
