
Getting Search to include content of PDFs

kyle_moyer
Champ in-the-making
I am a new user of Alfresco. I am building a library database for my company and have found Alfresco useful and easy to use. The problem I am encountering is this: I have uploaded content to the site as PDFs, but when I search, the results only match the titles of the PDFs, not their actual content. I read somewhere that I should enable an advanced search, but I cannot figure out how.

PLEASE HELP!!!
6 REPLIES

jpotts
World-Class Innovator
By default Alfresco will index the content of your PDF files. The requirement is that the PDF be text, not an image. There is nothing more you have to do.

Can you upload other types of files (like Word docs and text files) and search on the contents of those files?

Jeff
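
To check whether a search is matching document content rather than just titles, one option is to restrict the query to the `TEXT` field in Alfresco's full-text search syntax. The sketch below only builds the HTTP request. It assumes a version of Alfresco that exposes the public Search REST API; the host, user name, and password are placeholders, not values from this thread.

```python
import base64
import json
import urllib.request

# Path of the public Search REST API (assumed to be available on your version).
SEARCH_PATH = "/alfresco/api/-default-/public/search/versions/1/search"


def build_content_query(term: str) -> dict:
    """Build a search body that matches document content only (the TEXT field)."""
    # Escape backslashes and double quotes so the term stays one quoted phrase.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return {"query": {"language": "afts", "query": f'TEXT:"{escaped}"'}}


def build_request(base_url: str, term: str, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request using HTTP basic authentication."""
    body = json.dumps(build_content_query(term)).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + SEARCH_PATH,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. If a phrase that appears only inside a PDF (not in its title) comes back with no hits, the content of that PDF was most likely not indexed.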

kyle_moyer
Champ in-the-making
I have uploaded multiple Word documents, and the search does appear to include their content, but not the content of the PDFs. Is there a way to convert a PDF to a "text" PDF, or a way to tell which kind of PDF I have? I do have OCR software and thought I had scanned the documents in as searchable documents.

jpotts
World-Class Innovator

Hi Jeff,

I also have to capture a bunch of paper documents using KOFAX Capture, and I'd like to make the OCRised (full-text) content available for searching, while the "real" document, a PDF containing an image, remains available for download in Alfresco and previewable in Share.

Is there any simple way to "manually" create the full-text indexes in Solr while still keeping the "normal" PDF in the repository?

Or should I use a custom property and include it in the default search pattern?

Thanks

douglascrp
World-Class Innovator

Hello.

If your PDF files are already OCRised, there is nothing more you have to do.

Simply upload them into Alfresco and the content will be searchable.

That is how it works by default.
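
To check whether a PDF actually carries a text layer before uploading it, a rough heuristic is to look for font resources in the file: OCRed and born-digital PDFs normally reference at least one font, while image-only scans usually do not. The sketch below is an approximation, not a real PDF parser. The function name is made up for illustration, and encrypted or unusually encoded files may be misjudged.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" in a PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_like_text_pdf(data: bytes) -> bool:
    """Heuristic: return True if the PDF bytes reference any font resource."""
    # Uncompressed objects: a font dictionary or resource entry is visible as-is.
    if b"/Font" in data:
        return True
    # Newer PDFs may pack objects into Flate-compressed streams; inflate and re-check.
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example, a JPEG image stream)
        if b"/Font" in inflated:
            return True
    return False
```

For a file on disk, read it in binary mode and pass the bytes in, for example `looks_like_text_pdf(open(path, "rb").read())`. A `False` result suggests the file is an image-only scan that needs OCR before its content can be indexed.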

bwideman30
Champ in-the-making
Kyle,

If one of those tools does not work, you can also use Acrobat. When you scan a form normally, it comes out as just an image file; with Acrobat, you can scan documents and make them searchable.

BW