Searching PDF files

Champ in-the-making
Hi all!!!

After spending some time evaluating ECM software, we chose Alfresco + Liferay for our company's ECM platform…

During that evaluation period I was working with the bundled Tomcat + Alfresco version. But now I have to deploy to a production environment, and we don't want to deploy the bundled version… we want to deploy alfresco.war in a pre-existing Tomcat environment…

Everything works fine, except when we search in .pdf files… Alfresco doesn't return anything…

Can anyone tell me what I have to do to get the search working???


Champ in-the-making
The way search works is that it takes PDFs and converts them to text before indexing them through Lucene. Can you confirm that you can do that conversion manually (i.e. through an action)?

By the way, we don't recommend putting community edition in production - this is what the enterprise version and subscription is for, which is a stabilized code base that is supported on a number of stacks.  If you are interested in the enterprise subscription, contact

Champ in-the-making
Hi jbarmash!!!

Thanks for replying to my post!!!

I have one question… How can I manually verify that my PDFs are transformed to text files???

The reason we're deploying the community version is that, after choosing Alfresco + Liferay as our ECM platform, we're delivering a pilot project to a small team so that we can get used to Alfresco. We're hoping to have a good experience with it, so we can roll it out to the whole enterprise (at which point we plan to deploy the enterprise version).

For now, my team has to deploy an Alfresco server with basic functionality like Oracle access, LDAP authentication, CIFS… What I didn't expect was to have difficulties with the search functionality…

Can you help me with that???

Champ in-the-making
To manually verify that you can convert a PDF to text, upload a PDF and go to the properties of the uploaded document. Then, on the right side, click the "run action" action and select "Transform and copy content to a specific space". Select plain text as your target mime type.

Champ in-the-making
Hi jbarmash !!!

After doing what you wrote, I have a .TXT version of my PDF in the target space… But when I click on it, the content of the TXT is something like this:

                                                                                                                                          !"                          #  $        !% & '() *!+& '(), "         -   .     /        &        0          .      !"  0        ! +1234                                     5#            #   &     #   -    #   6         /         +             # 0   .             $#    .      &          7      8  

Does it mean that there's something wrong with the PDF-to-TXT transformation??  What can I do?

Champ in-the-making
It certainly looks like the conversion to text is not working, which might explain why search is not working (it can't index the content properly). I believe that's usually handled through OpenOffice or the PDFBox library. Do you have OpenOffice installed?

By the way, were the original PDF files in English, or some other language?
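
For anyone who wants to check a batch of transformed .TXT files without opening each one, here is a minimal sketch of a "does this look like real text?" test. It is only a rough heuristic and not anything Alfresco does itself: the function names and the 0.6 threshold are assumptions made up for illustration, and the threshold would need tuning for non-English or number-heavy documents.

```python
def letter_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are letters."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(ch.isalpha() for ch in visible) / len(visible)


def looks_garbled(text: str, threshold: float = 0.6) -> bool:
    """Guess whether extracted text is garbled.

    The threshold is an arbitrary assumption: ordinary prose is mostly
    letters, while a failed extraction tends to be mostly punctuation,
    digits and spaces. Empty text is also reported as garbled.
    """
    return letter_ratio(text) < threshold


if __name__ == "__main__":
    sample = '!"  #  $  !% & \'() *!+& \'(), "  -  .  /  &  0  .  !"  0  ! +1234'
    print(looks_garbled(sample))
    print(looks_garbled("The quick brown fox jumps over the lazy dog."))
```

A file flagged by this check was probably not indexed with usable content, so it is a candidate for the manual transform test described above.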

Champ in-the-making
The PDFs' text is in English…

I noticed that it's happening with some PDF files…

When I create a PDF file through the PDF Creator printing process, everything's OK, i.e. the PDF file is correctly indexed and appears in the search results…

I think it has something to do with the PDF version… Do you know something about it?? Does PDFBox work properly with all versions of PDF files, or are there any restrictions with this file format???
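
One way to test the version theory is to compare the declared version of the PDFs that index correctly with that of the ones that don't. A PDF file normally starts with a header such as `%PDF-1.4`, so a minimal sketch like the one below can read it. The function names are made up for illustration, and the 1024-byte window is an assumption chosen to tolerate files with a little junk before the header. Note that the header is only what the file declares (newer files can override it internally), so the version on its own may not explain the garbled output.

```python
import re
import sys

# Matches the "%PDF-x.y" marker that normally opens a PDF file.
_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_version_from_bytes(head: bytes):
    """Return the declared version (e.g. '1.4') or None if no header is found."""
    match = _HEADER.search(head[:1024])
    return match.group(1).decode("ascii") if match else None


def pdf_version(path: str):
    """Read the start of the file at `path` and return its declared PDF version."""
    with open(path, "rb") as fh:
        return pdf_version_from_bytes(fh.read(1024))


if __name__ == "__main__":
    # Usage: python pdf_version.py good.pdf bad.pdf
    for name in sys.argv[1:]:
        print(name, pdf_version(name))
```

If the working and failing files report the same version, the cause is more likely to lie in how each PDF was produced than in the version number.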
