Unable to do full text searches on PDFs

jclacherty
Champ in-the-making
Hi,

I've just installed Alfresco Labs 3b for testing and have uploaded a PDF file to a document library. Does Share have full-text search capabilities, and should they extend to PDFs? When I search for something that is in the PDF, I don't get any results.

Justin.
15 REPLIES

jclacherty
Champ in-the-making
I tried in the Alfresco application as well (http://mysite/alfresco), but it didn't work there either. Is there something I need to do to enable full-text searching?

Justin.

jclacherty
Champ in-the-making
Well, Labs 3c doesn't work either. I wonder if the Enterprise version does.

jclacherty
Champ in-the-making
No luck with Enterprise 3.0 either. It doesn't work with PDFs or Word 2007 files, only with plain text files, which isn't overly useful. Back to SharePoint, I guess.

Justin.

mikeh
Star Contributor
Alfresco relies on third-party libraries for content transformations: http://wiki.alfresco.com/wiki/Content_Transformations
For Office files, you need to have OpenOffice installed to do the conversion.

Not sure why your PDF indexing isn't working; it should be handled by PDFBox. Are there any errors relating to transformation in the log?

Mike
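
For anyone following along, here is a rough sketch of one way to look for transformation problems in the log. It assumes the log uses log4j-style levels (ERROR/WARN) and that relevant lines mention "transform" or "pdfbox" somewhere; both are assumptions, not a documented Alfresco log format, and the function names are made up for illustration.

```python
import re
from typing import Iterable, List

# Assumption: lines carry a log4j-style level and mention the transformer
# or PDFBox in the logger name or message. Adjust both patterns as needed.
LEVEL = re.compile(r"\b(ERROR|WARN)\b")
TOPIC = re.compile(r"transform|pdfbox", re.IGNORECASE)


def find_transformation_problems(lines: Iterable[str]) -> List[str]:
    """Return lines that look like transformation warnings or errors."""
    return [
        line.rstrip("\n")
        for line in lines
        if LEVEL.search(line) and TOPIC.search(line)
    ]


def scan_log(path: str) -> List[str]:
    """Scan a log file on disk, e.g. scan_log("alfresco.log")."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_transformation_problems(handle)
```

If this returns nothing, it only means no matching lines were logged at WARN or ERROR; the relevant logging categories may still need to be enabled first.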

jclacherty
Champ in-the-making
Hi Mike,

Thanks for the response. I wasn't expecting that Word documents would work without some effort, but I thought PDFs should work out of the box. The Enterprise version seems to have installed OpenOffice for me. Do I still need to do something for it to do content transformations on Word docs?

I've looked in alfresco.log and don't see any errors in there. I've enabled the two logging categories mentioned in the link you posted. Is there something else I need to enable to track this down?

Justin.

ivo_costa
Champ in-the-making
Hi jclacherty

The PDF file you're using might not have a text layer, so although you can read it yourself, you're actually looking at a picture of the page (I've read somewhere that this is how scanned PDFs work).

I guess some PDF generators just do this the wrong way…

I have a few PDFs that work and a few others that don't work at all. You could try a PDF OCR program to check whether this is the problem.

Regards
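
A quick way to test the "no text layer" theory outside Alfresco is to try extracting text from the file yourself. The sketch below uses the third-party pypdf package (`pip install pypdf`), which is not what Alfresco uses internally (that would be PDFBox), so a file that passes here could still fail to index. The function names and the 20-character threshold are arbitrary choices for illustration.

```python
from typing import Iterable, List


def has_text_layer(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """True if the extracted texts hold at least min_chars non-space characters."""
    total = sum(len("".join(text.split())) for text in page_texts)
    return total >= min_chars


def extract_page_texts(path: str) -> List[str]:
    """Extract text per page with pypdf; image-only pages yield empty strings."""
    # Imported lazily so has_text_layer() works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


# Usage (hypothetical file name):
#   print(has_text_layer(extract_page_texts("scan.pdf")))
```

If this reports no text for a file that is searchable in Acrobat, the text layer may be stored in a way this extractor cannot read, which would be worth mentioning when reporting the problem.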

jclacherty
Champ in-the-making
Hi Ivo,

I used Acrobat and had it OCR the file when I scanned it.  I can do text searches from within Acrobat so I assume there's a text layer.

Justin.

ivo_costa
Champ in-the-making
Hi jclacherty

I just tested it once again and it's working for me with Alfresco 3.0E.

Are you searching for a full word? If you're searching for a partial word, you must add "*" to the search string.
For my test, I just created a document using OpenOffice and exported it to PDF using the built-in PDF creator.

Can you run a few tests and help the community give something back to Alfresco? ;)

Best Regards


Edit: on the Word issue, I think that OpenOffice still doesn't support the docx format, but doc (XP/2003) should be fine.
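
To illustrate the partial-word tip above, here is a tiny sketch of adding the trailing "*" to a search term before typing or submitting it. The helper name is hypothetical, and it only builds the string; how the search box interprets wildcards depends on the Alfresco version.

```python
def with_trailing_wildcard(term: str) -> str:
    """Append "*" to a search term unless it is empty or already ends with one."""
    term = term.strip()
    if not term or term.endswith("*"):
        return term
    return term + "*"
```

So a partial term such as "invoi" would be searched as "invoi*", while a term that already ends in "*" is left alone.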

jclacherty
Champ in-the-making
Hi Ivo,

Happy to do testing, but I do need direction as I'm unfamiliar with Alfresco. I have been searching for a full word. The document was created by Acrobat 9; perhaps it's a versioning problem. I'll try to create another one but save it as an earlier PDF version.

And thanks on the Word update.

Justin.