Get Indexed Content of Lucene and display in search results

tellme
Champ in-the-making
Hello,

I would like to know how we can get the actual indexed content of a document and display some of it in the search results, particularly for PDF, PowerPoint, etc. We are using Enterprise 3.4.

At present I can upload a PDF document to the Document Library in Share and search inside its content. However, I also want to fetch that content and display it in the search results. I need to show 2 or 3 lines, like the snippets in a Google search.

I tried altering Search.lib.js and its getDocumentItem() method to fetch the content. It works for text files, but for PDFs it shows junk characters. I understand that PDF is a binary format, but is there a way to get the content displayed in search results?

Thanks
5 REPLIES

andy
Champ on-the-rise
Hi

You can use the transformation services to turn a document into another form.
You could create a plain-text rendition of the document this way and use it for highlighting (linked to the original by some association).
You will have to do the highlighting yourself.
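
To illustrate the "do the highlighting yourself" step, here is a minimal sketch in plain Java. It assumes you already have the extracted text of the document (for example from a text rendition) as a String. The class and method names (SnippetSketch, snippet, indexOfIgnoreCase) are made up for this example; it does not use the Alfresco or Lucene APIs, and it only handles the first match of a single term.

```java
// Hypothetical helper, not part of Alfresco or Lucene.
class SnippetSketch {

    // Case-insensitive search that keeps indexes valid for the original text.
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns up to `radius` characters of context on each side of the first
    // match, with the match wrapped in <b> tags, or null if there is no match.
    // The caller is assumed to HTML-escape the text before display.
    static String snippet(String text, String term, int radius) {
        if (term.isEmpty()) {
            return null;
        }
        int hit = indexOfIgnoreCase(text, term);
        if (hit < 0) {
            return null;
        }
        int hitEnd = hit + term.length();
        int start = Math.max(0, hit - radius);
        int end = Math.min(text.length(), hitEnd + radius);

        StringBuilder sb = new StringBuilder();
        if (start > 0) {
            sb.append("...");            // text was cut off on the left
        }
        sb.append(text, start, hit);
        sb.append("<b>").append(text, hit, hitEnd).append("</b>");
        sb.append(text, hitEnd, end);
        if (end < text.length()) {
            sb.append("...");            // text was cut off on the right
        }
        return sb.toString();
    }
}
```

For example, SnippetSketch.snippet("Lucene rocks", "LUCENE", 3) gives "<b>Lucene</b> ro...". A real implementation would also need to deal with multiple terms, word boundaries and analyzed (stemmed) matches, which this sketch ignores.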

Andy

tellme
Champ in-the-making
Thanks Andy for your reply.

I created a rule in the space to convert any PDF document to text, and it works. But when I search for a keyword, both the PDF and the text document appear in the results.

Could you explain a bit more about associations etc?

Thanks

lwoodson
Champ in-the-making
I have the exact same use case. We deal with several standards specifications and a number of manufacturer-specific implementations of these standards. All are documented in PDFs. We need to be able to search across all PDFs and display snippets around term matches in one easily scannable results page.

Tellme, did you ever figure out your issues?  Or, if anyone else wants to chime in with other ideas or how this might be handled with Alfresco, I'd appreciate it.  I'm currently evaluating which tools might be best suited for this job.

Thanks,

-Lance Woodson

andy
Champ on-the-rise
Hi

It will take quite a bit of customization to add this support: you are going to have to change the UI display and add your own highlighting by going directly to Lucene.

Support for this is on the TODO list and will most likely happen only with SOLR.
This will mean your SOLR server will have to store the data you want to highlight, and snippets will come back as part of the result set.
At that point we will integrate it with the UI, and you could ask for highlighting in specific fields: name, title, description, etc.
The model would have options to support this for properties.

At the moment, I would recommend you wait for this - unless you want to get into lots of customisation to wrap the display of search results and go and do the highlighting yourself.

The first step to doing this is having the text of the content available to highlight. You could use Lucene to get the info to highlight, as we store enough information by default. Again, you would have to do some low-level work to access this via Lucene, and then change the UI. The book "Lucene in Action" covers highlighting with Lucene.

Andy

afaust
Legendary Innovator
Hello,

We have recently implemented transparent highlighting in the Alfresco Search API for a Proof of Concept implementation based on Lucene. The level of customization required is actually (relatively) limited for the basic highlighting that we have implemented. We have not done any customization on the UI, though, as the customer intends to use a Liferay portal based UI.

We are in contact with local Alfresco consultants to contribute patches from the PoC back to the Community. The PoC was a preliminary stage for a project that is probably going to be based on Alfresco 4.0 if the customer decides to go ahead with the product, and that may thus require us to provide highlighting based on SOLR. The customer clearly intends to limit its maintenance costs by passing such feature enhancements back to Alfresco, so we may be able to provide a similar enhancement at the beginning of 2012 (if not sooner due to other projects).

Regards