
Problem with Content Indexing (full text search)

castgroupteam
Champ in-the-making
Hi all,
I have Alfresco CE 4.0.e (using Solr) installed in a production environment.

The problem is that I found some PDF documents that have no content indexed, so full text search does not work for those documents.
I'm sure it's not a problem with the PDF to plain text transformation, because when I re-uploaded the same document its content was indexed without problems. I then tried to execute the following queries, as explained at http://wiki.alfresco.com/wiki/Search#Finding_nodes_by_content :
TEXT:"nint"
TEXT:"nitf"
TEXT:"nicm"
but my documents (the ones with no content indexed) are not returned in the results.

Then I explored the Solr index using Luke and verified that, on the nodes involved, the content attribute (and also content.__) is not present at all.
How can I find all the documents with no content indexed in the whole repository?
Does anyone know of an issue or bug in 4.0.e that could have caused this problem?

I really hope someone can help.
Thanks in advance.

Daniele
2 REPLIES

mitpatoliya
Star Collaborator
This kind of issue generally occurs when you have imported a large amount of data into Alfresco using bulk upload. In that case Solr takes some time to sync up, and during that interval none of those documents are searchable. Because you are using CE, you will not be able to figure out which transactions failed during indexing, so you need to re-index Solr.
Did you find any other errors related to indexing in your Solr logs or Alfresco logs?

castgroupteam
Champ in-the-making
Hi Mits,
thank you very much for your response.
In the past we imported thousands of documents into this repository using a bulk upload, but that is not the case here.
The document I am talking about was imported by a process that involved only 10 documents, so this transaction was relatively small, and the other 9 documents all have their content indexed correctly.

With this FTS query: TYPE:"my:customBaseType" AND NOT TEXT:"*" I can find all the documents without content in the index, but this is not useful at all because it also returns, for example, all the PDFs that contain only images, and I have thousands of those in the repository.
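
One possible way to cut that result set down is to restrict the same query to the time window of the small import. The snippet below is only a rough sketch of how such a query string could be built: the helper name `build_missing_content_query` is hypothetical, and the `cm:created:[... TO ...]` range clause is an assumption about the FTS syntax that should be checked against the search documentation for 4.0.e. Without a date range it produces exactly the query quoted above. It does not tell image-only PDFs apart from PDFs whose text failed to index; it only limits the candidates to a given period.

```python
def build_missing_content_query(type_qname, created_from=None, created_to=None):
    """Build an FTS query string for nodes of a type with no indexed text.

    created_from / created_to are optional ISO dates (e.g. "2012-01-01").
    The cm:created range clause is an assumed syntax, not confirmed for 4.0.e.
    """
    clauses = ['TYPE:"{}"'.format(type_qname)]
    if created_from and created_to:
        # Assumption: FTS accepts a quoted date range on cm:created.
        clauses.append('cm:created:["{}" TO "{}"]'.format(created_from, created_to))
    # Same negative TEXT clause as in the query from the post.
    clauses.append('NOT TEXT:"*"')
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Whole repository, identical to the query quoted in the post.
    print(build_missing_content_query("my:customBaseType"))
    # Narrowed to a hypothetical import window.
    print(build_missing_content_query("my:customBaseType", "2012-01-01", "2012-01-31"))
```

The resulting string could be pasted into whatever search interface is already being used for the query above; the dates shown are placeholders, not values from this repository.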









