
searchService finds deleted content

iblanco
Confirmed Champ
I was having trouble with a scheduled job because the "fts-alfresco" search I was running returned some node references to nodes that had already been deleted. I thought it might be some kind of corruption in the Lucene index, because a full index rebuild did solve the problem. But I managed to reproduce the problem a couple of times by copying and deleting some folders, so I started digging to figure out where I was breaking the index.

But I was surprised when I ran the same query in Alfresco Share's search box and found that it did not return the nonexistent nodes, so I thought something must be wrong with my searchParameters or something similar. I launched the debugger and analyzed the search executed in the back end when I ran the query in Share. That query also returned nonexistent nodes, but afterwards org.alfresco.repo.javascript.Search filtered the result and discarded the nonexistent ones.

That means that if you call the searchService with a query limited to, say, 50 results and 20 of them no longer exist, you will get only 30 results. It seems that Share works around this by setting the limit to 502, while the final search result never contains more than 200 elements. Am I seeing and understanding this correctly? If so, then if a search happens to hit more than 302 deleted nodes, you could lose some results, couldn't you?
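The arithmetic above can be checked with a small model. This is a minimal sketch, not Alfresco's code: the class `OverFetchModel`, the string node ids and the `exists` predicate are hypothetical stand-ins for the index hits and for the existence filtering observed in org.alfresco.repo.javascript.Search. It requests `fetchLimit` hits, drops those whose node is gone and keeps at most `maxResults`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical model of "over-fetch, then filter": ask the index for more
 * hits than will be shown, discard stale ones, truncate to the display limit.
 */
class OverFetchModel {

    /**
     * @param indexHits  node ids in index order (may include deleted nodes)
     * @param exists     stand-in for a repository existence check
     * @param fetchLimit hits requested from the index (e.g. 502)
     * @param maxResults live results returned to the caller (e.g. 200)
     */
    static List<String> search(List<String> indexHits, Predicate<String> exists,
                               int fetchLimit, int maxResults) {
        List<String> live = new ArrayList<>();
        int fetched = Math.min(fetchLimit, indexHits.size());
        // Only the first `fetched` hits are ever examined; stale ones are skipped.
        for (int i = 0; i < fetched && live.size() < maxResults; i++) {
            String id = indexHits.get(i);
            if (exists.test(id)) {
                live.add(id);
            }
        }
        return live;
    }

    /** Builds a hit list whose first `deleted` entries point at deleted nodes. */
    static List<String> hits(int deleted, int live) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < deleted; i++) ids.add("deleted-" + i);
        for (int i = 0; i < live; i++) ids.add("live-" + i);
        return ids;
    }

    public static void main(String[] args) {
        Predicate<String> exists = id -> id.startsWith("live-");
        // Limit 50 with 20 stale hits: only 30 live results.
        System.out.println(search(hits(20, 100), exists, 50, 50).size());     // 30
        // 502 fetched, 302 stale: exactly 200 live results survive.
        System.out.println(search(hits(302, 1000), exists, 502, 200).size()); // 200
        // 303 stale: 199 results, although 1000 live nodes match.
        System.out.println(search(hits(303, 1000), exists, 502, 200).size()); // 199
    }
}
```

The loss depends only on how many stale hits fall inside the first 502, not on how many live nodes match overall.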

Is this the expected behaviour, or am I missing something? And if it is the expected behaviour, when are the nonexistent nodes actually removed from the index? At merge time?

To be honest, I'm quite confused about this.
5 replies

afaust
Legendary Innovator
Hello,

One thing to keep in mind is that Lucene (and SOLR in 4.x) can be out of sync with the real state of the database, depending on how they are set up (tracking/index update intervals) and how long it takes to update the index for a specific set of changes. An fts-alfresco search on workspace://SpacesStore should normally not return any deleted nodes, but it may return some when the deletion has just occurred and the indexer has not yet had a chance to remove them from the index. Since this can happen (and likely will at some point in high-load situations), you should, as an extension programmer, always check that the nodes in your search results exist before performing any operations on them.
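Building on that advice, one way to avoid the fixed over-fetch limit is to keep requesting further pages until enough existing nodes have been collected or the index runs out. This is a sketch under assumptions: `LiveResultPager`, `fetchPage` and `exists` are hypothetical. In a real repository they would presumably be backed by a query with a skip count (for example `SearchParameters.setSkipCount`/`setLimit`) and by `NodeService.exists(NodeRef)`, but that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Predicate;

class LiveResultPager {

    /**
     * Collects up to `wanted` existing nodes by paging through index hits.
     *
     * @param fetchPage hypothetical paged query: (skipCount, pageSize) -> hits;
     *                  a page shorter than pageSize means the index is exhausted
     * @param exists    hypothetical repository existence check
     */
    static List<String> collectLive(BiFunction<Integer, Integer, List<String>> fetchPage,
                                    Predicate<String> exists, int wanted, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> live = new ArrayList<>();
        int skip = 0;
        while (live.size() < wanted) {
            List<String> page = fetchPage.apply(skip, pageSize);
            for (String id : page) {
                if (exists.test(id)) {           // re-check each hit against the repository
                    live.add(id);
                    if (live.size() == wanted) break;
                }
            }
            if (page.size() < pageSize) break;   // no more hits in the index
            skip += page.size();
        }
        return live;
    }

    public static void main(String[] args) {
        // Simulated index: 303 stale hits followed by 1000 live ones.
        List<String> index = new ArrayList<>();
        for (int i = 0; i < 303; i++) index.add("deleted-" + i);
        for (int i = 0; i < 1000; i++) index.add("live-" + i);

        BiFunction<Integer, Integer, List<String>> fetchPage = (skip, size) ->
            index.subList(Math.min(skip, index.size()), Math.min(skip + size, index.size()));
        Predicate<String> exists = id -> id.startsWith("live-");

        System.out.println(collectLive(fetchPage, exists, 200, 100).size()); // 200
    }
}
```

Because the index can change between page requests, offsets may shift and a hit may be skipped or seen twice; deduplicating ids would be a sensible addition if that matters.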

If your issue is consistent and reproducible, and deleted nodes are retained in the index longer than the default 5-15 seconds between indexer runs, it would be worth looking into your use case and your configuration.

Generally speaking, which version of Alfresco are you using, and on which version does your issue occur?

Regards
Axel

iblanco
Confirmed Champ
Thank you, AFaust, for your comments.

I'm using Alfresco Community 4.0.d with Lucene, not SOLR. The nodes remain in this state much longer than 15-20 seconds: for hours. In fact, I haven't seen them return to a consistent state until I stopped the repository and did a full reindex. Not even restarting the repository solved the problem without a full reindex.

Those "ghost nodes" usually appear when I delete a folder that contains them, not when I delete them directly. So it seems to me that something is going wrong in the Alfresco logic that should crawl the descendants of a deleted folder. Those folders are created programmatically, so I might have some error in the creation process that makes them different from regular folders.

If deleted nodes persisting longer than 15-20 seconds is not expected behaviour, I'll have to check it more thoroughly with the debugger to see what happens during deletion.

Thanks.

iblanco
Confirmed Champ
Well, I'm still fighting the issue. I've managed to reproduce it quite easily and have reported it as a bug: https://issues.alfresco.com/jira/browse/ALF-17860

I suspect this might be a misunderstanding on my part of how Alfresco updates the index rather than a real bug, but I don't know how to handle it in a reasonable way, and I expect the answer to the bug report might shed some light on what I'm doing wrong. We will see.

andy
Champ on-the-rise
Can you reproduce this on 4.2?

Andy

iblanco
Confirmed Champ
It doesn't happen in 4.2. Derek Hulley looked at the bug report and asked the same question. It seems something similar used to happen in 3.4.6, but it was fixed in 3.4.9.