
Resource leak when lucene indexes (alt.) contentStore?

dirko
Champ in-the-making
Hello,

I've been investigating a problem we experience in an {Alfresco 4, lucene, alternative contentStore} environment that did not occur in 3.2. We are using an alternative contentStore (castorContentStore, cf. http://forge.alfresco.com/gf/project/alfresco2castor/). To understand the issue, one must know that the alfresco2castor interface uses a pool of connections to store and retrieve documents.

Storing and retrieving documents works fine as long as lucene (for the purpose of content indexing) is not retrieving them. When lucene retrieves a document, it does NOT consume the content of the document stream, so ContentStreamListener.contentStreamClosed() never gets called and the connection is never freed. I observed this by enabling debug logging on the HttpClient and adding logging statements to our implementation of contentStreamClosed(). I can configure the castor pool to be a bit larger, but that only means the pool is exhausted (by lucene) a little later.
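To make the failure mode concrete, here is a minimal sketch of the pooled pattern described above. The class and method names are illustrative stand-ins (only `contentStreamClosed()` mirrors the real Alfresco `ContentStreamListener` callback), not the actual alfresco2castor API:

```java
import java.util.concurrent.Semaphore;

// Illustrative sketch of a connection-pooled content store: the pool permit
// is only released when the consumer closes the stream, so a stream that is
// never closed holds a pooled connection indefinitely.
public class PooledStoreSketch {
    static final int POOL_SIZE = 2;                 // hypothetical pool size
    static final Semaphore pool = new Semaphore(POOL_SIZE);

    /** Stand-in for ContentStreamListener.contentStreamClosed(). */
    interface StreamListener { void contentStreamClosed(); }

    /** Acquire a pooled connection; fails when the pool is exhausted. */
    static StreamListener openStream() {
        if (!pool.tryAcquire()) {
            throw new IllegalStateException("connection pool exhausted");
        }
        return pool::release;   // closing the stream frees the connection
    }

    public static void main(String[] args) {
        StreamListener a = openStream();
        StreamListener b = openStream();
        // A third open, without the first two being closed, exhausts the pool:
        try {
            openStream();
        } catch (IllegalStateException e) {
            System.out.println("pool exhausted as expected");
        }
        a.contentStreamClosed();                 // frees one connection
        StreamListener c = openStream();         // now succeeds again
        c.contentStreamClosed();
        b.contentStreamClosed();
        System.out.println("available=" + pool.availablePermits());
    }
}
```

If the reader never consumes the stream, `contentStreamClosed()` is never invoked, which is exactly the symptom of connections piling up.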

FYI, the problem is NOT observed in the following environments:
- a {Alfresco 3.1/3.2, lucene, alternative contentStore} environment
- an {Alfresco 4, solr, alternative contentStore} environment
- when content indexing is disabled in a {Alfresco 4, lucene, alternative contentStore} environment

So I would like to check with the alfresco/lucene experts whether the above behaviour rings any bells. I've been looking at the alfresco lucene code and I do see some significant changes, but I would like to poll the community for any insight before I spend (or lose) more time diving into the code.

cheers,
dirk
2 REPLIES

dirko
Champ in-the-making
I further investigated the problem I described in my earlier post.

It's not a resource leak; rather, the lucene indexer has been using far more contentStore resources (concurrently open streams) since Alfresco 3.4.

Let me explain this a bit:
1. content in a contentStore is read via an implementation of ContentStore.getReader()
2. the access to this content(stream) is released via an implementation of ContentStreamListener.contentStreamClosed()

In earlier versions these two methods were called strictly in sequence: contentStreamClosed() was called before the next getReader().

This is no longer the case: in ADMLuceneIndexerImpl.flushPending(), getReader() is called for every document via readDocuments(), while contentStreamClosed() is only triggered by writer.addDocument(doc) in a later loop over all documents in the batch. So this implementation keeps multiple streams open at once (up to lucene.indexer.batchSize of them).
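In outline, the difference between the two versions looks roughly like this. This is a schematic sketch of the access pattern, not the actual ADMLuceneIndexerImpl code; "Stream" stands in for a content reader holding one pooled connection from open to close:

```java
import java.util.ArrayList;
import java.util.List;

// Schematic contrast between the old sequential pattern and the batched one.
public class BatchingSketch {
    static int open = 0, peakOpen = 0;

    // Stand-in for a content reader that holds one pooled connection
    // from construction until close().
    static class Stream implements AutoCloseable {
        Stream() { open++; peakOpen = Math.max(peakOpen, open); }
        public void close() { open--; }
    }

    // Pre-3.4 style: each document's stream is closed before the next opens,
    // so at most one connection is held at any time.
    static int sequentialPeak(int docs) {
        peakOpen = 0;
        for (int i = 0; i < docs; i++) {
            try (Stream s = new Stream()) { /* index one document */ }
        }
        return peakOpen;
    }

    // 3.4+ style: readDocuments() opens all streams up front; each is only
    // closed later, when the addDocument() loop consumes it.
    static int batchedPeak(int docs) {
        peakOpen = 0;
        List<Stream> batch = new ArrayList<>();
        for (int i = 0; i < docs; i++) batch.add(new Stream()); // readDocuments()
        for (Stream s : batch) s.close();                       // addDocument loop
        return peakOpen;
    }

    public static void main(String[] args) {
        System.out.println("sequential peak=" + sequentialPeak(100)); // 1
        System.out.println("batched peak=" + batchedPeak(100));       // 100
    }
}
```

With a batch of 100 documents, the sequential pattern never holds more than one stream, while the batched pattern peaks at 100 simultaneously open streams.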

In the default contentStore (file system) this probably goes unnoticed because the OS provides a huge number of file handles. In our non-default contentStore (castorContentStore), however, we use HTTP connections to access the store, and we had configured a pool with a limited number of connections (much smaller than the batchSize). So we were able to work around the issue by tuning the castor connection pool to the lucene batch size.
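As a configuration sketch: `lucene.indexer.batchSize` is a real setting in alfresco-global.properties, but the castor pool property name below is a placeholder, since the actual name depends on how the alfresco2castor module is configured:

```properties
# alfresco-global.properties
# Number of documents indexed per Lucene batch (real Alfresco property)
lucene.indexer.batchSize=100

# Hypothetical castor pool setting (actual property name depends on the
# alfresco2castor module): keep the pool at least as large as the batch
# size so one full batch of open streams cannot exhaust it.
castor.connectionPool.maxSize=100
```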

This leaves me with 2 questions:

1. Does the above analysis make sense?

2. Is there a real reason to have these concurrent open streams, or was this introduced 'by accident'?

dward
Champ on-the-rise
Hi Dirk

I think you have a point.

I have logged https://issues.alfresco.com/jira/browse/ALF-15857

The indexer was recently reorganized so that it 'batched' together all index reads in readDocuments() before writing to the index in a single go. This was to avoid the context switching involved in changing between writer and reader, and all the flushing that entailed.

However, it looks like we then introduced the problem you describe: documents to be added to the index retain an open reader until they are consumed and closed, so we were simultaneously opening streams for all documents in the transaction.

I think I will be able to work around this by wrapping the Reader with a 'lazy' one that only triggers the opening of the content input stream when it is first accessed.
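A lazy wrapper along those lines could look roughly like this. This is a sketch of the general technique, not the actual fix that went into Alfresco; the `Supplier` plumbing is an assumption for illustration:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Supplier;

// Sketch of a lazy Reader: the underlying content stream is only opened on
// the first read, so documents queued in a batch hold no connection until
// the index writer actually consumes them.
public class LazyReader extends Reader {
    private final Supplier<Reader> opener;  // opens the real content reader
    private Reader delegate;

    public LazyReader(Supplier<Reader> opener) { this.opener = opener; }

    private Reader delegate() {
        if (delegate == null) {
            delegate = opener.get();        // connection acquired only here
        }
        return delegate;
    }

    @Override public int read(char[] cbuf, int off, int len) throws IOException {
        return delegate().read(cbuf, off, len);
    }

    @Override public void close() throws IOException {
        if (delegate != null) {             // never opened -> nothing to close
            delegate.close();
        }
    }

    public static void main(String[] args) throws IOException {
        final boolean[] opened = { false };
        LazyReader r = new LazyReader(() -> {
            opened[0] = true;
            return new StringReader("hello");
        });
        System.out.println("opened before read: " + opened[0]); // false
        r.read(new char[5], 0, 5);
        System.out.println("opened after read: " + opened[0]);  // true
        r.close();
    }
}
```

With this wrapper, readDocuments() could hand the writer cheap unopened readers, and each connection would only be taken from the pool when addDocument() actually reads the content, restoring the old one-stream-at-a-time behaviour.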

I'll check in an experimental fix and let you know how it goes.

Thanks for bringing this to our attention.

Dave