solr performance issues

aweber1nj
Champ in-the-making
I realize this is an overly-broad topic, but maybe if I open a thread we can get some good collaboration going on the situation…

We have about 2.5 million nodes in our repo (virtually all of them are indexed by Solr).  Many of these are actually folders, so they don't have any true content per se.  The number of documents is probably half that number, the total Solr index is roughly 25GB in size, and we're running Solr on its own Linux server.

What we've found is that browsing the folder-hierarchy via Share is very fast.  (I realize solr is really not used for this, but wanted to mention it.)  We also get great query response times for "filters" and when we use advanced search (again, in Share) and query against a specific property or two.  And those response times are usually sub-second or close, depending on the query/property/etc.  Also, we haven't noticed any problems with the indexing time for new/updated content, though I haven't taken detailed measurements.

At a high-level, the problem appears when we use the "simple search" or search "contents"/"keywords" in Share.  For this repo, a "reasonable search" (that should – and eventually does – return about 200 documents) takes 10-30 seconds.  There is also a roughly 4-5sec lag from the time solr returns the results until Share actually paints them, but we're looking into that issue separately.

Has anyone seen similar issues?  Has anyone found any worthwhile performance-tuning parameters for tweaking solr to run queries faster – especially those generated by the simple search and "keywords" in Share?

I appreciate that the Alfresco Team recently updated the Wiki on Solr, but unfortunately it misses the mark for this question, and a bunch of the details apply to 4.2.x (I'm using 4.0.e on linux).

Thanks in advance,
AJ
6 REPLIES

andy
Champ on-the-rise
Hi

What version of Alfresco are you using?
Are all queries the same?
Have you checked you are really using SOLR? (Does the SOLR admin statistics page show any calls made to the query handlers?)
What search examples cause the problem?
Is it the same for site specific, all site and repository scoped queries?

Andy

aweber1nj
Champ in-the-making
Thanks for the reply, Andy…

Alfresco 4.0.e on CentOS6 (64bit), 4-cores, 12GB RAM (and that host is only running Solr, so it can technically use anything it wants).  Alfresco host is same hw/sw specs.

Not all queries are the same.  I try to search for a single word that I know is in roughly 100-200 nodes (folders + documents).  However, re-running the same query (over time, not back-to-back) yields almost identical times.  We are getting our timing from:
log4j.logger.org.alfresco.repo.search.impl.solr.SolrQueryHTTPClient=debug
And we are testing these when no one else is on the system (and one query at a time).  Thus, we're almost positive these queries are going to Solr, and the Solr Admin Page does show cache statistics being updated.

As I think I mentioned, searches against a single, custom property that we have indexed return very fast.  It's when we try putting a value in the "Keywords" box in Advanced Search (even with additional properties), or when we run a "simple search" from the document library view of Share, that it completely drags.

We only have one Site in use at this time, and we're using Share to run the tests (except when we try to copy the alfresco-fts text to the Node Browser).  I honestly don't know how to re-scope the queries.  I believe Share is always defaulting to search the Site, but that's only a SWAG.  We never explicitly specify either-way (but could, if you let me know how and you think it'll help troubleshoot).

Thanks again for your reply.  I hope this is something we can figure out and help everyone with.  Solr should be able to run these kinds of queries against this "library" in sub-second speed, but I'm not even looking for that.

Again, I will be happy to provide any further details and statistics if you think you can help!

-AJ

andy
Champ on-the-rise
Hi

What is the query you enter into the search box?
Have you customized the share search string?
Have you added any custom models?

My initial guess is an issue, since fixed, where cross-language search strings of the kind used for the old Lucene implementation were being generated against SOLR.
This built a large wildcard expression for the locale part of each token, which requires a scan across all terms of content.
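
To illustrate why that kind of expansion is slow, here is a toy sketch in Python. It is not Alfresco or Lucene code: the "{locale}token" term layout, the function names, and the sample words are all assumptions made for the illustration. It only shows the general effect, namely that an exact term can be found by binary search in a sorted term dictionary, while a pattern with a wildcard at the front has to examine every term.

```python
import bisect
import fnmatch


def build_terms(words, locales):
    """Build a sorted toy term dictionary.

    Assumption: each token is stored once per locale as "{locale}token".
    """
    return sorted("{%s}%s" % (loc, w) for loc in locales for w in words)


def exact_lookup(terms, term):
    """Exact match via binary search over the sorted terms."""
    i = bisect.bisect_left(terms, term)
    return i < len(terms) and terms[i] == term


def wildcard_scan(terms, pattern):
    """Match a wildcard pattern by examining every term.

    Returns the matching terms and the number of terms examined.
    """
    hits = []
    examined = 0
    for t in terms:
        examined += 1
        if fnmatch.fnmatchcase(t, pattern):
            hits.append(t)
    return hits, examined


if __name__ == "__main__":
    terms = build_terms(["invoice", "report", "contract"], ["en", "fr", "de"])
    print(exact_lookup(terms, "{en}invoice"))
    # The locale part is wildcarded, so all terms are visited.
    print(wildcard_scan(terms, "{*}invoice"))
```

With a real index the term dictionary covers all content, so the number of terms examined grows with the size of the repository rather than with the number of results.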

You could upgrade to 4.2 to fix this (or use Enterprise).
There is no way to avoid this expansion in the config.

You could confirm this is the cause of your problems with a couple of stack dumps taken while a slow query is running: if they show the query in wildcard expansion/term enumeration, this is the issue.
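
As a rough aid for reading those dumps, the following Python sketch lists the threads whose stack frames mention wildcard or term-enumeration code. It assumes a jstack-style text dump (thread entries separated by blank lines, each starting with the quoted thread name). The marker substrings `Wildcard` and `TermEnum` are guesses for illustration, not confirmed class names from the fix, so adjust them to whatever appears in your own dumps.

```python
import re

# Substrings to look for in stack frames. These are assumptions for
# illustration only; replace them with names seen in your own dumps.
DEFAULT_MARKERS = ("Wildcard", "TermEnum")


def suspicious_threads(dump_text, markers=DEFAULT_MARKERS):
    """Return the names of threads with a frame containing any marker."""
    found = []
    # Thread entries in a jstack-style dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump_text.strip()):
        lines = block.splitlines()
        # Skip header text and anything that is not a thread entry.
        if not lines or not lines[0].startswith('"'):
            continue
        name = lines[0].split('"')[1]
        frames = [l.strip() for l in lines[1:] if l.strip().startswith("at ")]
        if any(m in f for f in frames for m in markers):
            found.append(name)
    return found


if __name__ == "__main__":
    import sys

    # Usage: python scan_dump.py dump.txt
    with open(sys.argv[1]) as fh:
        for thread in suspicious_threads(fh.read()):
            print(thread)
```

If the same request threads show up in two or three dumps taken a few seconds apart during one slow search, that supports the wildcard expansion theory.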

See ALF-15491: SOLR is generating queries for lucene style cross-language support

Andy

aweber1nj
Champ in-the-making
I PM'ed you earlier with some log entries that should indicate whether or not the query is of the "wrong flavour".

Yes, we have a custom model, and we did add three (I think) of the indexed properties to the search-box-string.

Is it possible for you to take a peek at the PM I sent and see if you recognize whether we are running into ALF-15491 as you theorize?

Thanks again,
AJ

aweber1nj
Champ in-the-making
Andy,

We tried some additional "self help" and plunged into some of the code (of which it appears you are the author in many cases).  We tried installing the 4.2 Solr standalone stack on our Solr host, but it would not run against the 4.0.e Alfresco install.  We also tried copying the 4.2 webscripts to the 4.0 Alfresco installation (in hopes that the 4.2 Solr install would then be able to track against the 4.0 install), but this didn't work either.

We are "snookered" at this point, as we've exhausted all our ideas to work around this bug.

Do you have any ideas that we could test to upgrade just the solr piece of the pie?

Thanks again for the help,
AJ

andy
Champ on-the-rise
Hi

I believe you are hitting the issue I mentioned.
The only option you have is to upgrade to 4.2 or go through support for the fix.

SOLR and Alfresco releases are normally in sync - bug fixes can affect both sides - even if they do not affect the API they use to communicate.

Andy