I realize this is an overly broad topic, but maybe opening a thread will get some good collaboration going on it.
We have about 2.5 million nodes in our repo, virtually all of them indexed by Solr. Many of these are folders, so they have no real content of their own; the number of actual documents is probably half that figure. The total Solr index is roughly 25GB, and we're running Solr on its own Linux server.
What we've found is that browsing the folder hierarchy via Share is very fast. (I realize Solr isn't really used for this, but wanted to mention it.) We also get great query response times for "filters" and when we use advanced search (again, in Share) to query against a specific property or two. Those response times are usually sub-second or close to it, depending on the query and property. We also haven't noticed any problems with indexing time for new or updated content, though I haven't taken detailed measurements.
At a high level, the problem appears when we use the "simple search" or search "contents"/"keywords" in Share. For this repo, a "reasonable search" (one that should, and eventually does, return about 200 documents) takes 10-30 seconds. There is also a lag of roughly 4-5 seconds between Solr returning the results and Share actually painting them, but we're looking into that issue separately.
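To pin down how much of that time is Solr itself rather than Share, a small timing harness along these lines might help. This is only a rough sketch: `time_calls`, `summarize`, and `fetch_url` are names made up for illustration, and the query URL has to be whatever your own Solr server actually receives for the slow search (I'm not assuming any particular Alfresco endpoint here).

```python
import statistics
import time
import urllib.request


def time_calls(run_query, repeats=5):
    """Call run_query() `repeats` times and return the elapsed seconds per call."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return the min, median, and max of a list of timings (in seconds)."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def fetch_url(url, timeout=60):
    """Fetch a URL and read the whole body, so the timing includes the transfer."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Usage would be something like `summarize(time_calls(lambda: fetch_url(query_url)))`, where `query_url` is a placeholder for the real request. Running it for both a property query and a keyword query would give comparable numbers for the two cases.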
Has anyone seen similar issues? Has anyone found worthwhile performance-tuning parameters for making Solr run queries faster, especially the queries generated by the simple search and "keywords" in Share?
I appreciate that the Alfresco team recently updated the Wiki page on Solr, but unfortunately it doesn't address this question, and many of its details apply to 4.2.x (I'm using 4.0.e on Linux).
Thanks in advance,
AJ