03-18-2020 12:10 PM
Hello,
I am stuck with some weird results on Alfresco FTS search queries using Solr 6 and my custom alfresco model with d:text properties defined with Indexing Free Text.
Looking at my alfresco model the property "edm:dxuid" is defined as:
<property name="edm:dxuid">
    <index enabled="true">
        <tokenised>TRUE</tokenised>
        <facetable>false</facetable>
    </index>
</property>
In the Alfresco Solr schema this maps to the dynamic field "text@s__lt@":
"text@s__lt@{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}dxuid"
<fieldType name="alfrescoFieldType" class="org.alfresco.solr.AlfrescoFieldType" />
<dynamicField name="text@s__lt@*" type="alfrescoFieldType" indexed="true" omitNorms="false" stored="false" multiValued="false" />
I have no clue how the alfrescoFieldType tokenises my values and when I try to run the analyzer on Solr Admin Console with alfrescoFieldType I get a NullPointerException:
2020-03-18 15:07:41.666 INFO (qtp1151020327-15) [ x:alfresco] o.a.s.c.S.Request [alfresco] webapp=/solr path=/analysis/field params={analysis.fieldvalue=DX01&analysis.showmatch=true&wt=json&analysis.fieldtype=alfrescoFieldType&_=1584544061607} status=500 QTime=9
2020-03-18 15:07:41.670 ERROR (qtp1151020327-15) [ x:alfresco] o.a.s.s.HttpSolrCall null:java.lang.NullPointerException
    at org.alfresco.solr.AlfrescoAnalyzerWrapper.getWrappedAnalyzer(AlfrescoAnalyzerWrapper.java:76)
My Solr core is created using the rerank solr template and I am not sure if this could have an impact on my unexpected behavior or not.
I can reproduce the same wrong behavior with Alfresco Community 5.2.0 (re21f2be5-b22) and each of these Alfresco Search Services versions:
1.1.0 (Solr 6.3.0)
1.2.2 (Solr 6.6.0)
1.3.0.9 (Solr 6.6.5)
Only metadata is indexed for my documents (content indexing is disabled).
With the current model, where edm:dxuid is defined as Free Text in Alfresco, I get the results below.
My property values are of the form, e.g., edm:dxuid: DX01_099_20200117_04994145
qt: /afts
q: edm:dxuid:*
Hits=4751 <--- CORRECT VALUE !!!
2020-03-18 15:13:26.853 DEBUG (qtp1151020327-16) [ x:alfresco] o.a.s.c.S.Request [alfresco] webapp=/solr path=/afts params={q=edm:dxuid:*&indent=true&wt=json&_=1584544406795}
2020-03-18 15:13:26.855 DEBUG (qtp1151020327-16) [ x:alfresco] o.a.s.q.AbstractQParser AFTS QP query as lucene: (PROPERTIES:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}dxuid)^1.0
2020-03-18 15:13:26.856 DEBUG (qtp1151020327-16) [ x:alfresco] o.a.s.q.AbstractQParser AFTS QP query as lucene: (content@s__lt@{http://www.alfresco.org/model/content/1.0}content:{en}rerank_query_from_context)^1.0
2020-03-18 15:13:26.856 DEBUG (qtp1151020327-16) [ x:alfresco] o.a.s.h.c.QueryComponent process: carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544406795&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}
2020-03-18 15:13:26.856 DEBUG (qtp1151020327-16) [ x:alfresco] o.a.s.s.s.LocalStatsCache ## GET {carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544406795&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}}
2020-03-18 15:13:26.856 INFO (qtp1151020327-16) [ x:alfresco] o.a.s.c.S.Request [alfresco] webapp=/solr path=/afts params={q=edm:dxuid:*&indent=true&wt=json&_=1584544406795} hits=4751 status=0 QTime=3
2020-03-18 15:13:26.857 DEBUG (qtp1151020327-16) [ x:alfresco] o.a.s.s.HttpSolrCall Closing out SolrRequest: {carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544406795&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}}
=-=-=-=-=-=-=-=
qt: /afts
q: edm:dxuid:*DX01*
Hits: 1000 <--- WRONG VALUE !!! EXPECTED VALUE = 4751 !!!
2020-03-18 15:15:49.346 DEBUG (qtp1151020327-20) [ x:alfresco] o.a.s.c.S.Request [alfresco] webapp=/solr path=/afts params={q=edm:dxuid:*DX01*&indent=true&wt=json&_=1584544549264}
2020-03-18 15:15:49.351 DEBUG (qtp1151020327-20) [ x:alfresco] o.a.s.q.AbstractQParser AFTS QP query as lucene: (_dummy_:*DX01* text@s__lt@{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}dxuid:{en}*dx01*)^1.0
2020-03-18 15:15:49.352 DEBUG (qtp1151020327-20) [ x:alfresco] o.a.s.q.AbstractQParser AFTS QP query as lucene: (content@s__lt@{http://www.alfresco.org/model/content/1.0}content:{en}rerank_query_from_context)^1.0
2020-03-18 15:15:49.352 DEBUG (qtp1151020327-20) [ x:alfresco] o.a.s.h.c.QueryComponent process: carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*DX01*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544549264&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}
2020-03-18 15:15:49.353 DEBUG (qtp1151020327-20) [ x:alfresco] o.a.s.s.s.LocalStatsCache ## GET {carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*DX01*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544549264&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}}
2020-03-18 15:15:49.359 INFO (qtp1151020327-20) [ x:alfresco] o.a.s.c.S.Request [alfresco] webapp=/solr path=/afts params={q=edm:dxuid:*DX01*&indent=true&wt=json&_=1584544549264} hits=1000 status=0 QTime=13
2020-03-18 15:15:49.360 DEBUG (qtp1151020327-20) [ x:alfresco] o.a.s.s.HttpSolrCall Closing out SolrRequest: {carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*DX01*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544549264&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}}
=-=-=-=-=-=-=-=
qt: /afts
q: edm:dxuid:*DX01_0*
Hits: 1000 <--- WRONG VALUE !!! EXPECTED VALUE = 3113 !!!
2020-03-18 15:19:40.475 DEBUG (qtp1151020327-11) [ x:alfresco] o.a.s.c.S.Request [alfresco] webapp=/solr path=/afts params={q=edm:dxuid:*DX01_0*&indent=true&wt=json&_=1584544780434}
2020-03-18 15:19:40.479 DEBUG (qtp1151020327-11) [ x:alfresco] o.a.s.q.AbstractQParser AFTS QP query as lucene: (_dummy_:*DX01_0* text@s__lt@{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}dxuid:{en}*dx01_0*)^1.0
2020-03-18 15:19:40.604 DEBUG (qtp1151020327-11) [ x:alfresco] o.a.s.q.AbstractQParser AFTS QP query as lucene: (content@s__lt@{http://www.alfresco.org/model/content/1.0}content:{en}rerank_query_from_context)^1.0
2020-03-18 15:19:40.605 DEBUG (qtp1151020327-11) [ x:alfresco] o.a.s.h.c.QueryComponent process: carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*DX01_0*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544780434&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}
2020-03-18 15:19:40.605 DEBUG (qtp1151020327-11) [ x:alfresco] o.a.s.s.s.LocalStatsCache ## GET {carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*DX01_0*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544780434&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}}
2020-03-18 15:19:40.605 INFO (qtp1151020327-11) [ x:alfresco] o.a.s.c.S.Request [alfresco] webapp=/solr path=/afts params={q=edm:dxuid:*DX01_0*&indent=true&wt=json&_=1584544780434} hits=1000 status=0 QTime=130
2020-03-18 15:19:40.606 DEBUG (qtp1151020327-11) [ x:alfresco] o.a.s.s.HttpSolrCall Closing out SolrRequest: {carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*DX01_0*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544780434&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}}
=-=-=-=-=-=-=-=
qt: /afts
q: edm:dxuid:*DX01_09*
Hits=740 <--- CORRECT VALUE !!!
2020-03-18 15:22:58.176 DEBUG (qtp1151020327-14) [ x:alfresco] o.a.s.c.S.Request [alfresco] webapp=/solr path=/afts params={q=edm:dxuid:*DX01_09*&indent=true&wt=json&_=1584544978131}
2020-03-18 15:22:58.182 DEBUG (qtp1151020327-14) [ x:alfresco] o.a.s.q.AbstractQParser AFTS QP query as lucene: (_dummy_:*DX01_09* text@s__lt@{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}dxuid:{en}*dx01_09*)^1.0
2020-03-18 15:22:58.190 DEBUG (qtp1151020327-14) [ x:alfresco] o.a.s.q.AbstractQParser AFTS QP query as lucene: (content@s__lt@{http://www.alfresco.org/model/content/1.0}content:{en}rerank_query_from_context)^1.0
2020-03-18 15:22:58.190 DEBUG (qtp1151020327-14) [ x:alfresco] o.a.s.h.c.QueryComponent process: carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*DX01_09*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544978131&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}
2020-03-18 15:22:58.190 DEBUG (qtp1151020327-14) [ x:alfresco] o.a.s.s.s.LocalStatsCache ## GET {carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*DX01_09*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544978131&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}}
2020-03-18 15:22:58.191 INFO (qtp1151020327-14) [ x:alfresco] o.a.s.c.S.Request [alfresco] webapp=/solr path=/afts params={q=edm:dxuid:*DX01_09*&indent=true&wt=json&_=1584544978131} hits=740 status=0 QTime=14
2020-03-18 15:22:58.191 DEBUG (qtp1151020327-14) [ x:alfresco] o.a.s.s.HttpSolrCall Closing out SolrRequest: {carrot.url=id&spellcheck.collateExtendedResults=true&indent=true&carrot.produceSummary=true&spellcheck.maxCollations=3&spellcheck.maxCollationTries=5&spellcheck.alternativeTermCount=2&spellcheck.extendedResults=false&hl.qparser=rrafts&q=edm:dxuid:*DX01_09*&defType=afts&spellcheck.maxResultsForSuggest=5&rqq={!rrafts}RERANK_QUERY_FROM_CONTEXT&spellcheck=false&carrot.outputSubClusters=false&spellcheck.count=5&wt=json&carrot.title=mltext@m___t@{http://www.alfresco.org/model/content/1.0}title&carrot.snippet=content@s___t@{http://www.alfresco.org/model/content/1.0}content&_=1584544978131&spellcheck.collate=true&rq={!alfrescoReRank+reRankQuery%3D$rqq+reRankDocs%3D500+scale%3Dtrue+reRankWeight%3D3}}
Has anyone in this forum encountered the same issues on d:text and Indexing Free Text custom model fields?
Alfresco FTS Solr searches are not behaving as I would expect and I would like to know if I am doing something wrong on my side or not.
Any help would be much appreciated as I am running out of ideas here.
Thanks in advance,
Luis
03-19-2020 04:00 AM
Hope this helps:
https://angelborroy.wordpress.com/2018/05/30/alfresco-counting-more-than-1000-elements/
03-19-2020 07:47 AM
Hola Angel,
Thanks a lot for your quick response.
This is just a small test environment with 1782 documents in the content store so I am afraid the issue I am encountering is not related to the 1000 document limitation on search.
Please look at some of my unexpected results when running queries directly in Solr Admin Console:
(I am assuming all these queries are executed on Solr and not on the DB, right?)
Query 1 : SUCCEEDED : edm:dxuid:*
"q":"(PATH:\"/app:company_home/cm:archive/cm:_x0030_01G_COMPANY-CLIENT2//*\") AND (TYPE:\"edm:invoice\") AND (ISUNSET:\"edm:archiveDate\" OR ISNULL:\"edm:archiveDate\" OR edm:archiveDate:[\"2020-03-19\" TO \"MAX\"]) AND edm:processDocumentFormatType:(ORIGINAL OR TARGET) AND edm:dxuid:*"
"response":{"numFound":148,"start":0,"docs":[...]}
Query 2 : FAILED : edm:dxuid:*DX01_0*
"q":"(PATH:\"/app:company_home/cm:archive/cm:_x0030_01G_COMPANY-CLIENT2//*\") AND (TYPE:\"edm:invoice\") AND (ISUNSET:\"edm:archiveDate\" OR ISNULL:\"edm:archiveDate\" OR edm:archiveDate:[\"2020-03-19\" TO \"MAX\"]) AND edm:processDocumentFormatType:(ORIGINAL OR TARGET) AND edm:dxuid:*DX01_0*"
"response":{"numFound":0,"start":0,"docs":[]}
Query 3 : SUCCEEDED : edm:dxuid:*DX01_09*
"q":"(PATH:\"/app:company_home/cm:archive/cm:_x0030_01G_COMPANY-CLIENT2//*\") AND (TYPE:\"edm:invoice\") AND (ISUNSET:\"edm:archiveDate\" OR ISNULL:\"edm:archiveDate\" OR edm:archiveDate:[\"2020-03-19\" TO \"MAX\"]) AND edm:processDocumentFormatType:(ORIGINAL OR TARGET) AND edm:dxuid:*DX01_09*"
"response":{"numFound":148,"start":0,"docs":[...]}
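For anyone wanting to repeat this comparison outside the Admin Console, here is a rough Python sketch. It only builds the request URLs and reads numFound from a JSON body shaped like the responses above. The base URL (host, port 8983 and the core name alfresco) and the helper names build_afts_url and num_found are assumptions for illustration; the /afts handler, wt=json and numFound come from the logs and responses in this thread, and rows=0 is the standard Solr parameter for returning counts only.

```python
import json
from urllib.parse import urlencode

# Assumed base URL: host, port and core name depend on the installation.
SOLR_AFTS_URL = "http://localhost:8983/solr/alfresco/afts"


def build_afts_url(query, base_url=SOLR_AFTS_URL):
    """Return a GET URL for an AFTS query that asks for JSON and no documents."""
    params = {"q": query, "wt": "json", "rows": 0}
    return base_url + "?" + urlencode(params)


def num_found(response_text):
    """Extract response.numFound from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


if __name__ == "__main__":
    # The three wildcard variants compared in this post.
    for q in ("edm:dxuid:*", "edm:dxuid:*DX01_0*", "edm:dxuid:*DX01_09*"):
        print(build_afts_url(q))
    # A response body in the same shape as the ones quoted above.
    sample = '{"response":{"numFound":148,"start":0,"docs":[]}}'
    print(num_found(sample))
```

Fetching each URL (for example with urllib.request.urlopen) and passing the body to num_found gives the counts side by side, which makes it easier to see at which pattern length the numbers stop matching.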
Does it ring any bells?
Please let me know if you need more detailed info about my alfresco model.
Thanks in advance,
Luis
03-24-2020 01:28 PM
A quick update on this issue.
The problem is still not solved on my setup, but after some more debugging it looks like it could be related to Alfresco's rerank, score, weight and cost search features.
Looking at the logs and the source code there seems to be a mainQuery (executed first) and a reRankQuery (executed later) as well as some reRankDocs and reRankWeight values.
"parsedquery":"ReRankQuery({!rerank mainQuery='(+(CACHED -> :PATH:/http://www.alfresco.org/model/application/1.0:company_home/http://www.alfresco.org/model/content/1.0:archive/http://www.alfresco.org/model/content/1.0:_x0030_01G_COMPANY-CLIENT2///*)^1.0 +(TYPE:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}invoice)^1.0 +((+(TYPE:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}recadv TYPE:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}delfor TYPE:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}feedbackMessage TYPE:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}cntcnd TYPE:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}catlog TYPE:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}invoice TYPE:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}attachment TYPE:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}order TYPE:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}despatchAdvice TYPE:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}dxDocument) -PROPERTIES:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}archiveDate)^1.0 (NULLPROPERTIES:{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}archiveDate)^1.0 (date@s_@{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}archiveDate:[1585094400000 TO 9223058350344848383])^1.0)^1.0 +((_dummy_:ORIGINAL text@s__lt@{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}processDocumentFormatType:{en}origin)^1.0 (_dummy_:TARGET text@s__lt@{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}processDocumentFormatType:{en}target)^1.0)^1.0 +(_dummy_:*DX01_* text@s__lt@{http://www.doc-process.com/model/eArchiveDocumentModel/1.0}dxuid:{en}*dx01_*)^1.0)^1.0' reRankQuery='((_dummy_:RERANK_QUERY_FROM_CONTEXT mltext@m__lt@{http://www.alfresco.org/model/content/1.0}description:{en}rerank_query_from_context) (_dummy_:RERANK_QUERY_FROM_CONTEXT 
mltext@m__lt@{http://www.alfresco.org/model/content/1.0}title:{en}rerank_query_from_context) (content@s__lt@{http://www.alfresco.org/model/content/1.0}content:{en}rerank_query_from_context) (spanOr([text@s___t@{http://www.alfresco.org/model/content/1.0}name:rerank_query_from_context, spanNear([text@s___t@{http://www.alfresco.org/model/content/1.0}name:rerank, text@s___t@{http://www.alfresco.org/model/content/1.0}name:query, text@s___t@{http://www.alfresco.org/model/content/1.0}name:from, text@s___t@{http://www.alfresco.org/model/content/1.0}name:context], 0, true), text@s___t@{http://www.alfresco.org/model/content/1.0}name:rerankqueryfromcontext]) text@s____@{http://www.alfresco.org/model/content/1.0}name:RERANK_QUERY_FROM_CONTEXT text@s__lt@{http://www.alfresco.org/model/content/1.0}name:{en}rerank_query_from_context text@s__l_@{http://www.alfresco.org/model/content/1.0}name:{en}RERANK_QUERY_FROM_CONTEXT))^1.0' reRankDocs=500 reRankWeigh=3.0})"
It looks like my query gets a wrong number of hits based on cost, so the number of documents it returns is lower than expected. I am wondering if this could be related to the reranking feature.
Has anyone in this forum experienced the same issues with Alfresco Search Services rerank template?
I can see Alfresco Search Services 1.4 contains two templates:
noRerank
rerank
The default one when creating the alfresco and archive solr cores is 'rerank'.
Does anyone know if there is a way to disable these reranking/scoring features in Solr/Lucene, so that the results of the mainQuery are returned "as is" without any further reRankQuery that could filter out expected values?
Thanks in advance,
Luis
03-26-2020 07:29 PM
FYI
I have finally overcome my issue with the wrong number of documents returned by the search query after increasing the values of the properties listed below:
/opt/alfresco-search-services/solrhome/alfresco/conf/solrcore.properties
solr.maxBooleanClauses=1000000
alfresco.topTermSpanRewriteLimit=1000000
However, there is a drawback: I am now facing query performance issues, as the queries are slow and consume a lot of heap space.
I assume I could increase the CPU and JVM heap size to improve this, but my physical resources are limited.
Has anyone faced the same issues?
Is anyone aware of an Alfresco Solr performance tuning guide that explains the different properties in the solrcore.properties file and how to configure them for Alfresco setups with several million documents indexed?
Thanks in advance.
Luis
03-27-2020 04:08 AM
I don't see the benefit of using 'maxBooleanClauses' and TopTermsSpanBooleanQueryRewrite; I guess you are not producing SOLR queries with more than 1,000 operands.
Are you using the queries you described above?
04-05-2020 05:56 AM
I decided to increase those properties based on Solr Guide recommendations:
https://lucene.apache.org/solr/guide/6_6/other-parsers.html
Limitations
Performance is sensitive to the number of unique terms that are associated with a pattern. For instance, searching for "a*" will form a large OR clause (technically a SpanOr with many terms) for all of the terms in your index for the indicated field that start with the single letter 'a'. It may be prudent to restrict wildcards to at least two or preferably three letters as a prefix. Allowing very short prefixes may result in to many low-quality documents being returned.
Notice that it also supports leading wildcards "*a" as well with consequent performance implications. Applying ReversedWildcardFilterFactory in index-time analysis is usually a good idea.
You may need to increase MaxBooleanClauses in solrconfig.xml as a result of the term expansion above:
<maxBooleanClauses>4096</maxBooleanClauses>
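To make the term expansion described in the quoted limitation concrete, here is a minimal Python sketch. It is not Solr or Alfresco code. It assumes each property value is indexed as one lower-cased term, that a wildcard is expanded into the list of matching terms, and that a cap on expanded terms simply drops the terms beyond the limit (real Lucene top-terms rewrites choose which terms to keep differently). The data is synthetic and the names synthetic_values, build_index and wildcard_hits are made up for illustration.

```python
from fnmatch import fnmatchcase


def synthetic_values(count):
    """Generate made-up identifiers shaped like DX01_099_20200117_04994145."""
    return [f"DX01_{i % 200:03d}_20200117_{i:08d}" for i in range(count)]


def build_index(values):
    """Map each lower-cased term to the set of document ids that contain it."""
    index = {}
    for doc_id, value in enumerate(values):
        index.setdefault(value.lower(), set()).add(doc_id)
    return index


def wildcard_hits(index, pattern, max_terms):
    """Expand a wildcard into terms, keep at most max_terms, count matching docs.

    Returns (documents found, terms that matched before the cap was applied).
    """
    matching = sorted(t for t in index if fnmatchcase(t, pattern.lower()))
    kept = matching[:max_terms]  # simplified stand-in for a top-N term rewrite
    docs = set()
    for term in kept:
        docs |= index[term]
    return len(docs), len(matching)


if __name__ == "__main__":
    index = build_index(synthetic_values(3000))
    for pattern in ("*DX01_0*", "*DX01_09*"):
        for limit in (1000, 1000000):
            found, expanded = wildcard_hits(index, pattern, limit)
            print(pattern, "limit", limit, "found", found, "of", expanded)
```

Under these assumptions a pattern that expands to more unique terms than the cap returns exactly the cap, while a narrower pattern returns the full count. That is the same shape as the hits=1000 versus hits=740 results earlier in the thread, and it would be consistent with the counts changing once the limits were raised, although the sketch does not prove that this is what Solr does internally.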