<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: no highlight results for phrase search in Alfresco Forum</title>
    <link>https://connect.hyland.com/t5/alfresco-forum/no-highlight-results-for-phrase-search/m-p/485133#M39690</link>
    <description>&lt;P&gt;I don't have a definitive answer either, but I've been looking into it and here are my conclusions so far:&lt;BR /&gt;&lt;BR /&gt;I'm not sure how Search Services applies the highlighter parameters (set in solrconfig.xml or at query time), especially hl.usePhraseHighlighter. It is true by default and can be overridden at query time, but doing so doesn't seem to change anything:&amp;nbsp;&lt;BR /&gt;&lt;A href="https://solr.apache.org/guide/6_6/highlighting.html" target="_self"&gt;https://solr.apache.org/guide/6_6/highlighting.html&lt;/A&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;In my opinion, it has something to do with&amp;nbsp;the way Alfresco Search Services uses Solr to tokenize the content of your document: it splits the text into words and then applies some filtering (such as stripping possessive 's and treating common linking words, i.e. stop words, specially).&lt;BR /&gt;&lt;BR /&gt;If you look into schema.xml, you'll find the configuration it uses:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"&amp;gt;
&amp;lt;analyzer type="index"&amp;gt;
...
&amp;lt;tokenizer class="solr.StandardTokenizerFactory"/&amp;gt;
&amp;lt;!-- in this example, we will only use synonyms at query time
&amp;lt;filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/&amp;gt;
--&amp;gt;
&amp;lt;filter class="solr.ICUFoldingFilterFactory"/&amp;gt;
&amp;lt;filter class="solr.EnglishPossessiveFilterFactory"/&amp;gt;
&amp;lt;filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/&amp;gt;
&amp;lt;!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
&amp;lt;filter class="solr.EnglishMinimalStemFilterFactory"/&amp;gt;
--&amp;gt;
&amp;lt;filter class="solr.PorterStemFilterFactory"/&amp;gt;
&amp;lt;filter class="solr.CommonGramsFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/&amp;gt;
&amp;lt;/analyzer&amp;gt;
&amp;lt;analyzer type="query"&amp;gt;
...
&amp;lt;/analyzer&amp;gt;
&amp;lt;/fieldType&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have noticed that the highlighted text uses its own field type, with its own tokenizer and filter chain. I therefore increasingly suspect that one of the two is misconfigured:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;    &amp;lt;fieldType name="highlighted_text_en" class="solr.TextField"&amp;gt;
      &amp;lt;analyzer&amp;gt;
        &amp;lt;charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(#0;.*#0;)" replacement=""/&amp;gt;
        &amp;lt;tokenizer class="solr.StandardTokenizerFactory" /&amp;gt;
        &amp;lt;filter class="org.apache.solr.analysis.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="1"
                splitOnCaseChange="1"
                splitOnNumerics="1"
                preserveOriginal="1"
                stemEnglishPossessive="1"/&amp;gt;
        &amp;lt;filter class="solr.ICUFoldingFilterFactory"/&amp;gt;
        &amp;lt;filter class="solr.EnglishPossessiveFilterFactory"/&amp;gt;
        &amp;lt;filter class="solr.KeywordRepeatFilterFactory" /&amp;gt;
        &amp;lt;filter class="solr.PorterStemFilterFactory"/&amp;gt;
      &amp;lt;/analyzer&amp;gt;
    &amp;lt;/fieldType&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For more information, you can check the Solr documentation, which is pretty good at describing the different tokenizers:&amp;nbsp;&lt;A href="https://solr.apache.org/guide/6_6/about-tokenizers.html" target="_self"&gt;https://solr.apache.org/guide/6_6/about-tokenizers.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;You can also test your Solr config in the Analysis tab:&amp;nbsp;&lt;A href="http://localhost:8983/solr/#/alfresco/analysis" target="_self"&gt;http://localhost:8983/solr/#/alfresco/analysis&lt;/A&gt;&lt;BR /&gt;When you do, make sure to use the proper field type (text_en in this case).&lt;/P&gt;</description>
    <pubDate>Thu, 05 Dec 2024 15:13:50 GMT</pubDate>
    <dc:creator>BMonjanel</dc:creator>
    <dc:date>2024-12-05T15:13:50Z</dc:date>
    <item>
      <title>no highlight results for phrase search</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/no-highlight-results-for-phrase-search/m-p/119939#M32971</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm trying to return highlight snippets via the Search API, which works perfectly fine when searching for individual words and combining them with AND or OR. But when I search for a phrase, the API returns the correct document but no highlight section in the result. Do you have any pointers as to why that is?&lt;/P&gt;&lt;P&gt;The version is 7.2. This query, for example, works as expected:&lt;/P&gt;&lt;PRE&gt;{
     "query": {
         "language": "afts",
         "query": "cm:content:this AND cm:content:is AND cm:content:a AND cm:content:test"
     },
     "include": [
         "path"
     ],
     "paging": {
         "maxItems": 10,
         "skipCount": 0
     },
     "highlight": {
         "snippetCount": 3,
         "mergeContiguous": true,
         "fragmentSize": 300,
         "fields": [
             {
                 "field": "cm:content"
             }
         ]
     }
}&lt;/PRE&gt;&lt;P&gt;However, if I change the query string to&lt;/P&gt;&lt;PRE&gt;"=cm:content:\"this is a test\""&lt;/PRE&gt;&lt;P&gt;no highlights are returned, only the (correct) hit.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Sat, 13 Jul 2024 18:59:34 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/no-highlight-results-for-phrase-search/m-p/119939#M32971</guid>
      <dc:creator>herbhub</dc:creator>
      <dc:date>2024-07-13T18:59:34Z</dc:date>
    </item>
    <item>
      <title>Re: no highlight results for phrase search</title>
      <link>https://connect.hyland.com/t5/alfresco-forum/no-highlight-results-for-phrase-search/m-p/485133#M39690</link>
      <description>&lt;P&gt;I don't have a definitive answer either, but I've been looking into it and here are my conclusions so far:&lt;BR /&gt;&lt;BR /&gt;I'm not sure how Search Services applies the highlighter parameters (set in solrconfig.xml or at query time), especially hl.usePhraseHighlighter. It is true by default and can be overridden at query time, but doing so doesn't seem to change anything:&amp;nbsp;&lt;BR /&gt;&lt;A href="https://solr.apache.org/guide/6_6/highlighting.html" target="_self"&gt;https://solr.apache.org/guide/6_6/highlighting.html&lt;/A&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;In my opinion, it has something to do with&amp;nbsp;the way Alfresco Search Services uses Solr to tokenize the content of your document: it splits the text into words and then applies some filtering (such as stripping possessive 's and treating common linking words, i.e. stop words, specially).&lt;BR /&gt;&lt;BR /&gt;If you look into schema.xml, you'll find the configuration it uses:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"&amp;gt;
&amp;lt;analyzer type="index"&amp;gt;
...
&amp;lt;tokenizer class="solr.StandardTokenizerFactory"/&amp;gt;
&amp;lt;!-- in this example, we will only use synonyms at query time
&amp;lt;filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/&amp;gt;
--&amp;gt;
&amp;lt;filter class="solr.ICUFoldingFilterFactory"/&amp;gt;
&amp;lt;filter class="solr.EnglishPossessiveFilterFactory"/&amp;gt;
&amp;lt;filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/&amp;gt;
&amp;lt;!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
&amp;lt;filter class="solr.EnglishMinimalStemFilterFactory"/&amp;gt;
--&amp;gt;
&amp;lt;filter class="solr.PorterStemFilterFactory"/&amp;gt;
&amp;lt;filter class="solr.CommonGramsFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/&amp;gt;
&amp;lt;/analyzer&amp;gt;
&amp;lt;analyzer type="query"&amp;gt;
...
&amp;lt;/analyzer&amp;gt;
&amp;lt;/fieldType&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have noticed that the highlighted text uses its own field type, with its own tokenizer and filter chain. I therefore increasingly suspect that one of the two is misconfigured:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;    &amp;lt;fieldType name="highlighted_text_en" class="solr.TextField"&amp;gt;
      &amp;lt;analyzer&amp;gt;
        &amp;lt;charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(#0;.*#0;)" replacement=""/&amp;gt;
        &amp;lt;tokenizer class="solr.StandardTokenizerFactory" /&amp;gt;
        &amp;lt;filter class="org.apache.solr.analysis.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="1"
                splitOnCaseChange="1"
                splitOnNumerics="1"
                preserveOriginal="1"
                stemEnglishPossessive="1"/&amp;gt;
        &amp;lt;filter class="solr.ICUFoldingFilterFactory"/&amp;gt;
        &amp;lt;filter class="solr.EnglishPossessiveFilterFactory"/&amp;gt;
        &amp;lt;filter class="solr.KeywordRepeatFilterFactory" /&amp;gt;
        &amp;lt;filter class="solr.PorterStemFilterFactory"/&amp;gt;
      &amp;lt;/analyzer&amp;gt;
    &amp;lt;/fieldType&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For more information, you can check the Solr documentation, which is pretty good at describing the different tokenizers:&amp;nbsp;&lt;A href="https://solr.apache.org/guide/6_6/about-tokenizers.html" target="_self"&gt;https://solr.apache.org/guide/6_6/about-tokenizers.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;You can also test your Solr config in the Analysis tab:&amp;nbsp;&lt;A href="http://localhost:8983/solr/#/alfresco/analysis" target="_self"&gt;http://localhost:8983/solr/#/alfresco/analysis&lt;/A&gt;&lt;BR /&gt;When you do, make sure to use the proper field type (text_en in this case).&lt;/P&gt;</description>
      <pubDate>Thu, 05 Dec 2024 15:13:50 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-forum/no-highlight-results-for-phrase-search/m-p/485133#M39690</guid>
      <dc:creator>BMonjanel</dc:creator>
      <dc:date>2024-12-05T15:13:50Z</dc:date>
    </item>
  </channel>
</rss>

