<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Lucene/SOLR Stemming Analyser in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290505#M243635</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Adding a filter to the alfrescoDataType field type in schema.xml has no effect on the d_content data type: Alfresco 4.x does not use the analyzer configured in schema.xml for d_content. &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Instead, it uses the data type index analyzers configured in &lt;/SPAN&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;${SOLR_CONFIG_ROOT}/workspace-SpacesStore/alfrescoResources/alfresco/model/dataTypeAnalyzers__{your locale}.properties&lt;/CODE&gt;&lt;/PRE&gt;&lt;SPAN&gt;If there is no properties file for your locale, it uses the default AlfrescoStandardAnalyser.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 07 Aug 2014 04:35:00 GMT</pubDate>
    <dc:creator>kaynezhang</dc:creator>
    <dc:date>2014-08-07T04:35:00Z</dc:date>
    <item>
      <title>Lucene/SOLR Stemming Analyser</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290504#M243634</link>
      <description>I am using the Porter stemming analyser for d_content but noticed that stop words are not removed from the index for new documents I add. I added the stopword filter to schema.xml underneath the lowercase filter, but the stop words still exist in the index. Is this the correct approach? Is there ano</description>
      <pubDate>Tue, 29 Jul 2014 22:31:39 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290504#M243634</guid>
      <dc:creator>stevegreenbaum</dc:creator>
      <dc:date>2014-07-29T22:31:39Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene/SOLR Stemming Analyser</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290505#M243635</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Adding a filter to the alfrescoDataType field type in schema.xml has no effect on the d_content data type: Alfresco 4.x does not use the analyzer configured in schema.xml for d_content. &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Instead, it uses the data type index analyzers configured in &lt;/SPAN&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;${SOLR_CONFIG_ROOT}/workspace-SpacesStore/alfrescoResources/alfresco/model/dataTypeAnalyzers__{your locale}.properties&lt;/CODE&gt;&lt;/PRE&gt;&lt;SPAN&gt;If there is no properties file for your locale, it uses the default AlfrescoStandardAnalyser.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 07 Aug 2014 04:35:00 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290505#M243635</guid>
      <dc:creator>kaynezhang</dc:creator>
      <dc:date>2014-08-07T04:35:00Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene/SOLR Stemming Analyser</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290506#M243636</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Are you saying that schema.xml is not used at all for setting analysers, or that it just doesn't have an effect on d:content? If the latter, for which property types does Alfresco use the whitespace analyser that is declared in schema.xml?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Is there a way to apply a filter to the analyser that is set in dataTypeAnalyzers__{your locale}, so I can add the stopword filter to the Porter Snowball analyser?&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 12 Aug 2014 19:57:01 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290506#M243636</guid>
      <dc:creator>stevegreenbaum</dc:creator>
      <dc:date>2014-08-12T19:57:01Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene/SOLR Stemming Analyser</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290507#M243637</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;1. In Alfresco 4.x, all fields are dynamic fields, and every field type is alfrescoDataType (you can see this in schema.xml).&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;2. AlfrescoDataType uses SolrLuceneAnalyser as its index analyzer.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;3. SolrLuceneAnalyser is a wrapper analyzer: it analyses each property according to its property definition. For example:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;a) for some regular fields (FIELD_ID, FIELD_DBID), it uses a fixed analyzer (LongAnalyser, VerbatimAnalyser);&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;b) for d:content/d:text/d:mltext, it uses MLAnalayser.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;4. MLAnalayser loads the analyzer for the locale, as configured in &lt;/SPAN&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;${SOLR_CONFIG_ROOT}/workspace-SpacesStore/alfrescoResources/alfresco/model/&lt;/CODE&gt;&lt;/PRE&gt;&lt;SPAN&gt;For example, for the English locale it uses the analyzers configured in the dataTypeAnalyzers_en.properties file:&lt;/SPAN&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;d_dictionary.datatype.d_text.analyzer=org.alfresco.repo.search.impl.lucene.analysis.AlfrescoStandardAnalyser&lt;BR /&gt;d_dictionary.datatype.d_content.analyzer=org.alfresco.repo.search.impl.lucene.analysis.AlfrescoStandardAnalyser&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;SPAN&gt;That is, for d:text and d:content properties, org.alfresco.repo.search.impl.lucene.analysis.AlfrescoStandardAnalyser is used.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;If you want to use the Porter Snowball analyser and apply a filter to it, you can implement a new index analyzer that extends SnowballAnalyzer, override its tokenStream method, and apply your StopFilter there. Then configure your custom analyser in &lt;/SPAN&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;${SOLR_CONFIG_ROOT}/workspace-SpacesStore/alfrescoResources/alfresco/model/dataTypeAnalyzers__{your locale}.properties&lt;/CODE&gt;&lt;/PRE&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 13 Aug 2014 02:38:44 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290507#M243637</guid>
      <dc:creator>kaynezhang</dc:creator>
      <dc:date>2014-08-13T02:38:44Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene/SOLR Stemming Analyser</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290508#M243638</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Thank you! I appreciate your thoughtful response. Steve&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I wanted to share a response I received from Alfresco support this morning. They indicate that even if the stopwords are removed, a search using those stopwords will still find the document because a non-tokenized version is also stored (I assume even if you don't specify "Both" in the model for tokenization). The AlfrescoStandardAnalyser is supposed to remove stopwords, but I was finding that I could still search on them.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Here is their response: "In the Alfresco implementation of Solr, every document is indexed in 2 different ways (so it uses twice the space of Lucene): one using the locale and its defined analyzer (and stopwords), and another raw indexing is done for the "crosslanguage" (multilingual). In your case they are found by the cross-language search. As stop words are not the same across languages, we leave all words in. As a result … you cannot "not find" stopwords when you specifically search for them."&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 13 Aug 2014 15:52:32 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290508#M243638</guid>
      <dc:creator>stevegreenbaum</dc:creator>
      <dc:date>2014-08-13T15:52:32Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene/SOLR Stemming Analyser</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290509#M243639</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;I don't entirely agree.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;My guess is that for the "d:content/d:text/d:mltext" types, cross-language search is implemented like this:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;For each specific locale, the locale analyzer defined in dataTypeAnalyzers_**.properties is used.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;For the cross-language index, there must also be an analyzer, either the server default locale's analyzer or just AlfrescoStandardAnalyser.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;So either way, you can meet your requirements by customizing your locale's analyzer and AlfrescoStandardAnalyser.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;This is just my guess; I have not tested it, so correct me if I'm wrong.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 14 Aug 2014 01:58:00 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-solr-stemming-analyser/m-p/290509#M243639</guid>
      <dc:creator>kaynezhang</dc:creator>
      <dc:date>2014-08-14T01:58:00Z</dc:date>
    </item>
  </channel>
</rss>

