<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Lucene search with accented characters in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/lucene-search-with-accented-characters/m-p/249873#M203003</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Where have you made the changes?&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;It sounds like your changes are lost after deployment (probably overwritten in the Tomcat expanded view?)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;It is best to add an extension to wire up the changes to avoid this.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Andy&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 11 Apr 2011 14:34:26 GMT</pubDate>
    <dc:creator>andy</dc:creator>
    <dc:date>2011-04-11T14:34:26Z</dc:date>
    <item>
      <title>Lucene search with accented characters</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-search-with-accented-characters/m-p/249870#M203000</link>
      <description>I have a situation where I have two folders with almost the same name, except for the accent mark: "Hernández-Monrreal, Juan Gabriel" and "Hernandez-Monrreal, Juan Gabriel". I am trying to search for folders using the following query: +@cm\:name:"cm:Hernández-Monrreal_x002c__x0020_Juan_x0020_Gabriel" +TYPE:"cm:f</description>
      <pubDate>Wed, 06 Apr 2011 18:28:12 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-search-with-accented-characters/m-p/249870#M203000</guid>
      <dc:creator>pchoe</dc:creator>
      <dc:date>2011-04-06T18:28:12Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene search with accented characters</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-search-with-accented-characters/m-p/249871#M203001</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The default analysis is to strip accents.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The analysers are configurable.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;You would need to change the analyser for d:text and then reindex.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Change the setting in alfresco/model/dataTypeAnalyzers.properties to:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;d_dictionary.datatype.d_text.analyzer=your.analyzer.class&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Or copy alfresco/model/dataTypeAnalyzers.properties and related files to a new location and change the definition of the bean that loads this file, “dictionaryBootstrap”, currently defined in core-services-context.xml.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Andy&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 07 Apr 2011 18:55:07 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-search-with-accented-characters/m-p/249871#M203001</guid>
      <dc:creator>andy</dc:creator>
      <dc:date>2011-04-07T18:55:07Z</dc:date>
    </item>
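<!--
The reply above says the default analysis strips accents. A minimal Java sketch of why that makes the two folder names in this thread indistinguishable at search time follows. It uses only java.text.Normalizer from the JDK as a stand-in; it is not the Lucene ISOLatin1AccentFilter or the Alfresco analyser, and the names AccentFoldingSketch and stripAccents are hypothetical.

```java
import java.text.Normalizer;

public class AccentFoldingSketch {

    // Hypothetical helper, for illustration only: decompose each character
    // into base letter plus combining marks (NFD), then drop the marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00e1 is the accented "a" in the first folder name.
        String accented = "Hern\u00e1ndez-Monrreal, Juan Gabriel";
        String plain = "Hernandez-Monrreal, Juan Gabriel";

        // Without folding, the two names are different strings.
        System.out.println(accented.equals(plain));

        // After folding, both names reduce to the same text, so an index
        // built with an accent-stripping step cannot tell them apart.
        System.out.println(stripAccents(accented).equals(stripAccents(plain)));
    }
}
```

If the index is built with a folding step like this, a query for either spelling matches both folders, which is why the thread goes on to change the analyser and reindex.
-->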
    <item>
      <title>Re: Lucene search with accented characters</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-search-with-accented-characters/m-p/249872#M203002</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;I put the following custom code for the analyzer:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;import java.io.Reader;&lt;BR /&gt;import java.util.Set;&lt;BR /&gt;&lt;BR /&gt;import org.alfresco.repo.search.impl.lucene.analysis.AlfrescoStandardFilter;&lt;BR /&gt;import org.apache.lucene.analysis.Analyzer;&lt;BR /&gt;import org.apache.lucene.analysis.ISOLatin1AccentFilter;&lt;BR /&gt;import org.apache.lucene.analysis.LowerCaseFilter;&lt;BR /&gt;import org.apache.lucene.analysis.StopAnalyzer;&lt;BR /&gt;import org.apache.lucene.analysis.StopFilter;&lt;BR /&gt;import org.apache.lucene.analysis.TokenStream;&lt;BR /&gt;import org.apache.lucene.analysis.standard.StandardFilter;&lt;BR /&gt;import org.apache.lucene.analysis.standard.StandardTokenizer;&lt;BR /&gt;&lt;BR /&gt;/**&lt;BR /&gt; * Custom Lucene analyzer that doesn't implement ISOLatin1AccentFilter.&lt;BR /&gt; * &lt;BR /&gt; * @author pchoe&lt;BR /&gt; *&lt;BR /&gt; */&lt;BR /&gt;public class MSICustomStrictAnalyzer extends Analyzer {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; private Set stopSet;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static final String STOP_WORDS[];&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; static &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; STOP_WORDS = StopAnalyzer.ENGLISH_STOP_WORDS;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public MSICustomStrictAnalyzer()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this(STOP_WORDS);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public MSICustomStrictAnalyzer(String stopWords[])&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; stopSet = StopFilter.makeStopSet(stopWords);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; /**&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; * &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; * @see org.apache.lucene.analysis.Analyzer#tokenStream(java.lang.String, java.io.Reader)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; */&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; public TokenStream tokenStream(String fieldName, Reader reader)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; TokenStream result = new StandardTokenizer(reader);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; result = new StandardFilter(result);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; result = new LowerCaseFilter(result);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; result = new StopFilter(result, stopSet);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; //result = new ISOLatin1AccentFilter(result);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return result;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;&lt;BR /&gt;}&lt;BR /&gt;&lt;SPAN 
class="line-numbers-rows"&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;SPAN&gt;and modified the dataTypeAnalyzers.properties to &lt;/SPAN&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;# Data Type Index Analyzers&lt;BR /&gt;&lt;BR /&gt;d_dictionary.datatype.d_any.analyzer=com.microstrat.alfresco.lucene.MSICustomStrictAnalyzer&lt;BR /&gt;#d_dictionary.datatype.d_text.analyzer=org.alfresco.repo.search.impl.lucene.analysis.AlfrescoStandardAnalyser&lt;BR 
/&gt;#d_dictionary.datatype.d_content.analyzer=org.alfresco.repo.search.impl.lucene.analysis.AlfrescoStandardAnalyser&lt;BR /&gt;d_dictionary.datatype.d_int.analyzer=org.alfresco.repo.search.impl.lucene.analysis.IntegerAnalyser&lt;BR /&gt;d_dictionary.datatype.d_long.analyzer=org.alfresco.repo.search.impl.lucene.analysis.LongAnalyser&lt;BR /&gt;d_dictionary.datatype.d_float.analyzer=org.alfresco.repo.search.impl.lucene.analysis.FloatAnalyser&lt;BR /&gt;d_dictionary.datatype.d_double.analyzer=org.alfresco.repo.search.impl.lucene.analysis.DoubleAnalyser&lt;BR /&gt;d_dictionary.datatype.d_date.analyzer=org.alfresco.repo.search.impl.lucene.analysis.DateAnalyser&lt;BR /&gt;d_dictionary.datatype.d_datetime.analyzer=org.alfresco.repo.search.impl.lucene.analysis.DateTimeAnalyser&lt;BR /&gt;d_dictionary.datatype.d_boolean.analyzer=com.microstrat.alfresco.lucene.MSICustomStrictAnalyzer&lt;BR /&gt;d_dictionary.datatype.d_qname.analyzer=com.microstrat.alfresco.lucene.MSICustomStrictAnalyzer&lt;BR /&gt;d_dictionary.datatype.d_guid.analyzer=com.microstrat.alfresco.lucene.MSICustomStrictAnalyzer&lt;BR /&gt;d_dictionary.datatype.d_category.analyzer=com.microstrat.alfresco.lucene.MSICustomStrictAnalyzer&lt;BR /&gt;d_dictionary.datatype.d_noderef.analyzer=com.microstrat.alfresco.lucene.MSICustomStrictAnalyzer&lt;BR /&gt;d_dictionary.datatype.d_path.analyzer=com.microstrat.alfresco.lucene.MSICustomStrictAnalyzer&lt;BR /&gt;d_dictionary.datatype.d_locale.analyzer=org.alfresco.repo.search.impl.lucene.analysis.LowerCaseVerbatimAnalyser&lt;BR /&gt;d_dictionary.datatype.d_text.analyzer=com.microstrat.alfresco.lucene.MSICustomStrictAnalyzer&lt;BR /&gt;d_dictionary.datatype.d_content.analyzer=com.microstrat.alfresco.lucene.MSICustomStrictAnalyzer&lt;BR /&gt;&lt;SPAN 
class="line-numbers-rows"&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;SPAN&gt;so that the custom analyzer would work.&amp;nbsp; When I run it in the debug mode from eclipse, I see that it will go into the custom analyzer.&amp;nbsp; But when I do a lucene search even after reindexing, I still get the result as from the default analyzer.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Am I missing a configuration somewhere?&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 11 Apr 2011 14:01:39 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-search-with-accented-characters/m-p/249872#M203002</guid>
      <dc:creator>pchoe</dc:creator>
      <dc:date>2011-04-11T14:01:39Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene search with accented characters</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-search-with-accented-characters/m-p/249873#M203003</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Where have you made the changes?&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;It sounds like your changes are lost after deployment (probably overwritten in the Tomcat expanded view?)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;It is best to add an extension to wire up the changes to avoid this.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Andy&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 11 Apr 2011 14:34:26 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-search-with-accented-characters/m-p/249873#M203003</guid>
      <dc:creator>andy</dc:creator>
      <dc:date>2011-04-11T14:34:26Z</dc:date>
    </item>
  </channel>
</rss>

