Searching with polish stemmer

ddr
Champ in-the-making
Hello,
I'm trying to get Alfresco to search in Polish.
I'm using Alfresco 5.0c with Solr4, and my stemmer is Morfologik. This version of Alfresco has no dataTypeAnalyzers__{your locale}.properties file.
I added the filter to schema.xml and I can use Morfologik in Solr itself, but Alfresco isn't indexing or searching with the new stemmer.
What is the right way to add a new stemmer?

gravitonian
Star Collaborator
Hi,

Can you please show me how you added the Polish filters to schema.xml?

Also, note that it will only work for new content added after you update the schema and restart.

ddr
Champ in-the-making
Hi,
I uploaded my schema.xml. Could you check it?
I added the Morfologik filter to oldStandardAnalysis and text___, and I also tried changing the dynamicFields to use text___.
I know this only works for new content, so I test by adding a new file from Share or with the Bulk Filesystem Import Tool.

gravitonian
Star Collaborator
Hi,

Could it be that you have to set the dictionary attribute as follows:


<filter class="solr.MorfologikFilterFactory" dictionary="MORFOLOGIK" />


ddr
Champ in-the-making
I have already tried that and it doesn't work. According to the Solr logs, the dictionary attribute is no longer needed. Any ideas?
Should adding a new filter to schema.xml be enough for Alfresco 5 to use it?

gravitonian
Star Collaborator
Hi,

I think you should use the following field type for your fields.

<fieldType name="alfrescoFieldType" class="org.alfresco.solr.AlfrescoFieldType" />

It works by dynamically looking up the language-specific field type (e.g. text_en, text_ru, text_pl).

Define a new Polish field type that looks something like this:

<!-- Polish -->
      <fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pl.txt" />
          <filter class="solr.StempelPolishStemFilterFactory" />
        </analyzer>
      </fieldType>

Don't change anything else. If you change the field type of the fields to something other than AlfrescoFieldType, you are effectively turning off the dynamic language type lookup.
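
For the Morfologik case from the original question, a variant of the same text_pl field type might look like the sketch below. It is untested against Alfresco 5.0c, and it assumes the Lucene Morfologik analysis jar and its dependencies are already on the Solr classpath (the earlier posts suggest they are, since the filter loads in Solr). The filter is declared without a dictionary attribute, in line with the log message mentioned above, and lowercasing is placed after the Morfologik filter, which is the ordering commonly used with it.

```xml
<!-- Polish, Morfologik variant (a sketch, not verified on Alfresco 5.0c) -->
<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- No dictionary attribute: the default Polish dictionary is assumed -->
    <filter class="solr.MorfologikFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

As noted earlier in the thread, only content indexed after the schema change and a restart would go through the new analyzer.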