Solr search results depend on Accept-Language http header

mando_alvarado_
Champ in-the-making
I have a webscript that searches for nodes with a certain string stored in a certain property. This is a name property, and by name, I mean people's names. The end users of this application will be mostly Spanish speaking, and the names stored in these properties will also mostly be Spanish names.

Now the interesting part. I run the webscript (which returns an ajax response) and tell it to search for nodes that have "Mario" in that property. On my machine, everything works fine, and only "Mario"s show up. On my co-worker's machine however, he gets "Maria"s as well as "Mario"s.

I checked his request headers and compared them to mine. Every header was practically the same, except for the Accept-Language header. Mine was sent with the value of "en-US,en;q=0.8". I don't remember his exactly, but it started with "es-HN".

Well, this was no surprise. My OS and browser are in English, and his are in Spanish. So I modified that request header and used his value, and sure enough, I got the same response from the webscript as he did.
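For anyone who wants to reproduce the comparison without changing browser settings, here is a minimal sketch that sends the same search with both header values. The URL is a hypothetical placeholder, not the real webscript address, and a real repository will most likely also require authentication, which this sketch leaves out.

```python
from urllib.request import Request, urlopen

# Hypothetical webscript URL (an assumption); substitute the real one.
SEARCH_URL = "http://localhost:8080/alfresco/service/example/search?name=Mario"


def build_request(url, accept_language):
    """Return a GET request that carries the given Accept-Language header."""
    return Request(url, headers={"Accept-Language": accept_language})


def fetch(url, accept_language):
    """Send the request and return the response body as text."""
    with urlopen(build_request(url, accept_language)) as response:
        return response.read().decode("utf-8")


# Usage (needs a running repository, so it is left commented out):
# for language in ("en-US,en;q=0.8", "es-HN"):
#     print(language)
#     print(fetch(SEARCH_URL, language))
```

If the two responses differ for the same search term, the header is the only variable left, which matches what I saw in the browser.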

I would like to know how to "fix" this. I realize this is probably a feature, but I would like to know how to disable it. If I search for "Mario", I would like to find nodes that contain this word exactly as it is, regardless of the user's browser language settings. It may be surrounded by other characters and whatnot, but the whole word, no less, MUST be there.

I searched online for answers but have come up with very little. The word "stemming" came up a few times. I also came across people who mentioned a StandardAnalyzer and a SpanishSnowballAnalyzer. I have only a vague idea of what these are but no clue how to use them.
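To make the stemming idea concrete, here is a toy sketch of why a Spanish stemmer could treat "Mario" and "Maria" as the same term. The function `toy_spanish_stem` is made up for illustration and is not the real Snowball algorithm; it only strips one common ending, which is enough to show the effect.

```python
def toy_spanish_stem(word):
    """Strip one trailing gender/number ending. NOT the real Snowball stemmer."""
    word = word.lower()
    for suffix in ("os", "as", "o", "a"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def exact_token(word):
    """Language-independent alternative: lowercase only, no stemming."""
    return word.lower()


for name in ("Mario", "Maria"):
    print(name, "stemmed:", toy_spanish_stem(name), "exact:", exact_token(name))
```

With stemming, both names collapse to the same indexed term, so a query for one matches the other. Without stemming, the terms stay distinct, which is the behaviour I am after.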

Any help will be greatly appreciated.
3 REPLIES

zladuric
Champ on-the-rise
I had a similar issue once. How is this property defined in the data model?

I had a property of type d:mltext (multilingual text), so when my users entered something in one locale, it was only shown to other users with the same locale. Users in other locales (e.g. English, as in your case) did not see it. It was a custom description-like field.

I ended up adding a new property, then wrote a script that found ALL nodes of this type with anything written in the old property and copied the values to the new one. Where I found text in both languages, I just added both texts. Once that was done, I configured Share to show the new property instead, and later removed the old one.

There is material on the wiki and the (old) forums, so some googling may help. Here's a starting point:
http://wiki.alfresco.com/wiki/Multilingual_Document_Support

andy
Champ on-the-rise
Hi

You can have fixed tokenisation across all languages (and not have language dependent stemming).
Remove all the <config>/alfresco/model/dataTypeAnalyzers_??.properties files and leave only the base property file.
Then configure this how you want.
The localised analysis for indexing and query using the lucene sub-system is picked up from these files.

For SOLR see <SOLR_HOME>\<CoreName>\alfrescoResources\alfresco\model
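A minimal sketch of the file clean-up described above, assuming you would rather rename the locale-specific files than delete them so the change is easy to undo. The function names and the `.disabled` suffix are made up for this example; the directory to pass in is whichever model directory applies to your setup (lucene or SOLR). Existing index entries were presumably built with the old analyzers, so a reindex is probably needed before results change.

```python
from pathlib import Path


def find_localized_analyzer_files(model_dir):
    """List dataTypeAnalyzers_??.properties files, excluding the base file."""
    return sorted(Path(model_dir).glob("dataTypeAnalyzers_??.properties"))


def disable_localized_analyzers(model_dir, dry_run=True):
    """Rename each locale-specific file to <name>.disabled.

    Returns (source, target) pairs. With dry_run=True nothing is renamed,
    so the planned changes can be reviewed first.
    """
    planned = []
    for path in find_localized_analyzer_files(model_dir):
        target = path.with_name(path.name + ".disabled")
        if not dry_run:
            path.rename(target)
        planned.append((path, target))
    return planned


# Usage (the path is an assumption; left commented out):
# for source, target in disable_localized_analyzers("/path/to/alfresco/model"):
#     print(source, "->", target)
```

Run it with the default `dry_run=True` first, check the list, then call it again with `dry_run=False`.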

Andy

playerro
Champ on-the-rise

Same issue. Is there any way to disable this behaviour?