Strange results in CONTAINS() CMIS search with wildcards

juanh
Champ on-the-rise

Hi, everyone

I have noticed some strange behaviour in CMIS queries that use CONTAINS() with wildcards.

For example, I have a folder named "Someco" and I want to find it by part of its name using wildcards:

SELECT * from cmis:folder WHERE CONTAINS('cmis:name:"*Some*"')

This query finds the folder, as expected, but if I add an 'o' or an 'a' to the search term, as in:

SELECT * from cmis:folder WHERE CONTAINS('cmis:name:"*Someo*"')

The modified query finds my folder too, which is incorrect, as my folder's name does not contain "Someo".

Is there any way to correct this behaviour through the query syntax?

Thanks.

3 REPLIES

andy1
Star Collaborator

Hi

I have tried to reproduce this and failed. Are you sure it is finding the same folder?

Andy

juanh
Champ on-the-rise

Hi, Andy

Yes, I'm sure. It's an issue related to FTS word stemming, but I don't know whether I can deactivate it somehow.

Thanks

andy1
Star Collaborator

Hi

It seems you are falling foul of localised stemming. The name of a document can also be treated as an identifier; prefix the field with = to match against that form:

SELECT * from cmis:folder WHERE CONTAINS('=cmis:name:"*Some*"')

Or you could just use LIKE

SELECT * from cmis:folder WHERE cmis:name LIKE '%Some%'

Name is indexed in three ways:

  • Localised with stemming
  • Split on white space and then into token parts (using WordDelimiter factory)
  • As a single token (an identifier)

The first two options are used together, so you cannot split on white space and do a wildcard match on those tokens alone. You will always get recall from the locale bit (the first way) as well.

It has been suggested before that we support better control here, and it is on the list.

From your example, I think you should be OK with either LIKE or the = prefix.
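
For anyone building these two queries from code, here is a minimal Python sketch of the LIKE and = forms shown above. The helper names like_query and exact_contains_query are hypothetical and not part of any CMIS client library. The backslash escaping of %, _ and ' reflects my reading of the CMIS query grammar, so treat it as an assumption and check it against your repository. The CONTAINS helper simply refuses fragments that would need full-text escaping rather than guessing at the rules.

```python
def like_query(fragment: str, type_name: str = "cmis:folder") -> str:
    """Build a LIKE query for names that contain `fragment` literally.

    Assumption: the repository accepts backslash escapes for the LIKE
    wildcards (% and _) and for the single quote.
    """
    escaped = (
        fragment.replace("\\", "\\\\")  # escape the escape character first
        .replace("%", "\\%")
        .replace("_", "\\_")
        .replace("'", "\\'")
    )
    return f"SELECT * FROM {type_name} WHERE cmis:name LIKE '%{escaped}%'"


def exact_contains_query(fragment: str, type_name: str = "cmis:folder") -> str:
    """Build a CONTAINS query that uses the = prefix on cmis:name."""
    # Refuse characters that would need full-text escaping; this sketch
    # does not attempt to implement those escaping rules.
    if any(ch in fragment for ch in "'\"\\*?"):
        raise ValueError("fragment contains characters this sketch does not escape")
    return f"SELECT * FROM {type_name} WHERE CONTAINS('=cmis:name:\"*{fragment}*\"')"


if __name__ == "__main__":
    print(like_query("Some"))
    print(exact_contains_query("Some"))
```

Either string can then be passed to whatever query call your CMIS client provides.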

Andy