11-08-2021 10:51 AM
Hello,
Sorry if the question is too basic, but I searched for hours for an answer.
I don't understand this statement in the docs on exact phrase search:
"The whole phrase will be tokenized"
Thanks in advance for explaining "tokenized". I would also like to understand how this differs from the exact term search, which is not clear to me:
https://docs.alfresco.com/search-services/latest/using/#search-for-an-exact-term
Thanks for any help,
Thorsten
Phrases are enclosed in double quotes. Any embedded quotes can be escaped using \. If no field is specified then the default TEXT field will be used, as with searches for a single term.
The whole phrase will be tokenized before the search according to the appropriate data dictionary definition(s).
11-09-2021 12:26 AM
Solr applies tokenization when searching: https://solr.apache.org/guide/6_6/tokenizers.html
That means the search terms are not exactly what you type, but meaningful parts of the sentence.
When searching for "Running is a sport", the real query is expanded to "run, run_is, is, is_a, a, a_sport, sport". So you get all the results that include those tokens.
However, when using ="Running is a sport", the query returns the fields that contain exactly those terms in the order specified: "Running, is, a, sport".
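As a rough illustration of the expansion above, here is a toy sketch in Python. It is not Solr's analyzer chain: the stem table and the function names are made up for this example, and whether two-word tokens are actually produced depends on the analyzer configured for the field type. It only reproduces the token list quoted above.

```python
import re

# Hypothetical stem table standing in for a real stemmer. Solr's actual
# analyzers are configured per field type and language.
TOY_STEMS = {"running": "run"}


def toy_tokens(phrase):
    """Lowercase, split on non-word characters, apply the toy stem table."""
    words = re.findall(r"\w+", phrase.lower())
    return [TOY_STEMS.get(w, w) for w in words]


def toy_expand(phrase):
    """Interleave each token with the two-word token that starts at it."""
    tokens = toy_tokens(phrase)
    expanded = []
    for i, tok in enumerate(tokens):
        expanded.append(tok)
        if i + 1 < len(tokens):
            expanded.append(tok + "_" + tokens[i + 1])
    return expanded


print(toy_expand("Running is a sport"))
# ['run', 'run_is', 'is', 'is_a', 'a', 'a_sport', 'sport']
```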
11-09-2021 05:59 AM
Thank you very much for clarifying "tokenization"!
@angelborroy wrote: When searching for "Running is a sport", the real query is expanded to "run, run_is, is, is_a, a, a_sport, sport".
I did not find anything in the Solr 6 tokenization doc saying that "is_a" or "a_sport" also count as tokens. I expected only the individual words to be tokens, not every pair of adjacent words. (Just to be sure: the underscore in your example stands for a single space, doesn't it?)
@angelborroy wrote: So you get all the results that include those tokens.
Does this mean that every token you mentioned has to appear in every result document, but the order of the tokens does not matter? If so, documents with the following content would also be found: 'Is sport a running game'. And no documents would be found with this content: "Is this game a sport". Is this correct?
BTW, if this is true, I don't understand why this search is called a "phrase" search. Normally a phrase search implies a certain order. This is more like a "set search"...
@angelborroy wrote: However, when using ="Running is a sport", the query returns the fields that contain exactly those terms in the order specified: "Running, is, a, sport".
I am glad that I interpreted this syntax correctly. Can it be used in a JSON query without problems? I could not immediately work the equals sign into the following syntax:
{ "query": { "query":"cm:content:('*Running is a sport*')" } }
IMO the equals sign does not fit well with cm:content. But perhaps I should omit cm:content and replace it with TEXT?
Thorsten
11-09-2021 06:17 AM
When using "=" with content (TEXT) fields, the match is not made against the whole field value. The search will also return content that contains that sentence.
11-11-2021 10:25 AM
@angelborroy wrote: When using "=" with content (TEXT) fields, the match is not made against the whole field value. The search will also return content that contains that sentence.
I am not sure I understand you. Are you referring to the wildcards in my example above?
Regarding the field type TEXT: is the following definition of TEXT correct?
TEXT virtual field (I ask because the link refers to Alfresco Search Enterprise; I did not find any other doc.)
BTW, the syntax for an exact term search with JSON is clear now. The following works:
{ "query": { "query":"=cm:content:'Running is a sport'" } }
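For anyone building this request body from a script, a minimal Python sketch follows. The helper name exact_query_body is made up for this example, and it does not escape quotes embedded in the phrase. It only builds the JSON shown above; sending it to the search endpoint is left out.

```python
import json


def exact_query_body(field, phrase):
    """Build the JSON body for an exact match ("=" prefix) on one field.

    Hypothetical helper: quotes embedded in the phrase are not escaped.
    """
    query = "={}:'{}'".format(field, phrase)
    return {"query": {"query": query}}


body = exact_query_body("cm:content", "Running is a sport")
print(json.dumps(body))
# {"query": {"query": "=cm:content:'Running is a sport'"}}
```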
Thanks,
Thorsten