Wildcards in Lucene Searches

dbevacqua
Champ in-the-making
Hi

This follows on from an earlier thread which wasn't resolved.

I'm having difficulty getting wildcard searches to work. For example:

@cm\:name:heal*

I've traced through the relevant code and the problem seems to be in org.alfresco.repo.search.impl.lucene.QueryParser, around line 720:


  final public Query Term(String field) throws ParseException {
    Token term, boost=null, fuzzySlop=null, goop1, goop2;
    boolean prefix = false;
    boolean wildcard = false;
    boolean fuzzy = false;
    boolean rangein = false;
    Query q;
    switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
    case TERM:
    case PREFIXTERM:
    case WILDTERM:
    case NUMBER:
      switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
      case TERM:
        term = jj_consume_token(TERM);
        break;
      case PREFIXTERM:
        term = jj_consume_token(PREFIXTERM);
        prefix=true;
        break;
      case WILDTERM:
        term = jj_consume_token(WILDTERM);
        wildcard=true;

In jj_ntk(), when the token is 'heal*', the kind is (incorrectly) identified as 19 (PREFIXTERM), not WILDTERM.

Or am I missing something?

This is a real problem for us. I am prepared to admit it is something I am doing but I'd appreciate some guidance on the matter if anyone has any ideas.

Thanks

Dominic

dbevacqua
Champ in-the-making
I should add that the result is that this line


         q = getFieldQuery(field, analyzer, term.image.substring(1, term.image.length()-1), s);


is not called when there is a wildcard, but is called when there isn't.

dbevacqua
Champ in-the-making
ok please ignore the above, and if you are near enough and appropriately armed, shoot me.

Complete red herring - having just RTFM for PrefixQuery I see that it has correctly identified it. I should never really have doubted JavaCC…
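
For anyone following along, here is a rough sketch of why 'heal*' comes out as PREFIXTERM rather than WILDTERM. It is a simplified illustration only, not the JavaCC-generated tokenizer: the class and method names are made up, and it ignores escaping and any leading-wildcard rules. The assumption is that a term whose only wildcard is a single trailing '*' is a prefix term, and any other use of '*' or '?' makes it a wildcard term.

```java
// Hypothetical illustration; not part of Lucene or Alfresco.
final class TermKindSketch {
    enum Kind { TERM, PREFIXTERM, WILDTERM }

    // Classifies a raw term token by where its first wildcard character sits.
    static Kind classify(String token) {
        int firstWildcard = -1;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '*' || c == '?') {
                firstWildcard = i;
                break;
            }
        }
        if (firstWildcard < 0) {
            return Kind.TERM; // no wildcard at all
        }
        // A prefix term has exactly one wildcard: a '*' in the last position.
        boolean onlyTrailingStar = firstWildcard == token.length() - 1
                && token.charAt(firstWildcard) == '*';
        return onlyTrailingStar ? Kind.PREFIXTERM : Kind.WILDTERM;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"heal", "heal*", "he?l*", "heal?"}) {
            System.out.println(t + " -> " + classify(t));
        }
    }
}
```

So a query like heal* would be expected to end up as a prefix query, which is consistent with what the parser reported.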

Still no results returned though :(

dbevacqua
Champ in-the-making
Hi

Having looked through the source for LuceneTest and Lucene2Test, I can't see any tests of the wildcard searches. Could somebody point me at one please?

Thanks

Dominic

dbevacqua
Champ in-the-making
ok finally got a wildcard search to work by using the full QName of the property:

(escaping removed for readability)

@{ourNamespace}name:health*
20 results

@{ourNamespace}name:health
19 results

@{ourNamespace}name:healthy
1 result

Hope this sheds some light and reassures you that I am not totally insane.

Any idea why the short version only works when no wildcard is given? Is there something peculiar about our namespace name or the local name?

Dominic

kevinr
Star Contributor
Yes, the full-form URI must be used for a Lucene search. Sorry, we should have spotted this earlier when you posted it. I'm not sure when the short form works, if at all :) I would suggest you stick with the long form; that's what we use when generating searches in the web-client.

For instance, this is the query I generate when the user wants to search for content with the name of "myfile.txt":

TYPE:"{http://www.alfresco.org/model/content/1.0}content" AND @\{http\://www.alfresco.org/model/content/1.0\}name:myfile.txt

I've not tried to use the short form myself.

Wildcarding is also working correctly. E.g. I generate this search when the user wants to find all files starting with the name "myfile":

TYPE:"{http://www.alfresco.org/model/content/1.0}content" AND @\{http\://www.alfresco.org/model/content/1.0\}name:myfile*
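
As a rough sketch of how a property term like the one above could be assembled, something along these lines should do. The class and method names are hypothetical and not part of the Alfresco API, and the escaping is hand-rolled on the assumption that backslash-escaping the Lucene special characters in the qualified name is enough; the value is left untouched so the trailing * keeps its wildcard meaning.

```java
// Hypothetical helper; not part of Alfresco or Lucene.
final class LuceneQueryNames {
    // Characters the classic Lucene query syntax treats as special.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    private LuceneQueryNames() {}

    // Backslash-escapes every special character in the given text.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds "@" + escaped "{uri}localName" + ":" + value.
    // The value is appended as-is, so callers must escape user input
    // themselves and add any wildcard afterwards.
    static String propertyTerm(String namespaceUri, String localName, String value) {
        return "@" + escape("{" + namespaceUri + "}" + localName) + ":" + value;
    }

    public static void main(String[] args) {
        String cm = "http://www.alfresco.org/model/content/1.0";
        System.out.println(propertyTerm(cm, "name", "myfile*"));
    }
}
```

With the content model URI and a local name of "name", this produces the same escaped property term as in the wildcard query above.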

Hope this helps,

Kevin

dbevacqua
Champ in-the-making
Hi Kevin

Thanks for your reply. The short form works when no wildcard is specified. I notice that the NamespaceResolver being used only contains a mapping between one of the three prefixes in my content model and the namespace URI. There are a number of other null entries in the prefixes. I don't know if this is of any help.

I should point out that the examples on this wiki page:

http://wiki.alfresco.com/wiki/Search#Finding_nodes_by_text_property_values

use short names.

Anyway I shall use the full name from now on.

Thanks

Dominic

kevinr
Star Contributor
Interesting that short names seem to be the problem. I've emailed our Lucene expert who wrote our integration, and when he gets back from holiday hopefully he can shed some light on the issue.

Thanks,

Kevin

jtorres
Champ in-the-making
Thanks a lot, this has been a great help. I was going mad trying to find what was wrong.

andy
Champ on-the-rise
Hi

The short names issue with wildcard queries (and fuzzy queries and prefix queries) is fixed in 2.0.

Regards

Andy