
Site search syntax for finding files by wildcard pattern

almoehi
Hi everybody,

I'd like to use the site search dashlet to find certain files in a doc lib subfolder. They are PDFs, and their names follow a defined scheme with variables. What is the syntax for that? For example, say I have a folder "Reports" in the doc lib, and this folder contains, among others, some files named "WP-<wpnum>-rep-<repnum>-<grpname>.pdf", where <wpnum> and <repnum> have one or more digits and <grpname> is the short name of the group that created the file.


/
  Reports/
    WP-1-rep-1-grp1.pdf
    WP-1-rep-1-grp2.pdf
    WP-1-rep-2-grp1.pdf
    …
    <some unrelated files>
    …

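To make the naming scheme concrete, here is a small Python sketch that checks a file name against WP-<wpnum>-rep-<repnum>-<grpname>.pdf and pulls out the three variable parts. It is not an Alfresco query, just a local check; parse_report_name and ReportName are hypothetical names, and it assumes <grpname> is everything between the report number and ".pdf".

```python
import re
from typing import NamedTuple, Optional

# WP-<wpnum>-rep-<repnum>-<grpname>.pdf: both numbers are one or more digits,
# and <grpname> is assumed to be everything between the report number and ".pdf".
REPORT_NAME = re.compile(r"WP-(\d+)-rep-(\d+)-(.+)\.pdf")


class ReportName(NamedTuple):
    wpnum: int
    repnum: int
    grpname: str


def parse_report_name(filename: str) -> Optional[ReportName]:
    """Return the parts encoded in a file name, or None if it doesn't follow the scheme."""
    match = REPORT_NAME.fullmatch(filename)  # the whole name must match, not just a prefix
    if match is None:
        return None
    return ReportName(int(match.group(1)), int(match.group(2)), match.group(3))


if __name__ == "__main__":
    # "unrelated.pdf" is a hypothetical stand-in for the other files in the folder.
    for name in ["WP-1-rep-1-grp1.pdf", "WP-1-rep-2-grp1.pdf", "unrelated.pdf"]:
        print(name, "->", parse_report_name(name))
```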

So far, I have found out how to list all files in "Reports": +PATH:"//cm:Reports//*". But even a simple "*.pdf" does not work. OK, I understand that this is XPath, so I also tried something like *[@cm:name='*pdf'] and variants, but nothing worked. Obviously I have no experience with XPath, but do I really need it just to find some files?
Any help would be appreciated.

I have a general remark on the site search, in case a developer reads this. Of course it is (or has the potential to be) quite powerful because it is so flexible. But IMHO it's too complicated for "normal" users. I manage an Alfresco server for a medium-sized project group. None of these people is a programmer, and few of them know XPath, Lucene or even regular expressions. All they want is to search the document lib for files using the simple patterns they know from Windows Explorer. I can hardly convince them to learn a new language just to do a simple search. It isn't convenient for me either: although I'm willing to dive into the internals of Alfresco, I don't have the time, and the documentation isn't clear enough. I've spent hours looking for an example, but the Alfresco manual seems to concentrate on more complicated stuff.
I'd really love to have something like a "simple doc lib search" where I can use primitive wildcard patterns or at least regular expressions.

Best regards,
Alex

almoehi
I've managed to find the files using the following search pattern:
+PATH:"//cm:Reports//*" AND cm:name:WP*rep*.pdf


But as soon as I add hyphens, it doesn't work anymore: WP-*-rep-*.pdf
OK, that's not so important, but why is that, and how can I include them in the pattern? Escaping with a backslash or _x002d_ didn't work.

andy
Hi

You can escape "-" using \ in query expressions.
You can also quote the expression; you are allowed to use - and * inside phrases.
Also, you can prefix cm:name with = to enforce pattern matching rather than token matching.
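The three suggestions above can be written out as concrete query strings. The Python sketch below only assembles text for the Reports folder and the pattern WP-*-rep-*.pdf; build_queries and escape_hyphens are hypothetical helpers, the strings have not been checked against a specific Alfresco release, and quoting the pattern together with the = prefix is an assumption.

```python
# Hypothetical helpers: they only build query strings for the site search box.
# None of these strings has been verified against a particular Alfresco release.

PATH_CLAUSE = '+PATH:"//cm:Reports//*"'


def escape_hyphens(pattern: str) -> str:
    """Put a backslash in front of every '-' in the pattern."""
    return pattern.replace("-", "\\-")


def build_queries(pattern: str) -> dict[str, str]:
    """Return one query per suggestion: escaped, quoted, and '='-prefixed."""
    return {
        # 1. Escape '-' with a backslash.
        "escaped": f"{PATH_CLAUSE} AND cm:name:{escape_hyphens(pattern)}",
        # 2. Quote the pattern; '-' and '*' are allowed inside phrases.
        "quoted": f'{PATH_CLAUSE} AND cm:name:"{pattern}"',
        # 3. Prefix cm:name with '=' for pattern matching instead of token matching.
        #    Assumption: the pattern is quoted here as well.
        "exact": f'{PATH_CLAUSE} AND =cm:name:"{pattern}"',
    }


if __name__ == "__main__":
    for label, query in build_queries("WP-*-rep-*.pdf").items():
        print(f"{label:8} {query}")
```

Each printed line can be pasted into the search box to try that variant on its own.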

What version of Alfresco are we talking about?
*.pdf
.pdf
pdf

…should all find the docs.

The Lucene tokeniser does some odd stuff with 12-3-04, as it thinks it is a date.
You may be falling for this. In fact, it does this for any similar number-like term split by -, :, etc.
If you are using Lucene, you probably want to consider moving to SOLR or changing the default analysis.


You can also choose a separator that has no meaning in the query language.
You should also consider putting metadata on your nodes rather than encoding it in the name.
That is also much easier and more efficient to query (you can configure the search to use these fields).

Since you have to escape punctuation, the easiest thing to start with is to leave it out.

Andy

almoehi
Hi Andy,

thanks for your answer.

<blockquote>What version of Alfresco are we talking about?</blockquote>
I'm using Alfresco Community 4.2.c (I thought it would be shown somewhere, since I selected it from the dropdown box when creating the topic).

<blockquote>You can escape "-" using \ in query expressions. …</blockquote>
Escaping the hyphens does not work, nor does quoting, but the = prefix did the trick.

<blockquote>You can also decide to use a separator that does not mean anything in the query language. …</blockquote>
Unfortunately, I have to stick to the naming scheme, as it is defined by the project guidelines. Metadata gets lost when the file is passed around. And yes, "*.pdf" or simply "pdf" finds all PDFs in the directory, but I really need to search for the pattern mentioned above (there are other PDFs in the directory, and sometimes I need to find a subset, for example the second report of all groups).
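For the "second report of all groups" case, an Explorer-style wildcard is enough if you already have a list of file names (for example, from a folder listing). The Python sketch below is a client-side filter only, not an Alfresco feature; report_pattern and select are hypothetical helpers, and the extra file names in the demo are invented.

```python
from fnmatch import fnmatchcase


def report_pattern(repnum: int) -> str:
    """Explorer-style pattern for report `repnum` of every work package and group."""
    # The '-' after the number keeps rep-2 from also matching rep-20, rep-21, ...
    return f"WP-*-rep-{repnum}-*.pdf"


def select(names: list[str], pattern: str) -> list[str]:
    # fnmatchcase is case-sensitive on every OS, whereas fnmatch.fnmatch
    # normalizes case via os.path.normcase (case-insensitive on Windows).
    return [n for n in names if fnmatchcase(n, pattern)]


if __name__ == "__main__":
    folder = [
        "WP-1-rep-1-grp1.pdf",
        "WP-1-rep-1-grp2.pdf",
        "WP-1-rep-2-grp1.pdf",
        "WP-3-rep-20-grp2.pdf",  # hypothetical extra file
        "minutes.pdf",  # hypothetical unrelated file
    ]
    print(select(folder, report_pattern(2)))  # ['WP-1-rep-2-grp1.pdf']
```

The same pattern string could presumably be used with the = prefix on cm:name, which is what worked here, but that combination is untested in this sketch.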

However, you helped me a lot with that; thanks again. Maybe I should spend some time learning Lucene basics. The problem is that our project members definitely won't be willing to do so. They keep complaining to me about the search function. Like me, they never need the full power of the engine; they only need it to find file names in the document lib, so they expect it to work like Windows Explorer. It would be really nice to have a simple document-lib-only search in future releases that provides exactly that.

Alex