Search content without extension

mattjourdan
Champ in-the-making

Hello,

I use Alfresco Community 5.0.d and I would like to know whether it is possible to search Alfresco for all files that don't have an extension.

I don't know how to build such a search.

Thanks,

Matthieu

mehe
Elite Collaborator

It depends on which search you want to use. If you use the "Aikau" search in Share or Alfresco FTS, the search string !=cm:name:*.??? should do it. It should find all nodes whose name does not end with a three-character extension.
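To see which names that query keeps, here is a minimal client-side sketch that mimics the FTS wildcard semantics with Python's `fnmatch`, assuming `*` matches any run of characters and `?` exactly one character. It is only an approximation: `fnmatch` is not what Alfresco runs, and case handling or tokenisation in the index may differ. The sample file names are made up.

```python
from fnmatch import fnmatchcase

# Pattern from the FTS query !=cm:name:*.???  (a dot followed by exactly three characters at the end)
THREE_CHAR_EXT = "*.???"

def without_three_char_extension(names):
    """Keep names that do NOT match *.??? (mimics the negated FTS clause)."""
    return [n for n in names if not fnmatchcase(n, THREE_CHAR_EXT)]

# Hypothetical sample names, for illustration only
sample = ["contract.pdf", "photo.jpg", "notes", "README", "report.docx"]
print(without_three_char_extension(sample))
# ['notes', 'README', 'report.docx']
```

Note that report.docx survives the filter: its extension has four characters, so `*.???` does not match it.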

afaust
Legendary Innovator

The question isn't necessarily which UI you use (Aikau faceted search or the Node Browser, for instance), but whether the search services support this type of query. The problem with a wildcard-based approach in FTS is that, by design, it only scales up to a certain number of documents in the system. This is a result of how the query is translated for the underlying Lucene engine in SOLR. Also, the pattern *.??? assumes that all extensions have exactly three letters. That may have been the standard in the old DOS 8.3 world, but the modern MS Office extensions (.docx, .xlsx, .pptx) all have four.

I have not run a similar query myself on a large document base (i.e. more than a few tens of thousands of documents), but I would assume the best approach is a CMIS query using the LIKE operator on cmis:name. The reasoning is that a CMIS query using LIKE can actually be executed against the database instead of the SOLR index, and is therefore not limited by the restrictions on how the index query is rewritten. The only thing you need to ensure is that the additional database indexes for transactional metadata queries have been applied.
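A sketch of what such a CMIS query could look like, assuming the CMIS 1.1 browser binding (`cmisselector=query`, `q`, `maxItems`, `skipCount`) and an Alfresco 5.x endpoint path; the host, port and `query_url` helper are hypothetical placeholders. Two caveats: `NOT LIKE '%.%'` treats any dot as an extension separator, so a name like "v1.2 notes" is excluded too, and whether Alfresco executes `NOT LIKE` through the database (transactional metadata query) path rather than SOLR is an assumption to verify for your version. The sketch only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# CMIS QL: documents whose name contains no dot at all.
# Caveat: any dot counts as an extension separator here ("v1.2 notes" is excluded too).
CMIS_QUERY = (
    "SELECT cmis:objectId, cmis:name "
    "FROM cmis:document "
    "WHERE cmis:name NOT LIKE '%.%'"
)

# Assumption: CMIS 1.1 browser binding of an Alfresco 5.x repository; host/port are placeholders.
BROWSER_BINDING = "http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser"

def query_url(statement, max_items=100, skip_count=0):
    """Build a browser-binding GET URL for a CMIS query (hypothetical helper, no request is sent)."""
    params = {
        "cmisselector": "query",
        "q": statement,
        "maxItems": max_items,
        "skipCount": skip_count,
    }
    return BROWSER_BINDING + "?" + urlencode(params)

print(query_url(CMIS_QUERY))
```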

mehe
Elite Collaborator

Hi Axel, I mentioned "Aikau" because it's the easiest way to test the FTS string. The query performs well on large document sets (tested with a 1,000,000-document repository), but paging through a large result set gets slower with each subsequent page.
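One common explanation for paging getting worse page by page is offset-based paging: to return page k, the engine collects the top k × page_size hits and discards all but the last page. The sketch below is a simplified model under that assumption, not a measurement of Alfresco's implementation; the page size and page count are arbitrary.

```python
def hits_collected(page_size, pages):
    """Per-page and cumulative number of top hits collected,
    assuming page k (1-based) needs the top k * page_size hits."""
    per_page = [k * page_size for k in range(1, pages + 1)]
    return per_page, sum(per_page)

per_page, total = hits_collected(page_size=50, pages=10)
print(per_page[-1], total)  # 500 2750
```

Under this model the cost of one page grows linearly with its position, and the total for N pages is page_size · N(N+1)/2, i.e. quadratic.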

It's true that it only finds three-character extensions, but that is easy to adapt 🙂
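One way to adapt it is to combine one negated clause per extension length. This sketch only builds the string; it assumes that negated `=cm:name` clauses can be joined with AND in Alfresco FTS, which is worth checking in Share or the Node Browser first. The `no_extension_fts` helper and the default lengths (2 to 4) are illustrative choices, and each extra wildcard clause adds to the query-rewrite cost discussed in this thread.

```python
def no_extension_fts(lengths=(2, 3, 4)):
    """Build an FTS string excluding names that end in a dot plus N characters, for each N."""
    clauses = ["!=cm:name:*." + "?" * n for n in lengths]
    return " AND ".join(clauses)

print(no_extension_fts())
# !=cm:name:*.?? AND !=cm:name:*.??? AND !=cm:name:*.????
```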

I used ??? because I thought Solr would internally reverse the query string (to ???.*), which would not be so expensive. Do you know if this is correct?
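The idea behind reversed-term indexing is that a leading wildcard becomes a trailing one, so the index can seek to a literal prefix instead of scanning every term. Whether Alfresco's Solr schema indexes reversed terms at all is not established here; this sketch only models the general idea. It computes the literal prefix (the part before the first wildcard) of a pattern and of its reversal. For `*.???` neither has a literal prefix, because the reversed pattern still starts with `???`. For a concrete extension like `*.pdf`, reversal does yield one.

```python
def literal_prefix(pattern):
    """Characters before the first wildcard ('*' or '?'); an index can seek
    straight to this prefix, the rest of the pattern has to be checked term by term."""
    for i, ch in enumerate(pattern):
        if ch in "*?":
            return pattern[:i]
    return pattern

for pattern in ("*.???", "*.pdf"):
    reversed_pattern = pattern[::-1]
    print(pattern, repr(literal_prefix(pattern)), reversed_pattern, repr(literal_prefix(reversed_pattern)))
# *.??? '' ???.* ''
# *.pdf '' fdp.* 'fdp.'
```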

afaust
Legendary Innovator

I can't say how SOLR / Lucene handles this at that low level. I just remember running into maxBooleanClauses limits with Alfresco SOLR before, due to the way Alfresco rewrote wildcard queries before sending them to the SOLR / Lucene layer. This may have changed in Alfresco 5.0 or later versions, though.
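To illustrate why such a rewrite only scales to a certain number of documents, here is a simplified model in which a wildcard is expanded into one OR clause per matching indexed term and rejected once the clause count exceeds a limit. The limit of 1024 is Lucene's historical default for the boolean clause count. Whether a given Alfresco version rewrites wildcards this way is exactly the open question above. The term lists and the `TooManyClauses` class here are hypothetical stand-ins.

```python
from fnmatch import fnmatchcase

MAX_BOOLEAN_CLAUSES = 1024  # Lucene's historical default clause limit

class TooManyClauses(Exception):
    """Raised when the rewritten query would exceed the clause limit (stand-in for Lucene's error)."""

def expand_wildcard(pattern, indexed_terms, limit=MAX_BOOLEAN_CLAUSES):
    """Rewrite a wildcard into one clause per matching indexed term (simplified model)."""
    matches = [t for t in indexed_terms if fnmatchcase(t, pattern)]
    if len(matches) > limit:
        raise TooManyClauses(f"{len(matches)} clauses > limit {limit}")
    return matches

# Hypothetical indexed names: the clause count grows with the number of matching terms
small = [f"doc{i}.pdf" for i in range(500)]
large = [f"doc{i}.pdf" for i in range(2000)]
print(len(expand_wildcard("*.???", small)))  # 500
try:
    expand_wildcard("*.???", large)
except TooManyClauses as err:
    print("rejected:", err)  # rejected: 2000 clauses > limit 1024
```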