Folder structure in Alfresco

naval
Champ in-the-making
Hi,

I am planning to maintain documents in the Alfresco repository. Does Alfresco performance depend on the folder structure we maintain in the repository?

For example,
Rather than keeping all documents in one folder, is it better for Alfresco performance to spread them across different folders?
(Typical activities would be document search, document upload, custom actions on documents, some validations on documents, etc.)


Thanks,
Naval
6 REPLIES

mitpatoliya
Star Collaborator
Many folders with a small number of documents in each will give you better performance than one folder holding everything.
You can also explore the categories feature in Alfresco, and use folders, categories, or a combination of both, depending on your requirements.

naval
Champ in-the-making
Thanks a lot, mitpatoliya, for the info.
It would be really great if you could provide a technical reference or a logical explanation for this.

Regards,
Naval

mrogers
Star Contributor
The folder structure is fairly irrelevant for performance. For example, if you are searching for a document by its content, there is no dependency upon the path.

Large folders only cause problems where you use an interface that relies upon listing large numbers of documents. If you don't have requirements to list large numbers of files, then there will be next to no impact with a big folder. If you are using an interface that displays lists of files in a folder, then keep the numbers down.

I'd start by trying to lay out your repo in a way that makes best sense for your documents.

I agree with mrogers' comments as well.
But most of the time we need to access the Alfresco repository through Alfresco Explorer, and a folder holding lots of documents takes longer to load, which the client always perceives as performance degradation.
In practice, various limits such as the Lucene max result count and JVM memory come into play once a single space holds more than a few thousand documents.

You can also check these posts for more background on this:
http://forums.alfresco.com/forum/developer-discussions/repository-services/lucene-indexing-and-perfo...
http://forums.alfresco.com/forum/developer-discussions/technical-architecture-discussion/performance...
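To make the "keep the numbers down" advice concrete, here is a minimal sketch of one possible layout: bucketing documents into year/month folders so that no single folder accumulates every document. It is plain Python, not Alfresco API code, and the function name, the `Invoices` root, and the year/month scheme are all assumptions chosen for illustration. Pick whatever key makes sense for your documents.

```python
from collections import Counter
from datetime import date


def folder_path_for(root: str, doc_date: date) -> str:
    """Return a year/month folder path under `root` (hypothetical layout)."""
    return f"{root}/{doc_date.year:04d}/{doc_date.month:02d}"


# Hypothetical document dates; in practice these would come from metadata.
doc_dates = [date(2013, 9, 30), date(2013, 10, 1), date(2013, 10, 16)]

# Count how many documents would land in each folder.
children_per_folder = Counter(folder_path_for("Invoices", d) for d in doc_dates)

for path, count in sorted(children_per_folder.items()):
    print(path, count)
```

With a layout like this, any interface that lists a folder only has to deal with one month of documents at a time.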

steves
Champ in-the-making
I hope I'm not hijacking this thread. If so, please let me know and I'll open another one.
But I think my observations are related to the discussion (and any advice would be appreciated!).

In my company, we were using another CMS to store our invoices.
I recently migrated these documents (169k PDF files) without any folder structure into a 4.2.d Alfresco Community repo.
Why no folder structure? Because I have developed an HTML/JavaScript UI for the users, which works fine.

But, and this is where it may be relevant, when using the Share UI document library it takes too long for Alfresco to retrieve the document list (at least with my setup). The doclist repo webscript needs nearly 7 minutes to return the first page of results.

Using tracing, you can see that the FileFolderService gets all of the documents and then trims the result down for pagination (see trace below).

I assumed that the query would only work on a small subset of all my documents, but unless I'm wrong that's not the case.

If there are experts out there, is this correct?

If we ever want to use the Share UI to navigate through these docs, should a folder structure be put in place?

<code>
14:46:18,193 DEBUG [org.alfresco.repo.jscript.ScriptLogger] doclist.lib.js - NodeRef: alfresco://company/home Query: +PATH:"/app:company_home/cm:Factures/*" -TYPE:"cm:systemfolder" -TYPE:"fm:forums" -TYPE:"fm:forum" -TYPE:"fm:topic" -TYPE:"fm:post"
14:46:18,218 DEBUG [org.alfresco.repo.jscript.ScriptLogger] doclist.lib.js: requestTotalCountMax=1000
14:46:18,229 DEBUG [org.alfresco.repo.jscript.ScriptLogger] doclist.lib.js: starting query @Wed Oct 16 2013 14:46:18 GMT+0200 (CEST)
14:49:01,613 WARN  [org.alfresco.repo.cache.TransactionalCache.org.alfresco.cache.node.nodesTransactionalCache] Transactional update cache 'org.alfresco.cache.node.nodesTransactionalCache' is full (125000).
14:49:07,428 WARN  [org.alfresco.repo.cache.TransactionalCache.org.alfresco.cache.node.aspectsTransactionalCache] Transactional update cache 'org.alfresco.cache.node.aspectsTransactionalCache' is full (65000).
14:53:13,051 DEBUG [org.alfresco.repo.model.filefolder.GetChildrenCannedQuery] Base query (sort=y, perms=n): 169118 in 414800 msecs
14:53:13,367 DEBUG [org.alfresco.repo.model.filefolder.GetChildrenCannedQuery] Post-query perms: 1000 in 315 msecs
14:53:13,369 DEBUG [org.alfresco.repo.model.filefolder.FileFolderServiceImpl] List: 50 items in 415126 msecs [pageNum=1,skip=0,max=50,hasMorePages=true,totalCount=(1000, 1000),parentNodeRef=workspace://SpacesStore/433c7c0b-e9a3-4b48-a189-a5f9c44e4df1]
14:53:14,970 DEBUG [org.alfresco.repo.jscript.ScriptLogger] doclist.lib.js: end query @Wed Oct 16 2013 14:53:14 GMT+0200 (CEST)
14:53:14,971 DEBUG [org.alfresco.repo.jscript.ScriptLogger] doclist.lib.js - query results: 50
14:53:14,986 DEBUG [org.alfresco.repo.jscript.ScriptLogger] doclist.lib.js - totalRecords: 1000
</code>
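
To illustrate what the trace above seems to show, here is a toy model of "fetch everything, sort, then trim to one page". It is not Alfresco's implementation, just a plain Python sketch of the pattern; the function name and the generated file names are made up. The point is that the work grows with the number of children in the folder, not with the page size of 50.

```python
def list_page_fetch_all(children, skip, max_items, sort_key=None):
    """Sort the full child list, then slice out a single page.

    The sort touches every child, so the cost depends on the folder
    size even though only `max_items` entries are returned.
    """
    ordered = sorted(children, key=sort_key)
    page = ordered[skip:skip + max_items]
    has_more = skip + max_items < len(ordered)
    return page, has_more


# Made-up names; 169118 matches the child count reported in the trace.
names = [f"invoice-{i:06d}.pdf" for i in range(169_118)]
page, has_more = list_page_fetch_all(names, skip=0, max_items=50)
print(len(page), has_more)  # 50 True

# The "Base query" line in the trace reports 414800 msecs.
base_query_ms = 414_800
print(round(base_query_ms / 60_000, 1))  # 6.9 (minutes)
```

That lines up with the nearly 7 minutes I measured: almost all of the time is in the base query over the 169118 children, while the permission check and the final trim to 50 items take well under a second.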