How to get a Site's content (folders and files) using Java API?

mbel
Star Contributor

Hello,

Could someone tell me the steps for getting all of a Site's content (folders and files) programmatically using the Java API?

I see that there is FileFolderService, but I am not sure what information I have to get from a Site in order to use it with the fileFolderService.

Is there some method I can use to loop through all folders and files?

Thank you in advance.

1 ACCEPTED ANSWER

afaust
Legendary Innovator

It depends on what you mean by "all content of a site", because a site can be sub-structured into components, e.g. "documentlibrary", "wiki" and others. If you really mean "all content", then you can simply use the FileFolderService operations on the NodeRef of the site itself (obtained via the SiteService). Otherwise you would need to retrieve the NodeRef of the relevant component first via SiteService.getContainer(shortName, containerName).

Any looping to retrieve all files and folders will have to be done in your custom code, since all operations only act on a single hierarchy level for performance reasons. There is a listDeepFolders operation, but it is deprecated specifically for being dangerous in terms of system load / performance.
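
As a rough illustration of such custom looping, here is a minimal sketch of an iterative, level-by-level walk. LevelLister, Entry and SiteWalker are hypothetical stand-ins invented for this example so that it runs without Alfresco on the classpath; in a real repository the single-level call would presumably be something like fileFolderService.list(nodeRef), starting from the NodeRef returned by the SiteService.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the few FileInfo details the walk needs.
final class Entry {
    final String id;
    final boolean folder;

    Entry(String id, boolean folder) {
        this.id = id;
        this.folder = folder;
    }
}

// Hypothetical stand-in for a call that lists exactly one hierarchy level.
interface LevelLister {
    List<Entry> list(String folderId);
}

final class SiteWalker {
    // Visits every descendant of rootId, issuing one list() call per folder.
    static List<String> walk(LevelLister lister, String rootId) {
        List<String> visited = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(rootId);
        while (!pending.isEmpty()) {
            String folderId = pending.pop();
            for (Entry child : lister.list(folderId)) {
                visited.add(child.id);
                if (child.folder) {
                    // Queue the folder instead of recursing into it.
                    pending.push(child.id);
                }
            }
        }
        return visited;
    }
}
```

An explicit queue is used instead of recursion so that deep hierarchies do not grow the call stack. The sketch still collects every id in memory, so it does not address the load concerns by itself.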

Note that by accessing ALL files and folders in a large site you can potentially overwhelm the transactional caches (if those files and folders have not been loaded into caches before), which will completely reset the corresponding shared caches and seriously impact performance of the system for all other users. It is typically not recommended to perform bulk operations in a single transaction; use batch processing functionality instead to handle such use cases efficiently.
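
To make the batching idea concrete, here is a minimal sketch that splits a list of work items into fixed-size batches and hands each batch to a worker separately. BatchRunner and its methods are hypothetical names for this example and are not Alfresco's own batch processing API; the assumption is that, in a repository, the worker would run each batch in its own transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchRunner {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Passes each batch to the worker on its own and returns the batch count.
    // Assumption: the worker opens and commits one transaction per batch.
    static <T> int runInBatches(List<T> items, int batchSize, Consumer<List<T>> worker) {
        List<List<T>> batches = partition(items, batchSize);
        for (List<T> batch : batches) {
            worker.accept(batch);
        }
        return batches.size();
    }
}
```

Keeping each batch small limits how many nodes a single transaction touches, which is the point of the recommendation above. A suitable batch size depends on the system and is not something this sketch can decide.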

7 REPLIES 7

kaynezhang
World-Class Innovator

You can try combining SiteService and FileFolderService/NodeService.
Use SiteService to get the site information and the containers of the site,
and use FileFolderService/NodeService to get the folder and file information.

mbel
Star Contributor

Thanks,

I know about the BatchProcessor, and maybe I will use it after implementing the custom logic for getting all the files and folders. I will create a separate discussion about it :)

I managed to get the list of containers via this.siteService.listContainers, which returns a Collection of FileInfo. After that I loop over all the FileInfos and get the children of each one: List<FileInfo> files = this.fileFolderService.list(file.getNodeRef());

In general, if some file has a reference to another file (in another Site, for example), which is the right way of getting this relationship? Through this.nodeService.getChildAssocs(nodeRef), or something else?

douglascrp
World-Class Innovator

In order to go through the whole hierarchy, you can use something like the approach in Dedunu Dhananjaya's post "Alfresco: Calculate folder size using Java based WebScript".

Now, to get the relationship information, you can use the getChildAssocs method, as you already found out.

You can check whether a child node is a file link with something like this:

if (ApplicationModel.TYPE_FILELINK.equals(nodeService.getType(childNodeRef))) {
    // childNodeRef is a link to another file; handle it here
}

mbel
Star Contributor

Do the Alfresco Exporter and Importer have an implemented BatchProcessor?

afaust
Legendary Innovator

The BulkFileSystemImporter uses batch processing internally. The default exporter / importer components are single threaded to ensure consistency of the operation result.

mbel
Star Contributor

I want to have a Site's file hierarchy (which I've already got) plus the file content. Once I have all of that in JSON (for example), I can easily import the data into another Alfresco system.

My difficulty is in getting the file content. I currently have no idea how to store it and how to write it afterwards.

What I tested is getting a reader via reader = this.fileFolderService.getReader(file.getNodeRef());

I can then use its method reader.getContent(new File(file.getName())); which copies the whole file data and creates the same file at the given File path.

Do you have any suggestions on how I can get and store the content of a file so that I can easily write it afterwards?

Thank you in advance.
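
One possible way to carry file content inside JSON is to Base64-encode the bytes into a string field and decode them again on the target system. The sketch below shows only that encoding step; ContentCodec is a hypothetical helper for this example, and it assumes the content is available as a java.io.InputStream (for instance, from the reader's getContentInputStream(), if that fits your setup).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Base64;

final class ContentCodec {
    // Reads the whole stream and returns it as Base64 text that is safe
    // to store in a JSON string field.
    static String encode(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return Base64.getEncoder().encodeToString(buffer.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Turns the stored Base64 text back into the original bytes.
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }
}
```

This holds the whole file in memory and Base64 grows the data by roughly a third, so it is only reasonable for small files. For large content, writing each file to disk next to a JSON file that holds just the hierarchy and metadata may be the better trade-off.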