safely crawl all documents via webscript

01-20-2016 05:44 AM
What I am trying to do is:
find all nodes in the repo and get their file size, then get all versions of each node and calculate the total file size of the node plus its versions.
How can I safely crawl every document in the repository?
The SearchService hits the result limit easily, and even if I increase the limit, the searches won't return results.
Traversing the repository recursively also seems to fill up the Solr caches:
private static void traverse(List<FileInfo> context) {
    for (FileInfo node : context) {
        if (node.isFolder()) {
            traverse(fileFolderService.list(node.getNodeRef()));
        } else {
            // is file = do stuff
        }
    }
}
… :44,186 INFO [solr.component.AsyncBuildSuggestComponent] [Suggestor-alfresco-1] Loaded suggester shingleBasedSuggestions, took 267411 ms
… :53,005 WARN [cache.node.nodesTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.node.nodesTransactionalCache' is full (125000).
… :17,075 WARN [cache.node.aspectsTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.node.aspectsTransactionalCache' is full (65000).
… :17,081 WARN [cache.node.propertiesTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.node.propertiesTransactionalCache' is full (65000).
… :19,938 WARN [alfresco.cache.contentUrlTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.contentUrlTransactionalCache' is full (65000).
… :19,991 WARN [alfresco.cache.contentDataTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.contentDataTransactionalCache' is full (65000).
… :49,599 WARN [org.alfresco.nodeOwnerTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.nodeOwnerTransactionalCache' is full (40000).
… :27,516 WARN [cache.node.childByNameTransactionalCache] [http-apr-8080-exec-1] Transactional update cache 'org.alfresco.cache.node.childByNameTransactionalCache' is full (65000).
I understand it is an antipattern to grab everything at once, but I don't know of any service/API that lets me page the results into batches.
Please enlighten me.

Version: 5.0.c
2 REPLIES
01-20-2016 06:38 AM
It's not the Solr cache that is filling up, it's the transaction cache.
The node service does give you all results, so apart from being slow that is OK.
The problem is that your code is executing in a single transaction, and at some point there will likely be a limit on the number of database rows that can be updated.
What the Alfresco code itself does in those situations is use the batch processor to break the huge transaction into smaller chunks. That's probably what you want to do here.
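A minimal sketch of the chunking idea described above, in plain Java with no Alfresco classes so it can be run on its own. Node, file, folder and crawl are hypothetical names made up for illustration. In a real repository the children would come from something like fileFolderService.list(...), and each batch callback would have to run in its own transaction (for example via RetryingTransactionHelper or the batch processor), which this sketch does not attempt. It only shows the shape: walk the tree with an explicit stack instead of recursion, and hand files to a worker in fixed-size chunks so no single unit of work touches the whole repository.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

public class BatchedCrawl {

    /** Hypothetical stand-in for a repository node; NOT an Alfresco type. */
    public static final class Node {
        public final String name;
        public final long size;           // content size in bytes, 0 for folders
        public final List<Node> children; // null for files

        Node(String name, long size, List<Node> children) {
            this.name = name;
            this.size = size;
            this.children = children;
        }

        public boolean isFolder() {
            return children != null;
        }
    }

    public static Node file(String name, long size) {
        return new Node(name, size, null);
    }

    public static Node folder(String name, Node... children) {
        return new Node(name, 0L, Arrays.asList(children));
    }

    /**
     * Walks the tree iteratively and passes files to batchWorker in chunks
     * of at most batchSize. Returns the number of chunks handed out.
     * In a real system each call to batchWorker would be one transaction.
     */
    public static int crawl(Node root, int batchSize, Consumer<List<Node>> batchWorker) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        Deque<Node> pending = new ArrayDeque<>(); // explicit stack, no recursion
        pending.push(root);
        List<Node> batch = new ArrayList<>(batchSize);
        int batches = 0;

        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.isFolder()) {
                for (Node child : node.children) {
                    pending.push(child);
                }
            } else {
                batch.add(node);
                if (batch.size() == batchSize) {
                    batchWorker.accept(new ArrayList<>(batch)); // worker gets its own copy
                    batch.clear();
                    batches++;
                }
            }
        }
        if (!batch.isEmpty()) { // flush the last, partially filled chunk
            batchWorker.accept(new ArrayList<>(batch));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        Node root = folder("root",
                file("a.txt", 10),
                folder("sub", file("b.txt", 20), file("c.txt", 30)),
                file("d.txt", 40),
                file("e.txt", 50));

        long[] totalBytes = {0};
        int batches = crawl(root, 2, batch -> {
            for (Node n : batch) {
                totalBytes[0] += n.size;
            }
        });
        System.out.println(batches + " batches, " + totalBytes[0] + " bytes");
        // prints: 3 batches, 150 bytes
    }
}
```

The batch size is the knob that matters here: it should be small enough that one chunk stays well below the cache limits shown in the warnings (for example a few hundred nodes), while the running total is kept outside the per-batch work so it survives from one chunk to the next.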

01-20-2016 08:33 AM
Is there an easy example that I could reuse? I found the user-rename tool, but I am even more puzzled now.
