We have about 1 million files in our Alfresco that have not been <strong>accessed</strong> (i.e., viewed in Share/Explorer) in over a year. We want to remove these files. Beyond that, we want to implement an age-off policy that removes files automatically when they haven't been accessed in a year.

I think the best way to do this would be with a Scheduled Action. I have two ideas for how to do this.

---------- Approach #1 ----------

I have the scheduled action running, but I don't know how to query for what I want. Here are my two files:

scheduled-action-services-context.xml (more or less copied from somewhere else on the internet...)
<blockcode>
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<beans>
    <!-- Define the model factory used to generate object models suitable for use with freemarker templates. -->
    <bean id="templateActionModelFactory" class="org.alfresco.repo.action.scheduled.FreeMarkerWithLuceneExtensionsModelFactory">
        <property name="serviceRegistry"> <ref bean="ServiceRegistry"/> </property>
    </bean>

    <!-- Execute the script /Company Home/Record Management/ageoff.js -->
    <bean id="runScriptAction" class="org.alfresco.repo.action.scheduled.SimpleTemplateActionDefinition">
        <property name="actionName"> <value>script</value> </property>
        <property name="parameterTemplates">
            <map>
                <entry>
                    <key> <value>script-ref</value> </key>
                    <!-- Note that as of Alfresco 4.0, due to a Spring upgrade, the FreeMarker ${foo} entries must be escaped -->
                    <value>\$\{selectSingleNode('workspace://SpacesStore', 'lucene', 'PATH:"/app:company_home/app:dictionary/app:scripts/cm:ageoff.js"' )\}</value>
                </entry>
            </map>
        </property>
        <property name="templateActionModelFactory"> <ref bean="templateActionModelFactory"/> </property>
        <property name="dictionaryService"> <ref bean="DictionaryService"/> </property>
        <property name="actionService"> <ref bean="ActionService"/> </property>
        <property name="templateService"> <ref bean="TemplateService"/> </property>
    </bean>

    <bean id="runScript" class="org.alfresco.repo.action.scheduled.CronScheduledQueryBasedTemplateActionDefinition">
        <property name="transactionMode"> <value>UNTIL_FIRST_FAILURE</value> </property>
        <property name="compensatingActionMode"> <value>IGNORE</value> </property>
        <property name="searchService"> <ref bean="SearchService"/> </property>
        <property name="templateService"> <ref bean="TemplateService"/> </property>
        <property name="queryLanguage"> <value>lucene</value> </property>
        <property name="stores">
            <list>
                <value>workspace://SpacesStore</value>
            </list>
        </property>
        <property name="queryTemplate"> <value>PATH:"/app:company_home"</value> </property>
        <property name="cronExpression">
            <!-- In reality this will be once a day, this is just for testing -->
            <value>0 0/3 * * * ?</value>
        </property>
        <property name="jobName"> <value>jobD</value> </property>
        <property name="jobGroup"> <value>jobGroup</value> </property>
        <property name="triggerName"> <value>triggerD</value> </property>
        <property name="triggerGroup"> <value>triggerGroup</value> </property>
        <property name="scheduler"> <ref bean="schedulerFactory"/> </property>
        <property name="actionService"> <ref bean="ActionService"/> </property>
        <property name="templateActionModelFactory"> <ref bean="templateActionModelFactory"/> </property>
        <property name="templateActionDefinition">
            <ref bean="runScriptAction"/> <!-- This is name of the action (bean) that gets run -->
        </property>
        <property name="transactionService"> <ref bean="TransactionService"/> </property>
        <property name="runAsUser"> <value>System</value> </property>
    </bean>
</beans>
</blockcode>

ageoff.js
<blockcode>
// I am testing with this date range because I am testing in a temporary 4.2.e instance.
var temp = "NOW-1YEAR/DAY TO NOW/DAY+1DAY";
// Real date range will be something like "MIN TO NOW-1YEAR/DAY"

// This is kind of what I want, but it doesn't work. I think my query is somehow wrong. Also, I don't
// think "@cm\\:accessed" exists, but when I replace it with "@cm\\:created" it doesn't seem to work anyway.
var docs = search.luceneSearch("PATH:\"/app:company_home/app:user_homes//*\" AND @cm\\:accessed:[" + temp + "] AND TYPE:\"cm:content\" AND -TYPE:\"cm:folder\"");

//-----------------------------------------------------------------------------------
// This will get a list of everything in user homes. This works! (but not what I want)
//-----------------------------------------------------------------------------------
//var docs = search.luceneSearch("PATH:\"/app:company_home/app:user_homes//*\" AND TYPE:\"cm:content\" AND -TYPE:\"cm:folder\"");

var dest;
for (dest = 0; dest < docs.length; dest++) {
    // Instead of remove I think I want to set the sys:temporary aspect?
    var success = docs[dest].remove();
}
</blockcode>

<strong>Question</strong>: Is there a way to query based on when the documents were accessed or viewed (even just viewed on the Share site, not necessarily downloaded)? Google returns hits for cm:created and cm:modified, but not cm:accessed. I think this approach is somewhat dead for this reason.

---------- Approach #2 ----------

I have this Java class that correctly finds the files that have not been accessed in a year when I run it from the contentstore root directory (alf_data/contentstore). According to this page, https://wiki.alfresco.com/wiki/Custom_Actions, I believe I shouldn't have too hard a time converting this class into a custom action.
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

// Find un-accessed files:
// find ./ -atime +366 -type f -exec ls -l --time=atime {} \;
//
public class AgeOff {

    /**
     * Reads the given stream to the end and returns its contents as one string.
     */
    private static String loadStream(InputStream s) throws Exception {
        BufferedReader br = new BufferedReader(new InputStreamReader(s));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Runs in the current working directory, so start it from alf_data/contentstore.
        ProcessBuilder pb = new ProcessBuilder("/bin/bash", "-c", "find ./ -atime +366 -type f -exec ls -l --time=atime {} \\;");
        try {
            Process p = pb.start();
            String output = loadStream(p.getInputStream());
            String outerr = loadStream(p.getErrorStream());
            System.out.println(output);
            System.err.println("-------------ERRORS-------------");
            System.err.println(outerr);
        } catch (Exception e) {
            System.out.println("EXCEPTION: " + e.getMessage());
        }
    }
}
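For comparison, the same scan could probably be done without shelling out to find. The following is only a sketch under assumptions: AtimeScan and notAccessedSince are made-up names, it needs Java 7 or later for java.nio.file, and the 366-day cutoff only roughly matches what find's -atime +366 selects. Like the find command, it depends on the filesystem recording access times, so a noatime or relatime mount may make the results unreliable.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Alfresco.
class AtimeScan {

    /** Returns the regular files under root whose last-access time is before cutoffMillis. */
    static List<Path> notAccessedSince(Path root, final long cutoffMillis) throws IOException {
        final List<Path> stale = new ArrayList<Path>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // lastAccessTime() is the atime reported by the filesystem.
                if (attrs.isRegularFile() && attrs.lastAccessTime().toMillis() < cutoffMillis) {
                    stale.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return stale;
    }

    public static void main(String[] args) throws IOException {
        // Default to the current directory, as the find command above does.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(366);
        for (Path p : notAccessedSince(root, cutoff)) {
            System.out.println(p);
        }
    }
}
```

Returning Path objects rather than printed ls output would make it easier to feed the result into whatever step maps files back to nodes.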
The problem with this approach is that I now have a list of files in the contentstore, and I need to somehow translate that into Alfresco nodes so that I can delete them or set sys:temporary. Is there a way to translate a contentstore path to an Alfresco node?

---------- Summary ----------

1) Is there a valid way to query for cm:accessed?
2) Is there a way to translate a contentstore path/id to an alf_node? (I hope that is the correct terminology.)

I have looked into the Records Management module and it doesn't seem to be what I want. But maybe I am missing something.

This post (https://forums.alfresco.com/forum/end-user-discussions/alfresco-share/automatically-deleting-documen...) is relevant, but I want accessed, not created.

We have two Alfresco instances running for different purposes, 4.2.e and 3.4.8. Ideally I need something that works for both, but I really just want some sort of push in the right direction. I have only tested the above with 4.2.e because it is Community and I can spin up a temporary instance to test with, so I don't touch production. So I suppose I would rather have help with 4.2.e if there is no common solution.

OS is CentOS.
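To make question 2 more concrete, here is a rough sketch of the first half of that translation. It rests on an assumption I have not verified on either instance: that the default file content store names each file after its content URL, so alf_data/contentstore/2013/5/1/12/30/abc.bin would correspond to store://2013/5/1/12/30/abc.bin. ContentUrlSketch and toContentUrl are made-up names. Getting from that URL to a node would still need a lookup on the repository side (possibly via the alf_content_url table, which is also an assumption), and that part is not shown.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Alfresco.
class ContentUrlSketch {

    /**
     * Turns a file path under the contentstore root into a "store://..." string.
     * Assumes the content URL is simply "store://" plus the path relative to the root.
     */
    static String toContentUrl(Path storeRoot, Path file) {
        Path root = storeRoot.toAbsolutePath().normalize();
        Path abs = file.toAbsolutePath().normalize();
        if (!abs.startsWith(root)) {
            throw new IllegalArgumentException(file + " is not under " + storeRoot);
        }
        Path rel = root.relativize(abs);
        StringBuilder sb = new StringBuilder("store://");
        for (int i = 0; i < rel.getNameCount(); i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(rel.getName(i).toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example paths only; the real root would be alf_data/contentstore.
        Path root = Paths.get("/opt/alf_data/contentstore");
        Path file = Paths.get("/opt/alf_data/contentstore/./2013/5/1/12/30/abc.bin");
        System.out.println(toContentUrl(root, file));
    }
}
```

The normalize() calls are there so that the "./"-prefixed paths printed by find still resolve to the same relative path.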