metadata extraction

nancyaggarwal
Champ in-the-making
Hi All

Can anyone help me with code for extracting the modified date of a document in the Alfresco document library?

Thanks
Nancy
9 REPLIES 9

muralidharand
Star Contributor
You can use the NodeService to read the properties of a node.
Something like the following might work (nodeRef is the document's NodeRef, and the property is given as a QName):
nodeService.getProperty(nodeRef, ContentModel.PROP_MODIFIED);
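In repository JavaScript the same property is available as node.properties["cm:modified"]. Below is a minimal sketch of the "older than six months" check. The helper names monthsAgo and isOlderThan are made up for illustration, and the node argument is only assumed to look like an Alfresco ScriptNode.

```javascript
// Returns a new Date that is `months` calendar months before `from`.
// Note: setMonth rolls over when the day does not exist in the target
// month (for example 31 August minus six months lands in early March).
function monthsAgo(from, months) {
  var d = new Date(from.getTime()); // copy so the caller's date is untouched
  d.setMonth(d.getMonth() - months);
  return d;
}

// `node` is assumed to expose cm:modified through a properties map,
// the way a ScriptNode does in Alfresco server-side JavaScript.
function isOlderThan(node, months, now) {
  var modified = new Date(node.properties["cm:modified"]);
  return modified.getTime() < monthsAgo(now, months).getTime();
}
```

Inside a script you would call it as isOlderThan(document, 6, new Date()).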

rjohnson
Star Contributor
You can use the standard REST or CMIS API to get all the metadata for a document, provided you know its nodeRef or path.

Can you give me a little more detail about what you are trying to achieve and the starting point? If you can, I should be able to assist you better.

Bob

Thanks for your reply. I want to extract the modified date of the document and compare it with the current date. If it is older than six months, I want the document to be moved to another store.

Nancy

rjohnson
Star Contributor
On the basis that you want to move your document to a different place within the same repository, I think what you need is a scheduled job that runs periodically (daily?), finds all the documents whose modified date is over 6 months ago, and then executes a move.

Alfresco has a job scheduler, so you could write some JavaScript to do this and have the scheduler execute that script each day.

Alternatively, you could use the Apache Chemistry CMIS libraries and run a job from cron (or the Windows Task Scheduler) that does the same thing, in which case you can use any language that Chemistry has libraries for.

In JavaScript you would execute a Lucene search that finds all qualifying documents. I can't give you the full search, but the part looking at the modified date would be something like @cm\:modified:[MIN TO "2013-01-01T00:00:00.000Z"], where the last part is the date 6 months ago expressed as an ISO 8601 date.
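As a rough sketch, the cutoff and the query string could be built like this. The function names and the path argument are illustrative only, and the code assumes the script engine provides Date.prototype.toISOString.

```javascript
// Returns the moment `months` months before `now` as an ISO 8601 UTC
// string, e.g. 2013-01-01T00:00:00.000Z.
function cutoffIso(now, months) {
  var d = new Date(now.getTime());
  d.setUTCMonth(d.getUTCMonth() - months);
  return d.toISOString();
}

// Builds a Lucene query for everything under `path` whose cm:modified
// is at or before the cutoff. The colon in cm:modified is escaped for
// Lucene, and the backslash is doubled for the JavaScript string.
function buildModifiedBeforeQuery(path, now, months) {
  return 'PATH:"' + path + '" AND @cm\\:modified:[MIN TO "' +
         cutoffIso(now, months) + '"]';
}
```

The resulting string can then be handed to search.luceneSearch(...) or used as the query of a search.query(...) definition.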

In CMIS the query is more SQL-like (here built as a PHP string):

"select * from cmis:document where cmis:lastModificationDate >= TIMESTAMP '" . date("Y-m-d", $gtthan) . "T00:00:00.000+00:00' and cmis:lastModificationDate < TIMESTAMP '" . date("Y-m-d", $ltthan) . "T00:00:00.000+00:00'"

Once you get your query results back you can move the document to wherever you want.
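For comparison, here is a sketch of the CMIS query built in JavaScript rather than PHP, selecting only documents last modified before a cutoff. The function names are hypothetical, and it assumes documents are queried as cmis:document.

```javascript
// CMIS timestamps use the form YYYY-MM-DDThh:mm:ss.sss+00:00, so the
// trailing "Z" of the ISO string is replaced with an explicit offset.
function cmisTimestamp(date) {
  return date.toISOString().replace("Z", "+00:00");
}

// Builds a CMIS query for documents modified before `cutoff` (a Date).
function buildCmisQuery(cutoff) {
  return "SELECT * FROM cmis:document WHERE cmis:lastModificationDate < " +
         "TIMESTAMP '" + cmisTimestamp(cutoff) + "'";
}
```

The string would be passed to whatever CMIS client you use to run the query.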

There are two things to be aware of. First, such queries limit their return to 1000 results, so if you have a lot of qualifying documents you may need to run this script more than once per day.

Second, beware of transaction size limits. If you are using JavaScript, I think you will have to iterate through your search results and create an action on each document to move it; otherwise you will run into transaction size limits.

Bob

Thanks, Bob, for your reply. But if I want to move the documents to a place that is outside the repository, what is the solution? I have written JavaScript code and a scheduled action for this, but my documents are still not moving.

Below are my JavaScript code and my scheduled-action-services-context.xml.



scheduled-action-services-context.xml

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>

    <!--
    Define the model factory used to generate object models suitable for use with freemarker templates.
    -->
    <bean id="templateActionModelFactory" class="org.alfresco.repo.action.scheduled.FreeMarkerWithLuceneExtensionsModelFactory">
        <property name="serviceRegistry">
            <ref bean="ServiceRegistry"/>
        </property>
    </bean>

    <!-- run the script -->
    <bean id="runscript" class="org.alfresco.repo.action.scheduled.CronScheduledQueryBasedTemplateActionDefinition">
        <property name="transactionMode">
            <value>ISOLATED_TRANSACTIONS</value>
        </property>
        <property name="compensatingActionMode">
            <value>IGNORE</value>
        </property>
        <property name="searchService">
            <ref bean="SearchService"/>
        </property>
        <property name="templateService">
            <ref bean="TemplateService"/>
        </property>
        <property name="queryLanguage">
            <value>lucene</value>
        </property>
        <property name="stores">
            <list>
                <value>workspace://SpacesStore</value>
            </list>
        </property>
        <property name="queryTemplate">
            <value>PATH:"/app:company_home/st:sites"</value>
        </property>
        <property name="cronExpression">
            <value>0 0/15 * * * ?</value>
        </property>
        <property name="jobName">
            <value>jobA</value>
        </property>
        <property name="jobGroup">
            <value>jobGroup</value>
        </property>
        <property name="triggerName">
            <value>triggerA</value>
        </property>
        <property name="triggerGroup">
            <value>triggerGroup</value>
        </property>
        <property name="scheduler">
            <ref bean="schedulerFactory"/>
        </property>
        <property name="actionService">
            <ref bean="ActionService"/>
        </property>
        <property name="templateActionModelFactory">
            <ref bean="templateActionModelFactory"/>
        </property>
        <property name="templateActionDefinition">
            <ref bean="moveoldfiles"/>
        </property>
        <property name="transactionService">
            <ref bean="TransactionService"/>
        </property>
        <property name="runAsUser">
            <value>System</value>
        </property>
    </bean>

    <!-- Execute the script -->
    <bean id="moveoldfiles" class="org.alfresco.repo.action.scheduled.SimpleTemplateActionDefinition">
        <property name="actionName">
            <value>script</value>
        </property>
        <property name="parameterTemplates">
            <map>
                <entry>
                    <key>
                        <value>script-ref</value>
                    </key>
                    <value>\$\{selectSingleNode('workspace://SpacesStore', 'lucene', 'PATH:"/app:company_home/app:dictionary/app:scripts/cm:archive.js"' )\}</value>
                </entry>
            </map>
        </property>
        <property name="templateActionModelFactory">
            <ref bean="templateActionModelFactory"/>
        </property>
        <property name="dictionaryService">
            <ref bean="DictionaryService"/>
        </property>
        <property name="actionService">
            <ref bean="ActionService"/>
        </property>
        <property name="templateService">
            <ref bean="TemplateService"/>
        </property>
    </bean>
</beans>


My JavaScript:

function addMonths(date, months) {
date.setMonth(date.getMonth() + months);
return date;
}

var docs = search.luceneSearch("PATH:\"/app:company_home/st:sites//*\" AND @cm\\:modified:[addMonths(new Date(), -6)]");

for(var dest : docs)
dest.move(C:\Alfresco\alf_data\archive)


Please help.

Thanks
Nancy

Can anyone help?

rjohnson
Star Contributor
From the code above and what you have said, I now understand that you want to take the document out of Alfresco and write it to a file system. That is not something I have ever tried, and I suspect you will need to create a Java bean to do this, or use external script(s), CMIS, a WebScript and cron. I don't think JavaScript can do this, and document.move will not move a document to an arbitrary file system.

What you need to do is get the content of the document, write the content to a new file and then delete the document. (Note that when you do this the document goes into the trash in Alfresco, so you will need a trash cleaner as well. There is one on the internet for 3.4 which should not be too hard to make work in 4.0+.)

You can get at the document content in JavaScript, but you cannot write it to an external file system so far as I am aware.

If you were to go the CMIS route, you would set up a script to extract all the documents over 6 months old, then loop through that result set, call your webscript to return each document's content and write that content to your file system. Once the file is successfully written, you can delete the document from Alfresco within your CMIS script.

The webscript to return content is a special type of webscript which is defined slightly differently from the norm. Your {webscript}.get.desc.xml will look something like the example below:


<webscript kind="org.alfresco.repository.content.stream">
   <shortname>Show Document</shortname>
   <description>
      Returns the content for a supplied node.
   </description>
   <url>/farthest-gate/get-pdf-rendition</url>
   <format default="html">extension</format>
   <authentication runas="system">none</authentication>
   <transaction>required</transaction>
   <family>Farthest Gate</family>
</webscript>


and the js code will look something like this


function main() {
    var nodeRef = args.noderef;
    if (nodeRef) {
        // Look up the node from the supplied reference
        var doc = search.findNode(nodeRef);
        if (doc) {
            // The content stream webscript returns the content of this node
            model.contentNode = doc;
        } else {
            // The reference was supplied but no such node exists
            status.code = 404;
            status.message = "Node " + nodeRef + " does not exist";
            status.redirect = true;
        }
    } else {
        // The required noderef argument is missing
        status.code = 400;
        status.message = "Noderef not given";
        status.redirect = true;
    }
}

main();



That should do the trick.

Bob Johnson

Thanks, Bob, for your great reply. I'll definitely try this.

But if I make an archive folder in the repository and want to move documents older than six months into it, the above scheduled action and JavaScript still do not work. Please tell me what is wrong with them.

Nancy

rjohnson
Star Contributor
Nancy

It's your move command for sure


dest.move(C:\Alfresco\alf_data\archive)


isn't going to work. The move target has to be a folder node inside the repository, not a file system path.

So, let's assume that you have a folder in your repository called archive.

First get the node for that folder; the quickest way to do that is to search for it. Once you have it, you move your document into that folder. If you assume your document is in a variable called "document", I think that the code below should do what you want.


var query = '+PATH:"/app:company_home/cm:archive" +TYPE:"cm:folder"';
var queryDef = {
  query: query,
  language: "lucene"
};
var qResult = search.query(queryDef);
if (qResult.length == 1) { // If more than 1 is returned, we have more than 1 archive folder, which is a problem
  document.move(qResult[0]); // move() takes the destination folder node itself
}
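Putting the pieces together, here is a sketch of a script body that looks up the archive folder and then moves every document a query returns. The function name archiveMatches is made up. The search argument is assumed to behave like Alfresco's root search object, and the results are assumed to behave like ScriptNode objects whose move() returns true on success.

```javascript
// Moves every node matched by `query` into the single folder found at
// `archivePath`, and reports how many moves succeeded or failed.
function archiveMatches(search, archivePath, query) {
  var folders = search.query({
    query: '+PATH:"' + archivePath + '" +TYPE:"cm:folder"',
    language: "lucene"
  });
  if (folders.length != 1) {
    // Zero or several archive folders: do nothing rather than guess
    return { moved: 0, failed: 0, error: "expected exactly one archive folder" };
  }
  var docs = search.query({ query: query, language: "lucene" });
  var moved = 0, failed = 0;
  for (var i = 0; i < docs.length; i++) {
    try {
      if (docs[i].move(folders[0])) { moved++; } else { failed++; }
    } catch (e) {
      // Keeps the loop going; it does not give each move its own transaction
      failed++;
    }
  }
  return { moved: moved, failed: failed, error: null };
}
```

In the repository you would call it with the real search object, for example archiveMatches(search, "/app:company_home/cm:archive", yourQuery), where yourQuery is the modified-date Lucene query discussed earlier in the thread.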


Happy coding

Bob