cancel
Showing results for 
Search instead for 
Did you mean: 

Bulk metadata update

efvrvo
Champ on-the-rise
Champ on-the-rise
I have searched on the forum for this but the results I have found are more concentrated on older versions of Alfresco for bulk file uploading or CMS migration. I use version 3.4d and to bulk upload files is not a problem but to bulk update metadata is a problem. I have seen various topics that use programs like Talend or OpenMigrade but it appears to me that they are for older versions of Alfresco and I do not really understand them.

I have seen that this program http://code.google.com/p/alfresco-bulk-filesystem-import/ can update the metadata for version 3.4 but I will have to create 1000 text files each containing custom metadata so I don't think this is going to help me much.

My requirement is in my opinion very simply. I have say 1000 drawings that I upload into Alfresco. I have an Excel spreadsheet that contains in the columns the metadata fields (eg Name, Title, description etc) and in the rows the doc names with each metadata field. I think its fairly simple to write a javascript to update the metadata but I would like the javascript to use the excel spreadsheet for the values.

I would like it to read the excel spreadsheet (I can make this a CSV file) and then use the NAME column to find the document and and the other columns to insert into the metadata fields. If someone can help me on how I can get the javascript to read the csv file and then show me how to handle the variables I think I will have a very useful metadata update tool for high volume files.

For example. Lets say
0)I upload 5 document into the alfresco repository under "sites/testSite/documentlibrary"
1) I create a CSV file and name it "ImportMetadata.csv"

document1.doc,Test document title 1,Test document desc 1
document2.doc,Test document title 2,Test document desc 2
document3.doc,Test document title 3,Test document desc 3
document4.doc,Test document title 4,Test document desc 4
document5.doc,Test document title 5,Test document desc 5

2) I upload the file in the alfresco repository in "data dictionary/temp"
3) I run my javascript and it reads "ImportMetadata.csv" and creates a 2 dimensional array using a variable called "metadata"
4) The javascript then accesses the array "metadata" and starts going through the rows
5) It takes "metadata[0,0]" to find the name of the first document ("document1.doc") and then searches for this document in the alfresco repository under "sites/testSite/documentlibrary"
6) Once the document is found it edits the document1 properties
7) It uses "metadata[0,1]" to update document.properties.title
😎 It uses "metadata[0,2]" to update document.properties.description

9) It takes "metadata[1,0]" to find the name of the second document ("document2.doc") and then searches for this document in the alfresco repository under "sites/testSite/documentlibrary"
10) Once the document is found it edits the document2 properties
11) It uses "metadata[1,1]" to update document.properties.title
12) It uses "metadata[1,2]" to update document.properties.description

And so it goes. I think the big trick is to parse the csv file

Anyone that could assist?
2 REPLIES 2

efvrvo
Champ on-the-rise
Champ on-the-rise
Here is how this was solved

First I added this very clever app that you can find here http://code.google.com/p/share-extras/wiki/JavascriptConsole
This app is not required but it gave me a nice admin interface so that I can execute the script. It also gave me the ability to define the "space" variable. But the script can be executed any any way obviously.

Second I found a code to read and parse the CSV file on this site http://www.bennadel.com/blog/1504-Ask-Ben-Parsing-CSV-Strings-With-Javascript-Exec-Regular-Expressio...
The script is written by Ben Nadel. He gave a very neat script to create a 2x2 matrix (array) as an output for the csv file.

Third I made a few minor adjustment to make the script by Ben Nadel so that it could work for my documents. The complete script that contains my minor input and the section written by Ben Nadel is listed below

var CSVFile = companyhome.childByNamePath("Data Dictionary/MetaData.csv");
var strData = CSVFile.content;
var strDelimiter = ",";

      // Create a regular expression to parse the CSV values.
      var objPattern = new RegExp(
         (
            // Delimiters.
            "(\\" + strDelimiter + "|\\r?\\n|\\r|^)" +

            // Quoted fields.
            "(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|" +

            // Standard fields.
            "([^\"\\" + strDelimiter + "\\r\\n]*))"
         ),
         "gi"
         );


      // Create an array to hold our data. Give the array
      // a default empty first row.
      var arrData = [[]];

      // Create an array to hold our individual pattern
      // matching groups.
      var arrMatches = null;


      // Keep looping over the regular expression matches
      // until we can no longer find a match.
      while (arrMatches = objPattern.exec( strData )){

         // Get the delimiter that was found.
         var strMatchedDelimiter = arrMatches[ 1 ];

         // Check to see if the given delimiter has a length
         // (is not the start of string) and if it matches
         // field delimiter. If id does not, then we know
         // that this delimiter is a row delimiter.
         if (
            strMatchedDelimiter.length &&
            (strMatchedDelimiter != strDelimiter)
            ){

            // Since we have reached a new row of data,
            // add an empty row to our data array.
            arrData.push( [] );

         }


         // Now that we have our delimiter out of the way,
         // let's check to see which kind of value we
         // captured (quoted or unquoted).
         if (arrMatches[ 2 ]){

            // We found a quoted value. When we capture
            // this value, unescape any double quotes.
            var strMatchedValue = arrMatches[ 2 ].replace(
               new RegExp( "\"\"", "g" ),
               "\""
               );

         } else {

            // We found a non-quoted value.
            var strMatchedValue = arrMatches[ 3 ];

         }


         // Now that we have our value string, let's add
         // it to the data array.
         arrData[ arrData.length - 1 ].push( strMatchedValue );
      }


print(arrData);

for (var i=0; i<arrData.length; i++)
   {
          var fileName = arrData[i][0];
          var targetFile = space.childByNamePath(fileName);
          if (targetFile != null)
          {           
            targetFile.properties.title = arrData[i][1];
            targetFile.properties["dst:bhuRef"] = arrData[i][2];
            targetFile.properties["dst:clientRef"] = arrData[i][3];
            targetFile.properties["dst:shtVol"] = arrData[i][4];
            targetFile.properties["dst:rev"] = arrData[i][5];       
            targetFile.save();         
          }
        } 

The way the script works is as follows:
1) It loads a CSV file that I store under "data dictionary" names "MetaData.csv"
CSVFile = companyhome.childByNamePath("Data Dictionary/MetaData.csv");
var strData = CSVFile.content;
2) Then Ben Nadel's code takes over and parses the CSV file with an output "arrData" that is a 2x2 array
3) The array is then used to find the target files in the "space" variable as defined by the javascript console app is used to find the target files
for (var i=0; i<arrData.length; i++)
   {
          var fileName = arrData[0];
          var targetFile = space.childByNamePath(fileName);
4) The I simply use the array to update the desired metadata. In my case it is custom data I added
if (targetFile != null)
          {           
            targetFile.properties.title = arrData[1];
            targetFile.properties["dst:bhuRef"] = arrData[2];
            targetFile.properties["dst:clientRef"] = arrData[3];
            targetFile.properties["dst:shtVol"] = arrData[4];
            targetFile.properties["dst:rev"] = arrData[5];       
            targetFile.save();         
          }

rachana
Champ in-the-making
Champ in-the-making
Hi efvrvo

Your solution very well for me. Thanks allot.