How to find nodes with empty mimetype and unset/none file size?

apiening
Confirmed Champ

We have imported a lot of documents into one site of our Alfresco instance with the bulk import tool. The import failed a few times; each time we had to manually fix the issue in the metadata XML file that caused the failure and then restart the import tool with the "skip existing nodes" option.

It worked in the end, but unfortunately we found a lot of documents that were not imported correctly. From what we observed, all files that failed to import share the same symptoms:

  • File size = none
  • MIME-Type: Unknown
  • No Title
  • No Description
  • No Author

Our first step would be to find all nodes with these properties so that we can estimate how many files are affected. The second step would be to find an (automated) way to re-import the missing files.

Our first attempt was based on the search form in Alfresco Share: we tried to modify the XHR request that the search form submits, but we were not able to filter for unset / none attributes.

What would be the best option to find the nodes with the properties shown above?
Would it be possible to use the Node Browser in the Admin Tools section to find all nodes with a non-existent content node, or nodes with a size of 0 bytes and no MIME type, etc.?

Every hint would be greatly appreciated.

With kind regards

Andreas

1 ACCEPTED ANSWER

abhinavmishra14
World-Class Innovator

You can either use the node browser or write a webscript to find the nodes by property value, where the property could be "cm:title" and the value could be null. In the same way, you can search for nodes by the other properties you mentioned.

If the size is missing for a node, I would first check whether the node has content at all, because if content was uploaded to the node successfully, it should have a size property. It is possible that the node was created but the content could not be uploaded; that depends on the method you used to import. Re-importing the files that have a size of 0 KB may fix the issue with both size and mimetype.
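The content check described above could be sketched roughly as follows. This is only an illustration: `hasMissingContent` is a hypothetical helper, and it assumes a node object shaped like the repository JavaScript API's ScriptNode, where `properties["cm:content"]` holds the content property and `size` the content length in bytes. Plain objects stand in for real nodes here.

```javascript
// Hypothetical helper: returns true when a node looks like it was created
// without its binary content (no cm:content property, or zero bytes).
// Assumes a ScriptNode-like object; not part of any Alfresco API.
function hasMissingContent(node) {
    var content = node.properties["cm:content"];
    if (content === null || content === undefined) {
        return true; // node exists, but no content was ever attached
    }
    return !node.size || Number(node.size) === 0; // attached but empty
}

// Illustrative stand-ins for two imported nodes.
var brokenNode = { name: "a.pdf", properties: { "cm:content": null }, size: 0 };
var healthyNode = { name: "b.pdf", properties: { "cm:content": {} }, size: 2048 };
```

Nodes flagged this way would be the candidates for a re-import.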

Sample search queries look like this:

PATH:"/app:company_home/st:sites/cm:testsite/cm:documentLibrary//*" AND (TYPE:"cm:content") AND ISNULL:"@content.mimetype"

PATH:"/app:company_home/st:sites/cm:testsite/cm:documentLibrary//*" AND (TYPE:"cm:content") AND ISNULL:"@cm\:title"

PATH:"/app:company_home/st:sites/cm:testsite/cm:documentLibrary//*" AND (TYPE:"cm:content") AND ISNULL:"@cm\:description"

etc.

You can combine these conditions with AND/OR in a single query: OR finds nodes where any of the listed properties is null, while AND finds nodes where all of them are null.

e.g.:

PATH:"/app:company_home/st:sites/cm:testsite/cm:documentLibrary//*" AND (TYPE:"cm:content") AND (ISNULL:"@cm\:description" OR ISNULL:"@cm\:title" OR ISNULL:"@content.mimetype")

If you know the folder path in the documentLibrary of your site, you can include that as well, e.g.:

PATH:"/app:company_home/st:sites/cm:testsite/cm:documentLibrary/cm:Import/cm:Images//*" AND (TYPE:"cm:content") AND (ISNULL:"@cm\:description" OR ISNULL:"@cm\:title" OR ISNULL:"@content.mimetype")

Here, the imported files are under Site > DocumentLibrary > Import > Images, and "testsite" is the short name of the site where you imported the files.

You can also change the PATH value depending on the place of your imported files.
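Since the affected files in the question show all symptoms at once, it may help to require every property to be null rather than any one of them. The following is a minimal sketch of how such a query string could be assembled; `buildAllNullQuery` is a hypothetical helper, and the path, type, and field names are simply taken from the sample queries above.

```javascript
// Hypothetical helper: builds an FTS query string matching nodes where ALL
// of the given fields are null. Field names follow the sample queries above.
function buildAllNullQuery(path, type, fields) {
    var clauses = fields.map(function (field) {
        // Escape the prefix separator, e.g. cm:title -> cm\:title
        return 'ISNULL:"@' + field.replace(/:/g, "\\:") + '"';
    });
    return 'PATH:"' + path + '" AND (TYPE:"' + type + '") AND (' + clauses.join(" AND ") + ")";
}

var query = buildAllNullQuery(
    "/app:company_home/st:sites/cm:testsite/cm:documentLibrary//*",
    "cm:content",
    ["cm:title", "cm:description", "content.mimetype"]
);
```

Swapping `" AND "` for `" OR "` in the `join` call would give the broader "any property is null" variant.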

You can execute this in the node browser. Use the node browser from the Alfresco app (<host>:<port>/alfresco/s/admin/admin-nodebrowser), where you have the option to control the result size (Max Results/Skip Count options). The Share UI based node browser returns at most 1000 nodes.
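When a result set is larger than one page, the skip count / max results idea can be applied repeatedly until a short page comes back. A rough sketch, under the assumption that some function can run the query for a given offset and page size: `collectAllPages` and the `runQuery(skipCount, maxItems)` callback are hypothetical names, and the data source below is a stand-in rather than a real repository search.

```javascript
// Hypothetical helper: collects all results by requesting successive pages.
// runQuery(skipCount, maxItems) is an assumed callback that returns an array.
// pageSize must be greater than 0.
function collectAllPages(runQuery, pageSize) {
    var all = [];
    var skipCount = 0;
    while (true) {
        var page = runQuery(skipCount, pageSize);
        all = all.concat(page);
        if (page.length < pageSize) {
            break; // a short page means there is nothing left to fetch
        }
        skipCount += pageSize;
    }
    return all;
}

// Stand-in data source with 2500 fake results, for illustration only.
var fakeResults = [];
for (var i = 0; i < 2500; i++) {
    fakeResults.push("node-" + i);
}
var collected = collectAllPages(function (skip, max) {
    return fakeResults.slice(skip, skip + max);
}, 1000);
```

In a webscript, the callback could wrap a search call that passes the offset and page size through its paging options.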

Here is a sample webscript to find nodes whose properties have empty or null values:

main();

function main() {
    // Param 'propertyName', e.g.: cm:title
    var propertyName = args["propertyName"];
    // Param 'existingPropertyVal', e.g.: "test" (omit it to search for null values)
    var existingPropertyVal = args["existingPropertyVal"];
    // Param 'objectTypes', e.g.: cm:content (may be a comma-separated list if you have created custom content types)
    var objectTypes = args["objectTypes"];
    // Param 'siteShortName', e.g.: testsite
    var siteShortName = args["siteShortName"];
    // Param 'folderPath', e.g.: a folder path in the document library, if there is any
    var folderPath = args["folderPath"];
    // Paging params; loose equality with null also covers undefined
    var skipCount = (args["skipCount"] == null) ? 0 : args["skipCount"];
    var maxCount = (args["maxCount"] == null) ? 100000 : args["maxCount"];

    // Map of nodeRef -> node name, exposed to the response template
    var resultedNodes = {};

    var query = buildQuery(siteShortName, objectTypes, folderPath, propertyName, existingPropertyVal);
    var page = {
        skipCount : parseInt(skipCount, 10),
        maxItems : parseInt(maxCount, 10)
    };

    var searchQuery = {
        query : query,
        language : "fts-alfresco",
        page : page
    };

    logger.log("Executing SearchQuery: " + query);
    var nodes = search.query(searchQuery);
    logger.log("Total Nodes: " + nodes.length);

    for (var i = 0; i < nodes.length; i++) {
        resultedNodes[nodes[i].nodeRef] = nodes[i].name;
    }
    model.resultedNodes = resultedNodes;
}

function buildQuery(siteShortName, objectTypes, folderPath, propertyName, existingPropertyVal) {
    var query = 'PATH:"/app:company_home/st:sites/cm:' + siteShortName;

    if (folderPath == null || folderPath == "") {
        // No folder given: search the whole document library
        query = query + '/cm:documentLibrary//*"';
    } else {
        query = query + '/cm:documentLibrary/';
        var pathTokens = folderPath.split('/');
        for (var i = 0; i < pathTokens.length; i++) {
            query = query + 'cm:' + search.ISO9075Encode(pathTokens[i].trim()) + '/';
        }
        // The trailing '/' from the loop plus '/*' yields '//*' (all descendants)
        query = query + '/*"';
    }

    // One or more comma-separated types, OR-ed together
    var objectTypeArr = objectTypes.split(',');
    var arrayLength = objectTypeArr.length;
    query = query + ' AND (';
    for (var j = 0; j < arrayLength; j++) {
        query = query + 'TYPE:"' + objectTypeArr[j].trim() + '"';
        if (j != arrayLength - 1) {
            query = query + ' OR ';
        }
    }
    query = query + ')';

    // Append the property clause. 'mimetype' and 'size' are addressed as
    // @content.<name>; any other property is expected as prefix:localName.
    var propertyParts = propertyName.split(':');
    var isContentField = propertyParts.length == 1 && (propertyName == "mimetype" || propertyName == "size");
    var field = isContentField ? '@content.' + propertyName : '@' + propertyParts[0] + '\\:' + propertyParts[1];
    if (!!existingPropertyVal) {
        // Match nodes where the property has the given value
        query = query + ' AND ' + field + ':"' + existingPropertyVal + '"';
    } else {
        // Match nodes where the property is null
        query = query + ' AND ISNULL:"' + field + '"';
    }
    return query;
}

You can also take a look at: https://hub.alfresco.com/t5/alfresco-content-services-forum/search-for-nodes-that-do-not-have-a-prop...

I am not sure which import tool you used but consider taking a look at this tool: 

https://github.com/pmonks/alfresco-bulk-import/wiki

https://github.com/pmonks/alfresco-bulk-import/wiki/Usage

~Abhinav
(ACSCE, AWS SAA, Azure Admin)

Thank you very much @abhinavmishra14,

based on your WebScript code I was able to build a query that seems to deliver the results I need.

I have accepted your answer as a solution.

Thanks again!

With kind regards

Andreas