01-13-2020 10:27 AM
We have imported a lot of documents into one site of our Alfresco instance with the bulk import tool. The import failed a few times, and each time we had to manually update the metadata XML file to fix the issue that caused the failure and then restart the import tool with the "skip existing nodes" option.
It worked in the end, but unfortunately we found a lot of documents that were not imported correctly. From what we observed, all files that failed to import share the same symptoms:
Our first step would be to find all nodes with these properties so that we can estimate how many files are affected. The second step would be to find an (automated) way to re-import the missing files.
Our first attempt was based on the search form in Alfresco Share: we tried to modify the XHR request that the search form submits, but we were not able to filter for unset / none attributes.
What would be the best option to find the nodes with the properties shown above?
Would it be possible to use the Node Browser in the Admin Tools section to find all nodes with a non-existent data node, or nodes with a size of 0 bytes and no MIME type, etc.?
Every hint would be greatly appreciated.
With kind regards
Andreas
01-13-2020 01:28 PM
You can either use the node browser or write a webscript to find the nodes by property value, where the property could be "cm:title" and the value could be null. You can search for nodes by the other properties you mentioned in the same way.
If the size is missing for a node, I would first check whether the node has content, because if content was uploaded to the node successfully then it should have a size property. It is possible that the node was created but the content could not be uploaded; that depends on the method you used to import. Re-importing the files that have a size of 0 KB may fix the issue with both size and mimetype.
A sample search query looks like:
PATH:"/app:company_home/st:sites/cm:testsite/cm:documentLibrary//*" AND (TYPE:"cm:content") AND ISNULL:"@content.mimetype"
PATH:"/app:company_home/st:sites/cm:testsite/cm:documentLibrary//*" AND (TYPE:"cm:content") AND ISNULL:"@cm\:title"
PATH:"/app:company_home/st:sites/cm:testsite/cm:documentLibrary//*" AND (TYPE:"cm:content") AND ISNULL:"@cm\:description" etc.
You can combine the ISNULL clauses in the search query: use AND to find nodes where all of the listed properties are null, or OR to find nodes where any of them is null.
e.g:
PATH:"/app:company_home/st:sites/cm:testsite/cm:documentLibrary//*" AND (TYPE:"cm:content") AND (ISNULL:"@cm\:description" OR ISNULL:"@cm\:title" OR ISNULL:"@content.mimetype")
If you know the folder path within the documentLibrary of your site, you can include that as well, e.g.:
PATH:"/app:company_home/st:sites/cm:testsite/cm:documentLibrary/cm:Import/cm:Images//*" AND (TYPE:"cm:content") AND (ISNULL:"@cm\:description" OR ISNULL:"@cm\:title" OR ISNULL:"@content.mimetype")
Here the imported files are under the Site > DocumentLibrary > Import > Images folder, and "testsite" is the short name of the site where you imported the files.
You can also change the PATH value depending on the place of your imported files.
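As a rough illustration of how the combined query above can be assembled from a list of property names, here is a minimal sketch in plain JavaScript. The helper names toFieldRef and buildNullCheckQuery are hypothetical (they are not part of any Alfresco API), and the site short name is assumed to need no ISO9075 encoding.

```javascript
// Hypothetical helper: map a property name to the field reference used in the queries above.
function toFieldRef(propertyName) {
  // "mimetype" and "size" are addressed through the content property, as in @content.mimetype.
  if (propertyName === "mimetype" || propertyName === "size") {
    return "@content." + propertyName;
  }
  // Escape the prefix separator, e.g. cm:title becomes @cm\:title.
  return "@" + propertyName.replace(":", "\\:");
}

// Hypothetical helper: build one query with an ISNULL clause per property.
// operator is "AND" (all properties null) or "OR" (any property null).
function buildNullCheckQuery(siteShortName, propertyNames, operator) {
  var clauses = [];
  for (var i = 0; i < propertyNames.length; i++) {
    clauses.push('ISNULL:"' + toFieldRef(propertyNames[i]) + '"');
  }
  return 'PATH:"/app:company_home/st:sites/cm:' + siteShortName +
    '/cm:documentLibrary//*" AND (TYPE:"cm:content") AND (' +
    clauses.join(" " + operator + " ") + ")";
}

var query = buildNullCheckQuery("testsite", ["cm:description", "cm:title", "mimetype"], "OR");
```

With these inputs the resulting string is the same as the OR query shown above, so it can be pasted into the node browser or passed to a search call.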
You can execute these queries in the node browser. Use the node browser of the Alfresco app (<host>:<port>/alfresco/s/admin/admin-nodebrowser), where you have the option to control the result size (Max Results/Skip Count options). The Share UI based node browser returns at most 1000 nodes.
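If the result set is larger than one page, the Skip Count / Max Results idea can be applied in a loop. The following is only a sketch of that paging logic: collectAll and fakeFetch are hypothetical names, and fakeFetch stands in for whatever real search call you use (for example a search with a page object containing skipCount and maxItems inside a webscript).

```javascript
// Hypothetical helper: fetch page after page until a page comes back shorter than pageSize.
// fetchPage(skipCount, maxItems) must return an array of results for that window.
function collectAll(fetchPage, pageSize) {
  if (pageSize <= 0) {
    throw new Error("pageSize must be positive");
  }
  var all = [];
  var skipCount = 0;
  while (true) {
    var batch = fetchPage(skipCount, pageSize);
    for (var i = 0; i < batch.length; i++) {
      all.push(batch[i]);
    }
    if (batch.length < pageSize) {
      break; // a short (or empty) page means there is nothing left
    }
    skipCount += pageSize;
  }
  return all;
}

// Stand-in data source for demonstration only: 2500 fake node names.
var fakeNodes = [];
for (var n = 0; n < 2500; n++) {
  fakeNodes.push("node-" + n);
}
var calls = 0;
function fakeFetch(skipCount, maxItems) {
  calls++;
  return fakeNodes.slice(skipCount, skipCount + maxItems);
}

var all = collectAll(fakeFetch, 1000);
```

Stopping on the first short page avoids needing the total count up front, at the cost of one extra request when the total is an exact multiple of the page size.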
Here is a sample webscript to find properties having empty or null values:
main();

function main() {
    // Param 'propertyName', e.g.: cm:title (use "mimetype" or "size" for the content properties)
    var propertyName = args["propertyName"];
    // Param 'existingPropertyVal', e.g.: "test"; leave it out to search for null values
    var existingPropertyVal = args["existingPropertyVal"];
    // Param 'objectTypes', e.g.: cm:content (can be a comma-separated list if you have created custom content types)
    var objectTypes = args["objectTypes"];
    // Param 'siteShortName', e.g.: testsite
    var siteShortName = args["siteShortName"];
    // Param 'folderPath', e.g.: a folder path in the document library, if there is one
    var folderPath = args["folderPath"];
    var skipCount = (args["skipCount"] == null || args["skipCount"] == undefined) ? 0 : args["skipCount"];
    var maxCount = (args["maxCount"] == null || args["maxCount"] == undefined) ? 100000 : args["maxCount"];

    var resultedNodes = [];
    var query = buildQuery(siteShortName, objectTypes, folderPath, propertyName, existingPropertyVal);
    var page = {
        skipCount : parseInt(skipCount, 10),
        maxItems : parseInt(maxCount, 10)
    };
    var searchQuery = {
        query : query,
        language : "fts-alfresco",
        page : page
    };

    logger.log("Executing SearchQuery: " + query);
    var nodes = search.query(searchQuery);
    logger.log("Total Nodes: " + nodes.length);

    // Map each nodeRef to the node name for the response template.
    for each (var node in nodes) {
        resultedNodes[node.nodeRef] = node.name;
    }
    model.resultedNodes = resultedNodes;
}

function buildQuery(siteShortName, objectTypes, folderPath, propertyName, existingPropertyVal) {
    var query = 'PATH:"/app:company_home/st:sites/cm:' + siteShortName;
    // A missing folderPath is treated like an empty one: search the whole document library.
    if (folderPath == null || folderPath == "") {
        query = query + '/cm:documentLibrary//*"';
    } else {
        query = query + '/cm:documentLibrary/';
        var pathTokens = folderPath.split('/');
        for (var each = 0; each < pathTokens.length; each++) {
            query = query + 'cm:' + search.ISO9075Encode(pathTokens[each].trim()) + '/';
        }
        query = query + '/*"';
    }

    // Restrict to one type, or OR together several comma-separated types.
    var objectTypeArr = objectTypes.split(',');
    var arrayLength = objectTypeArr.length;
    if (arrayLength == 1) {
        query = query + ' AND (TYPE:"' + objectTypeArr[0].trim() + '")';
    } else {
        query = query + ' AND (';
        for (var each = 0; each < arrayLength; each++) {
            query = query + 'TYPE:"' + objectTypeArr[each].trim() + '"';
            if (each != arrayLength - 1) {
                query = query + ' OR ';
            }
        }
        query = query + ')';
    }

    // Append the property and value query.
    var propertyParts = propertyName.split(':');
    var isContentProperty = propertyParts.length == 1 && (propertyName == "mimetype" || propertyName == "size");
    if (!!existingPropertyVal) { // match an existing property value
        if (isContentProperty) {
            query = query + ' AND @content.' + propertyName + ':"' + existingPropertyVal + '"';
        } else {
            query = query + ' AND @' + propertyParts[0] + '\\:' + propertyParts[1] + ':"' + existingPropertyVal + '"';
        }
    } else { // match properties that are null
        if (isContentProperty) {
            query = query + ' AND ISNULL:"@content.' + propertyName + '"';
        } else {
            query = query + ' AND ISNULL:"@' + propertyParts[0] + '\\:' + propertyParts[1] + '"';
        }
    }
    return query;
}
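To call a webscript like the one above you pass its parameters in the query string. The sketch below only shows how such a request URL could be put together; the host, port and the path /alfresco/s/example/find-nodes are assumptions for illustration, since the real URL depends on the webscript descriptor you register. buildWebscriptUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: append URL-encoded parameters to a webscript URL.
// Parameters that are null or undefined are skipped.
function buildWebscriptUrl(baseUrl, params) {
  var parts = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key) &&
        params[key] !== null && params[key] !== undefined) {
      parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(params[key]));
    }
  }
  return baseUrl + "?" + parts.join("&");
}

// Assumed URL; replace it with the URL from your own webscript descriptor.
var url = buildWebscriptUrl("http://localhost:8080/alfresco/s/example/find-nodes", {
  propertyName: "cm:title",      // property to check
  objectTypes: "cm:content",     // one type or a comma-separated list
  siteShortName: "testsite",
  folderPath: "Import/Images"    // folder path inside the document library
  // existingPropertyVal is left out so that the webscript searches for null values
});
```

Leaving out existingPropertyVal makes the script take its ISNULL branch, which is the case that matters for finding the incompletely imported files.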
You can also take a look at: https://hub.alfresco.com/t5/alfresco-content-services-forum/search-for-nodes-that-do-not-have-a-prop...
I am not sure which import tool you used but consider taking a look at this tool:
https://github.com/pmonks/alfresco-bulk-import/wiki
https://github.com/pmonks/alfresco-bulk-import/wiki/Usage
01-14-2020 06:44 PM
Thank you very much @abhinavmishra14,
based on your WebScript code I was able to build a query that seems to deliver the results I need.
I have accepted your answer as a solution.
Thanks again!
With kind regards
Andreas