Hyland Connect

mark_smithson · ‎07-31-2007

I am trying to implement a search against WCM information and am almost there.

I have managed to get the content indexed to properties on a custom aspect, using the XPathMetaDataExtractor. I managed to verify this using Luke.

When I run a search from the web client or via the JavaScript search API, the content is not returned.

After some digging, I have found that I can call search.setStoreUrl("avm://website") and then my queries return results, e.g.:

var nodes = search.luceneSearch("@\\{http\\://www.mysite.com/alfresco\\}answer:" + args.q);

However, if I then run a different search using a web script that does not call search.setStoreUrl(), the search still seems to be executed against the AVM store indexes, i.e. the storeUrl is not reset to the default each time a script is executed, which implies that the search object is shared. Setting the storeUrl could therefore cause concurrency issues, which I obviously want to avoid.

Is there another way to do this, or will I need to produce my own implementation of the org.alfresco.repo.jscript.Search class and register this with the script API?
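A side note on the query string in the post above: args.q is concatenated into the Lucene query unescaped, so a search term containing query syntax such as ":" or "*" changes the meaning of the query. The following is a minimal sketch of one way to build the same attribute query with escaping. The helper names escapeLuceneTerm and buildAttributeQuery are hypothetical and not part of the Alfresco JavaScript API, and the escaped character set is an assumption based on the standard Lucene query syntax.

```javascript
// Hypothetical helpers (not part of the Alfresco API).
// Backslash-escapes characters that the Lucene query parser treats as syntax.
function escapeLuceneTerm(term) {
   return String(term).replace(/[+\-!(){}\[\]^"~*?:\\&|]/g, "\\$&");
}

// Builds an attribute query of the form @\{namespace\}localName:term,
// the same shape as the hand-escaped string in the post above.
function buildAttributeQuery(namespaceUri, localName, term) {
   var field = "@" + escapeLuceneTerm("{" + namespaceUri + "}" + localName);
   return field + ":" + escapeLuceneTerm(term);
}
```

In a script this could be used as search.luceneSearch(buildAttributeQuery("http://www.mysite.com/alfresco", "answer", args.q)), which for a plain term produces the same query string as the hand-written version.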

kevinr · ‎08-06-2007

As each AVM store is indexed separately from the main DM store, the search API has been made available on each store object individually. The following example is taken from the JavaScript API examples section:


Example using the AVM API to process a webproject store - the store name is passed as store on the url arguments:

if (args["store"] == null)
{
   logger.log("ERROR: 'store' argument not specified.");
}
else
{
   main();
}

function main()
{
   var storeRootNode = avm.lookupStoreRoot(args["store"]);
   if (storeRootNode != null)
   {
      var path = storeRootNode.path + "/ROOT/admin/index.html";
      var node = avm.lookupNode(path);
      if (node == null)
      {
         return "ERROR: unable to find path: " + path;
      }
      
      var store = avm.lookupStore(args["store"]);
      if (store == null)
      {
         return "ERROR: unable to lookup store: " + args["store"];
      }
      var rootNode = store.lookupRoot();
      if (rootNode == null)
      {
         return "ERROR: unable to find root node for store: " + store.name;
      }
      
      var out = "";
      var results = store.luceneSearch("TEXT:tomcat");
      for (var i=0; i<results.length; i++)
      {
         out += results[i].path + "<br>";
      }
      return out;
   }
}

See: http://wiki.alfresco.com/wiki/JavaScript_API#AVM_Store_API

The setStoreUrl() method should not be called by your JavaScript at all; it is not mentioned in the JavaScript API docs. The method is only public because it has to be available to the Spring container that instantiates the bean and sets various shared properties.
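The per-store search described above can be reduced to a small helper. This is only a sketch: searchStorePaths is a hypothetical name, and the avm root object is passed in as a parameter (rather than read from the script scope) so that the function can also be exercised with a stub outside the repository. It relies only on avm.lookupStore() and store.luceneSearch(), both of which appear in the example above.

```javascript
// Hypothetical helper: runs a Lucene query against a single AVM store and
// returns the paths of the matching nodes, or null if the store is unknown.
// "avmApi" is expected to be the script's root-scoped avm object.
function searchStorePaths(avmApi, storeName, query) {
   var store = avmApi.lookupStore(storeName);
   if (store == null) {
      return null; // store could not be found
   }
   var results = store.luceneSearch(query);
   var paths = [];
   for (var i = 0; i < results.length; i++) {
      paths.push(results[i].path);
   }
   return paths;
}
```

Inside a script the call would look like searchStorePaths(avm, args["store"], "TEXT:tomcat"). Because the store is looked up on every call, nothing is set on the shared search object.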

Thanks,

Kevin

mark_smithson · ‎08-06-2007

Thanks Kevin

That makes a lot more sense.

Regards

Mark

thaiyal · ‎08-06-2007

I wrote a web script to search for XML files within a WCM store, as described in the JavaScript examples section, and it doesn't return any results.

I am using Alfresco Enterprise version 2.1. Here is my web script…

script:
{
   // extract avm store id and path
   var path = url.extension.split("/");
   if (path.length != 3) {
      status.code = 400;
      status.message = "Invalid arguments.";
      status.redirect = true;
      break script;
   }
   var storeid = path[0];
   var type = path[1];
   var term = path[2];

   // locate avm node from path
   var store = avm.lookupStore(storeid);
   if (store == undefined) {
      status.code = 404;
      status.message = "Store " + storeid + " not found.";
      status.redirect = true;
      break script;
   }

   // get folder where xml files are
   var folder = store.lookupNode("/ROOT/WEB-INF/xml/" + type);
   if (folder == undefined) {
      status.code = 404;
      status.message = "Type " + type + " not found.";
      status.redirect = true;
      break script;
   }

   var nodes = store.luceneSearch("TEXT:"+term);

   model.storeid = storeid;
   model.type = type;
   model.term = term;
   model.folder = folder;
   model.nodes = nodes;
}

Should the XML fields be indexed separately or is there any additional configuration?

Thanks,
Thaiyal

thaiyal · ‎08-07-2007

Any updates??

mark_smithson · ‎08-09-2007

Your test XML files need to be in the staging sandbox, as user sandboxes are not indexed.

Also, does the term you are searching for exist in the XML file?

Are you just getting no results, or are you getting errors?
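Since only the staging sandbox is indexed, a web script could reject other stores up front instead of silently returning nothing. The sketch below assumes the usual WCM store naming, where the staging store carries the bare web project name and user, preview and workflow sandboxes append "--" suffixes (for example website--admin). That naming convention is an assumption here and is not stated in this thread; isStagingStoreName is a hypothetical helper.

```javascript
// Hypothetical helper. Assumes (not confirmed in this thread) that only
// non-staging sandboxes contain "--" in their store name.
function isStagingStoreName(storeName) {
   return typeof storeName === "string" &&
          storeName.length > 0 &&
          storeName.indexOf("--") === -1;
}
```

A script could call this on the store id taken from the URL and return a 400 status with an explanatory message when it is false.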

thaiyal · ‎08-09-2007

Thanks!! It works for content in the staging sandbox, but I want content in the other sandboxes to be indexed too. Do you have any pointers as to where to modify the Alfresco code for this?

mark_smithson · ‎08-10-2007

Not many, I am afraid.

I have noticed that Alfresco seems to create index folders for the other sandboxes in the lucene-indexes/avm folder.

It is complicated by the layered folder features of web projects, which is perhaps why Alfresco have not implemented it yet. I would suspect that the index in a user sandbox is a delta of the staging sandbox, but I have not really looked into how Alfresco uses Lucene for indexing. There seems to be a feature to include index deltas from the existing transaction in a search operation, so you may be able to make use of that.

Most of the code seems to be in the package org.alfresco.repo.search.impl.lucene. Let me know if you make any progress.

andy · ‎08-13-2007

Hi

In general, indexing for layered stores is not so easy, otherwise we would indeed have done this already. It is complicated because, in general, layers can include a subset of nodes from another store, can change the path to these nodes and can change the names of nodes. There is also additional joy from branches.

Avoiding indexing a file and its contents more than once is quite tricky. We have ideas on how to get close to this but we have not gone into the detail to compare various possible implementations. I think there is little chance you can avoid duplicating some basic information about folder structure and you have to duplicate files if they change name (with the current index approach).

However, if you are going to give it a go, the much easier approach (though indexing will take longer) is to index everything in its own right for each layer. Alternatively, simplify the solution by asserting that layers never change paths or names and always include all nodes, as is the case for WCM. Here index overlays for snapshots follow the pattern used for transactional overlays, so you just need to work out which overlays to use and in what order. Even this is not as easy as it sounds and would require some additional information stored in the store indexes to track what is overlaid.
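To make the simplified overlay case concrete, here is a purely conceptual sketch. It is not Alfresco code and does not use Lucene. It assumes that paths and names never change between layers, that each layer records only its own additions, modifications and deletions, and that layers are supplied top-most first. The data shapes (docs, deleted) and the function name searchOverlays are hypothetical.

```javascript
function has(obj, key) {
   return Object.prototype.hasOwnProperty.call(obj, key);
}

// layers: ordered from the top-most overlay down to the base store.
// Each layer: { docs: { path: text }, deleted: { path: true } }.
// matches: function(text) returning true when a document satisfies the query.
function searchOverlays(layers, matches) {
   var decided = {}; // paths already settled by a higher layer
   var hits = [];
   for (var i = 0; i < layers.length; i++) {
      var layer = layers[i];
      var path;
      // A deletion in this layer hides the path in every lower layer.
      for (path in layer.deleted) {
         if (has(layer.deleted, path) && !has(decided, path)) {
            decided[path] = true;
         }
      }
      // The highest layer containing a path decides whether it matches,
      // even when that version does not match and a lower version would.
      for (path in layer.docs) {
         if (!has(layer.docs, path) || has(decided, path)) {
            continue;
         }
         decided[path] = true;
         if (matches(layer.docs[path])) {
            hits.push(path);
         }
      }
   }
   return hits;
}
```

The point of the sketch is the ordering: a modified or deleted file in an upper layer must mask the version in the base index, which is why the overlays need to be tracked and applied in a known order.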

If no one wanted to search by PATH or PATH never changed life would be a lot easier!

In the first case I would try the XPath search - which does work against all stores - but some queries will not perform well.

Andy

thaiyal · ‎08-13-2007

Thanks, let me try the XPath query and see if it suits my requirements.


WCM Search