The official documentation is at: http://docs.alfresco.com
Searching a WCM store works the same way as searching an ADM store, as described on the Search page, and the search API is identical. Some index fields are modified as described below; most of these are intended for internal use only. PATH is the major exception: it matches paths in the AVM store, which have no namespace prefixes. Paths look like '/a/b/c/d'.
Lucene-based search is only available for the latest snapshot of the head revision of a staging store. It is not available for user sandboxes, workflow sandboxes, and so on.
XPath-based searching is available for all WCM stores and requires no configuration, with the caveat that performance may be slow depending on the query and the store structure, because the implementation walks the object model. This is the same implementation used for the ADM stores. XPath searches are always live; they are not limited to the last snapshot.
A StoreRef is sufficient to identify the store for the search.
A search against an AVM store can be performed via Java, JavaScript, a FreeMarker template, or the node browser. It is not yet exposed via OpenSearch. As the API is the same, searches via web services and similar interfaces work the same way apart from the store context.
Indexing takes place for a limited number of AVM actions: taking a snapshot, and creating, purging, or moving a store.
All other actions have no effect on the index, so new additions to a WCM store will not be found until a snapshot is taken. If the snapshot is indexed asynchronously, the addition will not be found until the snapshot has been indexed in the background.
For synchronous indexing, search works immediately after any snapshot. After the snapshot, any searches in the same transaction will find the changes in the snapshot. After the snapshot is committed, all users will see the changes.
Indexing of a snapshot may take place asynchronously or synchronously. AVM store creates and purges are indexed synchronously. A move is a mixed case: the index for the old store is deleted synchronously, the new index is created synchronously, and the initial snapshot may be indexed asynchronously or synchronously, depending on the configuration.
All other fields are as described in Fields in the index and how they are exposed for queries.
Configuration is set in a method interceptor bean definition that wraps AVM calls. It is defined by default in public-service-context.xml and can be overridden. The default configuration is shown below. By default, staging areas are indexed synchronously. There is an extension example for asynchronous indexing. Remember that if you configure asynchronous indexing, your query results may be out of date.
<bean id='avmSnapShotTriggeredIndexingMethodInterceptor' class='org.alfresco.repo.search.AVMSnapShotTriggeredIndexingMethodInterceptor'>
    <property name='avmService'>
        <ref bean='avmService' />
    </property>
    <property name='indexerAndSearcher'>
        <ref bean='avmLuceneIndexerAndSearcherFactory' />
    </property>
    <property name='enableIndexing'>
        <value>true</value>
    </property>
    <property name='defaultMode'>
        <value>SYNCHRONOUS</value>
    </property>
    <property name='indexingDefinitions'>
        <list>
            <value>SYNCHRONOUS:TYPE:STAGING</value>
            <value>UNINDEXED:TYPE:STAGING_PREVIEW</value>
            <value>UNINDEXED:TYPE:AUTHOR</value>
            <value>UNINDEXED:TYPE:AUTHOR_PREVIEW</value>
            <value>UNINDEXED:TYPE:WORKFLOW</value>
            <value>UNINDEXED:TYPE:WORKFLOW_PREVIEW</value>
            <value>UNINDEXED:TYPE:AUTHOR_WORKFLOW</value>
            <value>UNINDEXED:TYPE:AUTHOR_WORKFLOW_PREVIEW</value>
            <value>ASYNCHRONOUS:NAME:avmAsynchronousTest</value>
            <value>SYNCHRONOUS:NAME:.*</value>
        </list>
    </property>
</bean>
The AVMSnapShotTriggeredIndexingMethodInterceptor class supports querying the index state, which is useful if you want to know whether an asynchronous index is up to date. See the Javadoc.
Each entry in indexingDefinitions has the form:
(SYNCHRONOUS | ASYNCHRONOUS | UNINDEXED) : (TYPE | NAME) : regular expression
Each entry defines a regular expression that is matched against either the name of the store or the WCM UI store type. Entries are tried from first to last, and the first match determines the indexing mode.
In the definition above, stores of type STAGING are indexed synchronously. All other store types used by the WCM UI are unindexed; this must not be changed, as it is not yet supported. The store named avmAsynchronousTest, used in testing, is indexed asynchronously. All other stores are indexed synchronously. The defaultMode value is never used here, because the catch-all regular expression '.*' is at the end of the list.
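The first-match rule can be illustrated with a small, self-contained Java sketch. This is not the code of AVMSnapShotTriggeredIndexingMethodInterceptor: the class IndexingDefinitionsSketch and its methods are hypothetical names invented for illustration. It also assumes that the regular expression must match the whole store name or type, which is consistent with STAGING_PREVIEW stores being unindexed even though the STAGING entry comes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the Alfresco implementation.
final class IndexingDefinitionsSketch {
    enum Mode { SYNCHRONOUS, ASYNCHRONOUS, UNINDEXED }
    enum Target { TYPE, NAME }

    private static final class Entry {
        final Mode mode;
        final Target target;
        final Pattern pattern;

        Entry(Mode mode, Target target, Pattern pattern) {
            this.mode = mode;
            this.target = target;
            this.pattern = pattern;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final Mode defaultMode;

    IndexingDefinitionsSketch(Mode defaultMode, List<String> definitions) {
        this.defaultMode = defaultMode;
        for (String definition : definitions) {
            // Split into at most three parts so the regular expression may contain ':'.
            String[] parts = definition.split(":", 3);
            if (parts.length != 3) {
                throw new IllegalArgumentException("Bad definition: " + definition);
            }
            entries.add(new Entry(Mode.valueOf(parts[0].trim()),
                                  Target.valueOf(parts[1].trim()),
                                  Pattern.compile(parts[2].trim())));
        }
    }

    // The list from the default configuration shown on this page.
    static IndexingDefinitionsSketch fromDefaultConfiguration() {
        return new IndexingDefinitionsSketch(Mode.SYNCHRONOUS, Arrays.asList(
            "SYNCHRONOUS:TYPE:STAGING",
            "UNINDEXED:TYPE:STAGING_PREVIEW",
            "UNINDEXED:TYPE:AUTHOR",
            "UNINDEXED:TYPE:AUTHOR_PREVIEW",
            "UNINDEXED:TYPE:WORKFLOW",
            "UNINDEXED:TYPE:WORKFLOW_PREVIEW",
            "UNINDEXED:TYPE:AUTHOR_WORKFLOW",
            "UNINDEXED:TYPE:AUTHOR_WORKFLOW_PREVIEW",
            "ASYNCHRONOUS:NAME:avmAsynchronousTest",
            "SYNCHRONOUS:NAME:.*"));
    }

    // Entries are tried in order; the first whole-string match wins (assumption).
    // storeType may be null for stores that are not created by the WCM UI.
    Mode resolve(String storeName, String storeType) {
        for (Entry entry : entries) {
            String candidate = (entry.target == Target.NAME) ? storeName : storeType;
            if (candidate != null && entry.pattern.matcher(candidate).matches()) {
                return entry.mode;
            }
        }
        return defaultMode;
    }

    public static void main(String[] args) {
        IndexingDefinitionsSketch defs = fromDefaultConfiguration();
        System.out.println(defs.resolve("mysite", "STAGING"));
        System.out.println(defs.resolve("mysite--preview", "STAGING_PREVIEW"));
        System.out.println(defs.resolve("avmAsynchronousTest", null));
        System.out.println(defs.resolve("someOtherStore", null));
    }
}
```

The store names passed to resolve in main are made up. Because the final NAME entry matches every store name, the fallback to defaultMode is only reached if that entry is removed.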
Synchronous indexing indexes everything, including content. There is no way to index metadata synchronously and content asynchronously. Asynchronous indexing indexes nothing at snapshot time (neither metadata nor content). Instead, it creates an index request, and the snapshot is indexed in the background at some later time.
If metadata is extracted from XML and stored in an attribute, you can search for it. There will also be a full-text conversion of the XML to text.
See XML Metadata Extractor Configuration for WCM
The main limitation concerns repeated elements that map to a single attribute as repeating values. Attributes do not support positional queries (through the normal API). So if you have something like
<a>
<c>1</c>
<d>2</d>
<c>3</c>
<d>4</d>
</a>
and this gets pulled into an aspect of type {test}A, with attribute {test}b.c as [1,3] and {test}b.d as [2,4], then a query of the form
+@{test}b.c:'1' +@{test}b.d:'4'
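will find a match, which may be unexpected: the values 1 and 4 come from different c/d pairs in the XML, but the index does not record which values appeared together.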
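The effect can be reproduced with a minimal Java sketch that models each attribute as an unordered set of values, which is roughly how a multi-valued field behaves for term queries. The class MultiValuedAttributeSketch and its methods are hypothetical and are not part of Alfresco or Lucene; they only illustrate why losing positional information produces the match.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the Alfresco or Lucene implementation.
final class MultiValuedAttributeSketch {
    // Attribute name -> indexed values. Positions within the XML are not recorded.
    private final Map<String, Set<String>> fields = new HashMap<>();

    void index(String attribute, List<String> values) {
        fields.computeIfAbsent(attribute, key -> new HashSet<>()).addAll(values);
    }

    // Models a query such as +@attr1:'v1' +@attr2:'v2': every term is required.
    boolean matchesAll(Map<String, String> requiredTerms) {
        for (Map.Entry<String, String> term : requiredTerms.entrySet()) {
            Set<String> values = fields.get(term.getKey());
            if (values == null || !values.contains(term.getValue())) {
                return false;
            }
        }
        return true;
    }

    // The document from the example above: c = [1,3], d = [2,4].
    static MultiValuedAttributeSketch exampleDocument() {
        MultiValuedAttributeSketch doc = new MultiValuedAttributeSketch();
        doc.index("{test}b.c", Arrays.asList("1", "3"));
        doc.index("{test}b.d", Arrays.asList("2", "4"));
        return doc;
    }

    static Map<String, String> terms(String c, String d) {
        Map<String, String> required = new LinkedHashMap<>();
        required.put("{test}b.c", c);
        required.put("{test}b.d", d);
        return required;
    }

    public static void main(String[] args) {
        MultiValuedAttributeSketch doc = exampleDocument();
        // c=1 and d=4 never appear as a pair in the XML, yet both terms are present.
        System.out.println(doc.matchesAll(terms("1", "4")));
        System.out.println(doc.matchesAll(terms("1", "2")));
        System.out.println(doc.matchesAll(terms("5", "4")));
    }
}
```

If the pairing matters, one option is to filter the result set afterwards in JavaScript or a template, where the original XML can be inspected.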
Lucene only supports basic PATH lookup. There are no built-in aggregation functions or similar features.
XPath (V1.0) can provide this to some extent.
If you want to manipulate data or process the result set, you have to do this in JavaScript or in a template.
When a snapshot is made of a store, the changed nodes in the store make up a new overlay index.
A revert to a previous snapshot is treated in the same way. There is no need to store overlays in order to roll back, although doing so could perform better.
PATH is used as the node ID as well as the path, and to determine which files are overlaid in the index.
The store information is ignored. This assumes that all stores are rooted at the same point. This is true in practice in the WCM world but does not have to be the case: an overlay at the root of one store could point to a subtree of another store. In the first instance we will ignore this. There is a node ID (the DB long ID) that we should use here, suitably encoded in the index as we do for other longs.
Indexing is hooked into the snapshot of a store; it may be on demand for an author's store.
Store types are available to support this.
Indexing is done at the store level for each snapshot of the store. In the first instance, only the latest snapshot will be indexed.
As only the latest snapshot is required, the overlays can be merged up into one big overlay that can be applied over other stores. The index will only contain information for the store, and nothing about layers above or below.
At search time the indexes for the stores are overlaid. The deletion list needs to be kept for the overlay, as this will involve many base indexes. If we know the store is not layered on any other store, we could throw away the list of overlaid paths, as it will never be used. This is the basis for the first implementation: we will only worry about the base index of the staging area.
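The overlay idea in these notes can be sketched in plain Java. This is a toy model under stated assumptions, not the Alfresco index code: OverlayIndexSketch, Layer, lookup, and merge are hypothetical names, documents are reduced to strings keyed by PATH, and every changed or deleted path is assumed to go on the layer's deletion list so that it hides older entries in the layers below.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the Alfresco implementation.
final class OverlayIndexSketch {
    // One index layer: documents keyed by PATH, plus the paths it hides in lower layers.
    static final class Layer {
        final Map<String, String> documents = new HashMap<>();
        final Set<String> deletions = new HashSet<>();

        Layer put(String path, String document) {
            documents.put(path, document);
            deletions.add(path); // a changed path masks older versions below
            return this;
        }

        Layer delete(String path) {
            documents.remove(path);
            deletions.add(path);
            return this;
        }
    }

    // Layers are ordered base first. Walk from the top; the first layer that
    // either holds the path or lists it as deleted decides the result.
    static String lookup(List<Layer> layers, String path) {
        for (int i = layers.size() - 1; i >= 0; i--) {
            Layer layer = layers.get(i);
            if (layer.documents.containsKey(path)) {
                return layer.documents.get(path);
            }
            if (layer.deletions.contains(path)) {
                return null; // masked by this overlay
            }
        }
        return null;
    }

    // Merge two adjacent overlays into one, keeping the combined deletion list
    // so the result can still be applied over other base indexes.
    static Layer merge(Layer lower, Layer upper) {
        Layer merged = new Layer();
        merged.documents.putAll(lower.documents);
        merged.deletions.addAll(lower.deletions);
        for (String path : upper.deletions) {
            merged.documents.remove(path);
        }
        merged.documents.putAll(upper.documents);
        merged.deletions.addAll(upper.deletions);
        return merged;
    }

    public static void main(String[] args) {
        Layer base = new Layer().put("/a/b", "v1").put("/a/c", "v1");
        Layer snapshot = new Layer().put("/a/b", "v2").delete("/a/c");
        List<Layer> stack = Arrays.asList(base, snapshot);
        System.out.println(lookup(stack, "/a/b"));
        System.out.println(lookup(stack, "/a/c"));
    }
}
```

For a store that is not layered on any other store, the deletions set of the merged layer is never consulted, which matches the observation above that the list could be thrown away in that case.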
XML content will not be generally searchable. Metadata will be extracted to suitable aspects depending on the content of the XML data. The metadata in these aspects will be indexed according to the DD definitions.
When is this transform done?
As we have no actions, we could just extract to pseudo-attributes for search only (indexed, but they do not otherwise exist).
It may be worth adding navigation into XML document types in the XPath navigator. This would give a slow but full XPath search and a reasonable API for in-document search.
Metadata extraction will have to be done at index time if required.
Overlay to support a generic ID (remove NodeRef from the index API).
Support overlay types in the index, which always keep the deletion list for overlaying indexes.
Do we index per store and merge? Or add overlays to the index? Prefer the first.
Support merging overlays (useful in any case). Keep the deletion list. Can be used to merge index deltas.
Index AVM nodes (minor change to existing code with Node and Content facades).
Interceptor to index snapshots. Find the nodes unique to the store that have changed since the last snapshot. Index them as a chunk.
Build index overlays at search time.
AVM does not have 'Index' or 'Delta' layers; they are all 'Overlay'.
Target to have one overlay per store for background merging of overlays.
Could maintain some history of overlays to support point-in-time searching. Overlays have a numeric incrementing ID.
Thread pooling of background threads, as the number of mini indexes will skyrocket.
Only index stores of type something like staging and authoring
XML Extraction.