Overview
Alfresco Enterprise Search is now widely adopted by customers, and their requirements grow day by day. This blog post addresses one of them: conditionally indexing content and metadata in a large repository.
Use Case:
Conditionally perform content indexing for a group or subset of documents, covering both future uploads and existing documents. Put simply, the customer wants to exclude content indexing for the doctype stt:statements, while its metadata should still be indexed.
Discussion: Traditionally, the IndexControl aspect has been used to restrict content or metadata indexing (see Control-Indexes for more details). However, this approach may not be suitable for all customer environments, particularly those with millions of documents in existing repositories. As a result, there is a growing need for more efficient and viable alternatives.
Thanks to the Configuring Blacklist Sets feature in Alfresco Enterprise Search, it is now possible to define specific doctypes that should be excluded from indexing (refer to Configuring-Blacklist-Sets for more details). These blacklists are defined in a file whose location is set with the alfresco.mediation.filter-file attribute. The default file is called mediation-filter.yml and must be on the module classpath. Sample content of that file:
mediation:
  nodeTypes:
  contentNodeTypes:
  nodeAspects:
    - sys:hidden
  fields:
    - cmis:changeToken
    - alfcmis:nodeRef
    - cmis:isImmutable
    - cmis:isLatestVersion
    - cmis:isMajorVersion
    - cmis:isLatestMajorVersion
    - cmis:isVersionSeriesCheckedOut
    - cmis:versionSeriesCheckedOutBy
    - cmis:versionSeriesCheckedOutId
    - cmis:checkinComment
    - cmis:contentStreamId
    - cmis:isPrivateWorkingCopy
    - cmis:allowedChildObjectTypeIds
    - cmis:sourceId
    - cmis:targetId
    - cmis:policyText
    - trx:password
    - pub:publishingEventPayload
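Where:
nodeTypes lists node types that are excluded from indexing altogether.
contentNodeTypes lists node types whose content is excluded from indexing; their metadata is still indexed.
nodeAspects lists aspects; nodes carrying any of them are excluded from indexing.
fields lists fields that are excluded from indexing on all nodes.
In our case, the blacklisted doctype must be added under the contentNodeTypes attribute in the YAML file.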
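To make the effect of each blacklist section concrete, here is a minimal Python sketch of the decision the filter is meant to express. It is an illustration only, not the connector's implementation: the MediationFilter class and the indexing_decision function are hypothetical names, and the semantics assumed are that nodeTypes and nodeAspects exclude a node entirely, contentNodeTypes excludes only its content, and fields drops individual fields.

```python
from dataclasses import dataclass, field


@dataclass
class MediationFilter:
    """Hypothetical in-memory view of the four blacklist sections."""
    node_types: set = field(default_factory=set)
    content_node_types: set = field(default_factory=set)
    node_aspects: set = field(default_factory=set)
    fields: set = field(default_factory=set)


def indexing_decision(flt, node_type, aspects, properties):
    """Return (index_metadata, index_content, indexed_fields) for one node."""
    # Assumed semantics: a blacklisted type or aspect excludes the whole node.
    if node_type in flt.node_types or flt.node_aspects & set(aspects):
        return False, False, []
    # Blacklisted fields are dropped; everything else is kept.
    indexed_fields = [p for p in properties if p not in flt.fields]
    # A type listed under contentNodeTypes keeps metadata but loses content.
    index_content = node_type not in flt.content_node_types
    return True, index_content, indexed_fields


flt = MediationFilter(
    content_node_types={"stt:statements"},
    node_aspects={"sys:hidden"},
    fields={"cmis:changeToken"},
)
print(indexing_decision(flt, "stt:statements", [], ["stt:statementId", "cmis:changeToken"]))
# → (True, False, ['stt:statementId'])
```

With this configuration a cr:financialReport node would return True for both metadata and content, which matches the use case: only stt:statements loses content indexing.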
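Solution Implementation
1. Create the custom content models. This example uses two: one defining cr:financialReport and one defining stt:statements.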
<?xml version="1.0" encoding="UTF-8"?>
<!-- Custom Model -->
<!-- Note: This model is pre-configured to load at startup of the Repository. So, all custom -->
<!-- types and aspects added here will automatically be registered -->
<model name="cr:financialReport" xmlns="http://www.alfresco.org/model/dictionary/1.0">
    <!-- Optional meta-data about the model -->
    <description>Custom Model2</description>
    <author></author>
    <version>1.0</version>
    <imports>
        <!-- Import Alfresco Dictionary Definitions -->
        <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d"/>
        <!-- Import Alfresco Content Domain Model Definitions -->
        <import uri="http://www.alfresco.org/model/content/1.0" prefix="cm"/>
    </imports>
    <namespaces>
        <namespace uri="cr.custom.model" prefix="cr"/>
    </namespaces>
    <types>
        <type name="cr:financialReport">
            <title>Financial Reports</title>
            <parent>cm:content</parent>
            <properties>
                <property name="cr:vendorCode">
                    <title>vendorCode</title>
                    <description></description>
                    <type>d:text</type>
                    <mandatory>false</mandatory>
                    <multiple>false</multiple>
                    <index enabled="true">
                        <tokenised>both</tokenised>
                    </index>
                </property>
            </properties>
        </type>
    </types>
</model>
<?xml version="1.0" encoding="UTF-8"?>
<!-- Custom Model -->
<!-- Note: This model is pre-configured to load at startup of the Repository. So, all custom -->
<!-- types and aspects added here will automatically be registered -->
<!-- The default xmlns must be the Alfresco dictionary namespace for the model to be parsed -->
<model name="stt:statementsModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
    <!-- Optional meta-data about the model -->
    <description>Custom Model</description>
    <author></author>
    <version>1.0</version>
    <imports>
        <!-- Import Alfresco Dictionary Definitions -->
        <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d"/>
        <!-- Import Alfresco Content Domain Model Definitions -->
        <import uri="http://www.alfresco.org/model/content/1.0" prefix="cm"/>
    </imports>
    <!-- Introduction of new namespaces defined by this model -->
    <!-- NOTE: The namespace URI below should be changed to reflect your own namespace -->
    <namespaces>
        <namespace uri="stt.custtom.model" prefix="stt"/>
    </namespaces>
    <types>
        <type name="stt:statements">
            <title>statements</title>
            <parent>cm:content</parent>
            <properties>
                <property name="stt:statementId">
                    <title>statementId</title>
                    <description></description>
                    <type>d:text</type>
                    <mandatory>false</mandatory>
                    <multiple>false</multiple>
                    <index enabled="true">
                        <tokenised>both</tokenised>
                    </index>
                </property>
            </properties>
        </type>
    </types>
</model>
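The prefixed type name declared in a model is the exact string that later goes into the blacklist, so it can help to read it back from the model file instead of retyping it. The following Python sketch does that with the standard library. The list_type_names helper and the trimmed-down MODEL_XML are illustrative assumptions, and the lookup assumes the model root declares the Alfresco dictionary namespace as its default namespace; if it does not, the function returns an empty list.

```python
import xml.etree.ElementTree as ET

# Namespace used by Alfresco content model definitions.
DICTIONARY_NS = "http://www.alfresco.org/model/dictionary/1.0"

# Trimmed-down model for illustration; a real model file would be read from disk.
MODEL_XML = """<?xml version="1.0" encoding="UTF-8"?>
<model name="stt:statementsModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
  <types>
    <type name="stt:statements">
      <parent>cm:content</parent>
    </type>
  </types>
</model>"""


def list_type_names(model_xml):
    """Return the prefixed names of all types declared in a content model."""
    # Encode to bytes so the XML declaration's encoding attribute is honoured.
    root = ET.fromstring(model_xml.encode("utf-8"))
    ns = {"d": DICTIONARY_NS}
    return [t.get("name") for t in root.findall("d:types/d:type", ns)]


print(list_type_names(MODEL_XML))
# → ['stt:statements']
```

An empty result for a model that clearly declares types is a hint that the xmlns on the model element is not the dictionary namespace.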
2. Bootstrap the created models.
3. Start up the Transform Service. Refer to the official Alfresco-Transform-Service documentation for setup.
4. Create or update mediation-filter.yml and place it in the directory where you have the alfresco-elasticsearch jar files.
In our case, the stt:statements docType goes under the contentNodeTypes tag in the YAML file, because content indexing is not required for it.
mediation:
  nodeTypes:
  contentNodeTypes:
    - stt:statements
  nodeAspects:
    - sys:hidden
  fields:
    - cmis:changeToken
    - alfcmis:nodeRef
    - cmis:isImmutable
    - cmis:isLatestVersion
    - cmis:isMajorVersion
    - cmis:isLatestMajorVersion
    - cmis:isVersionSeriesCheckedOut
    - cmis:versionSeriesCheckedOutBy
    - cmis:versionSeriesCheckedOutId
    - cmis:checkinComment
    - cmis:contentStreamId
    - cmis:isPrivateWorkingCopy
    - cmis:allowedChildObjectTypeIds
    - cmis:sourceId
    - cmis:targetId
    - cmis:policyText
    - trx:password
    - pub:publishingEventPayload
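Before starting the indexing components, it can be worth checking that the doctype ended up under contentNodeTypes and not under nodeTypes, since the latter would also drop its metadata. The Python sketch below is one way to do that without extra dependencies. The parse_filter and check_content_only_blacklist helpers are hypothetical, and the parser is deliberately naive: it only understands the simple "section plus dash items" layout of this file and ignores indentation, so it is not a YAML validator.

```python
SAMPLE = """\
mediation:
  nodeTypes:
  contentNodeTypes:
    - stt:statements
  nodeAspects:
    - sys:hidden
  fields:
    - cmis:changeToken
    - trx:password
"""


def parse_filter(text):
    """Collect the list items that follow each section key of the filter file."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and the top-level key.
        if not line or line.startswith("#") or line == "mediation:":
            continue
        if line.startswith("- "):
            if current is None:
                raise ValueError(f"list item outside a section: {raw!r}")
            sections[current].append(line[2:].strip())
        elif line.endswith(":"):
            current = line[:-1]
            sections[current] = []
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return sections


def check_content_only_blacklist(sections, doc_type):
    """True if doc_type is excluded from content indexing but not from indexing entirely."""
    in_content = doc_type in sections.get("contentNodeTypes", [])
    in_nodes = doc_type in sections.get("nodeTypes", [])
    return in_content and not in_nodes


sections = parse_filter(SAMPLE)
print(check_content_only_blacklist(sections, "stt:statements"))
# → True
```

To run the same check against the real file, pass its text instead of SAMPLE, for example parse_filter(open("mediation-filter.yml").read()).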
Live-indexing
Mediation:
When starting the mediation component, pass the location of the updated mediation-filter.yml in the alfresco.mediation.filter-file attribute.
Content indexing and metadata indexing are enabled by default. Refer to Alfresco-Live-Indexing-app for more details.
java -jar alfresco-elasticsearch-live-indexing-mediation-x.x.x-app.jar \
--server.port=8081 --spring.activemq.broker-url=tcp://localhost:61616 \
--spring.activemq.user=admin --spring.activemq.password=admin \
--alfresco.path-indexing-component.enabled=false \
--alfresco.accepted-content-media-types-cache.base-url=http://localhost:8090/transform/config \
--alfresco.mediation.filter-file=file:mediation-filter.yml
Content Indexer
java -jar alfresco-elasticsearch-live-indexing-content-x.x.x-app.jar \
--server.port=8083 --spring.activemq.broker-url=tcp://localhost:61616 \
--spring.activemq.user=admin --spring.activemq.password=admin \
--spring.elasticsearch.rest.uris=http://localhost:9200
Metadata Indexer
java -jar alfresco-elasticsearch-live-indexing-metadata-x.x.x-app.jar \
--server.port=8082 \
--spring.activemq.broker-url=tcp://localhost:61616 \
--spring.activemq.user=admin --spring.activemq.password=admin \
--spring.elasticsearch.rest.uris=http://localhost:9200
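Once the three live-indexing components are running, one way to confirm the behaviour is to upload a stt:statements document and compare a full-text query with a metadata query. The Python sketch below builds both AFTS queries and sends them to the Alfresco Search REST API. The base URL, the credentials, the search term, and the helper names are assumptions for illustration; adjust them to your environment.

```python
import base64
import json
import urllib.request

# Search endpoint of the Alfresco public REST API (v1).
SEARCH_PATH = "/alfresco/api/-default-/public/search/versions/1/search"


def afts_body(query, max_items=5):
    """Build the JSON body for an AFTS search request."""
    return {
        "query": {"language": "afts", "query": query},
        "paging": {"maxItems": max_items, "skipCount": 0},
    }


def content_query(doc_type, term):
    """Full-text query: should return nothing for a content-blacklisted type."""
    return f'TYPE:"{doc_type}" AND TEXT:"{term}"'


def metadata_query(doc_type, prop, value):
    """Property query: should still match, since metadata remains indexed."""
    return f'TYPE:"{doc_type}" AND {prop}:"{value}"'


def search(base_url, user, password, query):
    """POST the query and return the number of entries in the response."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        base_url + SEARCH_PATH,
        data=json.dumps(afts_body(query)).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return len(json.load(resp)["list"]["entries"])


# Example usage (assumed host, credentials, and values; not executed here):
# search("http://localhost:8080", "admin", "admin", content_query("stt:statements", "invoice"))
# search("http://localhost:8080", "admin", "admin", metadata_query("stt:statements", "stt:statementId", "S-001"))
```

If the blacklist works as intended, the first call should find no stt:statements documents by their text, while the second should still find the document by its stt:statementId value.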
Reindexing
When starting the reindexing app, pass the location of the updated mediation-filter.yml in the alfresco.mediation.filter-file attribute, just as for the mediation component.
java -jar alfresco-elasticsearch-reindexing-x.x.x-app.jar \
--alfresco.reindex.jobName=reindexByIds \
--spring.elasticsearch.rest.uris=http://localhost:9200 \
--spring.datasource.url=jdbc:postgresql://localhost:5432/alfresco_25.1_0_ES \
--spring.datasource.username=username \
--spring.datasource.password=Password \
--alfresco.reindex.prefixes-file=file:reindex.prefixes-file.json \
--spring.activemq.broker-url=nio://localhost:61616 \
--server.port=9194 --alfresco.reindex.pathIndexingEnabled=false \
--alfresco.mediation.filter-file=file:mediation-filter.yml
Conclusion
By default, metadata and content indexing are enabled across the entire repository during live indexing and reindexing, unless explicitly restricted. With the approach shown here, we can add as many docType and field entries as needed under the relevant tag in mediation-filter.yml to blacklist the docTypes and/or fields we do not want to index in Elasticsearch/OpenSearch.