Hyland Connect

angelborroy · 04-06-2026

This post is a practical guide for Alfresco administrators and platform engineers who need to move beyond the default single-node deployment of the Elasticsearch connector and push more indexing throughput out of it. It covers Docker Compose first and then Kubernetes via the official Helm charts in acs-deployment.

How the live-indexing pipeline works

Before scaling anything, it helps to understand the data flow. Repository events land on an ActiveMQ topic (alfresco.repo.event2). The Mediation service is the only subscriber on that topic; it fans each event out to dedicated queues, one per downstream concern:

ACS repo
   │  (ActiveMQ topic: alfresco.repo.event2)
   ▼
┌─────────────┐
│  Mediation  │  ← durable topic subscriber (single instance)
└──────┬──────┘
       │ fan-out to queues
       ├──────────────────────────────┬──────────────────────────┐
       │                              │                          │
       ▼                              ▼                          ▼
org.alfresco.search             org.alfresco.search      org.alfresco.search
  .metadata.event                 .content.event           .path.event
       │                              │                          │
       ▼                              ▼                          ▼
┌──────────────┐        ┌─────────────────────┐        ┌─────────────┐
│   Metadata   │        │       Content       │        │    Path     │
│  (scalable)  │        │     (scalable)      │        │  (single)   │
└──────────────┘        └─────────────────────┘        └─────────────┘
       │                              │                          │
       └──────────────────────────────┴──────────────────────────┘
                                      │
                               Elasticsearch
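
To make the flow concrete, here is a small in-memory sketch in Python. It is not the connector's code: `WorkQueue` and `mediate` are illustrative names, and round-robin dispatch is a simplification of what a real broker does. It shows the two behaviours the rest of this post relies on: Mediation copies each repository event to every downstream queue, and a queue hands each message to exactly one of its competing consumers.

```python
from collections import deque
from itertools import cycle


class WorkQueue:
    """Toy queue: each message goes to exactly one competing consumer."""

    def __init__(self, name):
        self.name = name
        self.messages = deque()

    def put(self, message):
        self.messages.append(message)

    def drain(self, consumers):
        """Hand messages out round-robin; return {consumer: [messages]}."""
        handled = {c: [] for c in consumers}
        turn = cycle(consumers)
        while self.messages:
            handled[next(turn)].append(self.messages.popleft())
        return handled


def mediate(event, queues):
    """Stand-in for Mediation: copy one repository event to every queue."""
    for q in queues:
        q.put(event)


queues = [WorkQueue("metadata"), WorkQueue("content"), WorkQueue("path")]

# One durable subscription: each event is mediated exactly once.
for event in ["e1", "e2", "e3", "e4"]:
    mediate(event, queues)

# Two metadata instances share the work without duplicating it.
result = queues[0].drain(["metadata-1", "metadata-2"])
print(result)

# Two independent topic subscriptions would each see every event,
# so every downstream queue would receive each event twice.
dup_queue = WorkQueue("metadata")
for subscription in ["LiveIndexing", "SecondClient"]:
    for event in ["e1", "e2"]:
        mediate(event, [dup_queue])
duplicated = list(dup_queue.messages)
print(duplicated)
```

The second half is the reason Mediation has to remain the only subscriber on the topic.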

What can and cannot scale

Component	Can scale out?	Why
Mediation	No	The channel is configured as `consumer-sjms:topic:alfresco.repo.event2?durableSubscriptionName=LiveIndexingSubscription&clientId=LiveIndexing`. The `clientId` is hard-coded, creating a single named durable JMS subscription. A second instance with the same `clientId` cannot connect while the first is active, because a JMS client ID may be in use by only one connection at a time. A second instance with a different `clientId` would create a second independent subscription, so every event would be delivered to both instances and duplicated into all downstream queues.
Metadata	Yes	Consumes a plain queue with `concurrentConsumers=10`. Multiple instances act as competing consumers. ES update scripts use a `metadataIndexingLastUpdate` timestamp guard that silently no-ops stale writes, so out-of-order delivery across instances is safe.
Content	Yes	All three inbound channels (content events, transform replies, refresh events) use `concurrentConsumers=10`. The transform request embeds the shared reply-queue name in the message payload itself, and replies are decoded entirely from the message body and `clientData` field—there is no instance-local state that would break under parallel consumers.
Path	No	The inbound channel has no `concurrentConsumers`. The project’s own reliability tests explicitly document that path is “single instance with single consume” and that path events must be processed in creation order. The path processor also reads current index state and synchronously rewrites all descendants when a folder is moved, which is not safe under parallel consumers.

The short rule: keep mediation and path at exactly one replica; scale metadata and content.
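
The metadata row depends on a last-update guard, and the idea is easy to see in a few lines. The sketch below is a plain-Python illustration, not the connector's actual Elasticsearch update script. The field name `metadataIndexingLastUpdate` comes from the table above; `apply_update`, the dictionary standing in for the index, and the choice to skip equal timestamps are all assumptions.

```python
def apply_update(index, node_id, fields, event_time):
    """Apply an update only if it is newer than what is already indexed.

    Returns True when the document changed, False for a stale no-op.
    """
    doc = index.setdefault(node_id, {"metadataIndexingLastUpdate": -1})
    if event_time <= doc["metadataIndexingLastUpdate"]:
        return False  # stale or duplicate event: leave the document alone
    doc.update(fields)
    doc["metadataIndexingLastUpdate"] = event_time
    return True


index = {}

# Two metadata instances process events for the same node out of order:
# the newer event (time 200) arrives before the older one (time 100).
applied_new = apply_update(index, "node-1", {"name": "report-v2.pdf"}, event_time=200)
applied_old = apply_update(index, "node-1", {"name": "report-v1.pdf"}, event_time=100)

print(applied_new, applied_old, index["node-1"]["name"])
```

Because the older write is dropped rather than applied, it does not matter which instance finishes first.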

Docker Compose

The `.env` file

All image versions are kept in a single .env file. Place it next to your compose files before running any command, or pass it explicitly with --env-file. Check quay.io/alfresco for the latest stable tags before deploying.

# .env
LIVE_INDEXING_TAG=5.4.0
LIVE_REINDEXING_TAG=5.4.0

ALFRESCO_TAG=26.1.0-A.22
SHARE_TAG=26.1.0-A.22
POSTGRES_TAG=16.6
TRANSFORM_ROUTER_TAG=4.4.0
TRANSFORM_CORE_AIO_TAG=5.4.0
SHARED_FILE_STORE_TAG=4.4.0
ACTIVE_MQ_TAG=6.2-jre17-rockylinux8
DIGITAL_WORKSPACE_TAG=4.4.1
ACS_NGINX_TAG=3.4.2
ELASTICSEARCH_TAG=8.17.0
KIBANA_TAG=8.17.0

ELASTICSEARCH_INDEX_NAME=alfresco

Step 1: Stop using the all-in-one image

The default docker-compose.yml from the distribution ships the alfresco-elasticsearch-live-indexing image, which bundles all four components. Components bundled into one image cannot be scaled independently. Create the following override file alongside the main compose file:

# docker-compose.live-indexing-split.yml

services:

  live-indexing-mediation:
    image: quay.io/alfresco/alfresco-elasticsearch-live-indexing-mediation:${LIVE_INDEXING_TAG}
    depends_on:
      - activemq
      - elasticsearch
      - alfresco
      - transform-core-aio
    environment:
      ELASTICSEARCH_INDEXNAME: ${ELASTICSEARCH_INDEX_NAME:-alfresco}
      SPRING_ELASTICSEARCH_REST_URIS: http://elasticsearch:9200
      SPRING_ACTIVEMQ_BROKERURL: nio://activemq:61616
      SPRING_ACTIVEMQ_USER: admin
      SPRING_ACTIVEMQ_PASSWORD: admin
      ALFRESCO_ACCEPTEDCONTENTMEDIATYPESCACHE_BASEURL: http://transform-core-aio:8090/transform/config

  live-indexing-path:
    image: quay.io/alfresco/alfresco-elasticsearch-live-indexing-path:${LIVE_INDEXING_TAG}
    depends_on:
      - activemq
      - elasticsearch
    environment:
      ELASTICSEARCH_INDEXNAME: ${ELASTICSEARCH_INDEX_NAME:-alfresco}
      SPRING_ELASTICSEARCH_REST_URIS: http://elasticsearch:9200
      SPRING_ACTIVEMQ_BROKERURL: nio://activemq:61616
      SPRING_ACTIVEMQ_USER: admin
      SPRING_ACTIVEMQ_PASSWORD: admin

  live-indexing-metadata:
    image: quay.io/alfresco/alfresco-elasticsearch-live-indexing-metadata:${LIVE_INDEXING_TAG}
    depends_on:
      - activemq
      - elasticsearch
    environment:
      ELASTICSEARCH_INDEXNAME: ${ELASTICSEARCH_INDEX_NAME:-alfresco}
      SPRING_ELASTICSEARCH_REST_URIS: http://elasticsearch:9200
      SPRING_ACTIVEMQ_BROKERURL: nio://activemq:61616
      SPRING_ACTIVEMQ_USER: admin
      SPRING_ACTIVEMQ_PASSWORD: admin

  live-indexing-content:
    image: quay.io/alfresco/alfresco-elasticsearch-live-indexing-content:${LIVE_INDEXING_TAG}
    depends_on:
      - activemq
      - elasticsearch
      - shared-file-store
      - transform-core-aio
    environment:
      ELASTICSEARCH_INDEXNAME: ${ELASTICSEARCH_INDEX_NAME:-alfresco}
      SPRING_ELASTICSEARCH_REST_URIS: http://elasticsearch:9200
      SPRING_ACTIVEMQ_BROKERURL: nio://activemq:61616
      SPRING_ACTIVEMQ_USER: admin
      SPRING_ACTIVEMQ_PASSWORD: admin
      ALFRESCO_SHAREDFILESTORE_BASEURL: http://shared-file-store:8099/alfresco/api/-default-/private/sfs/versions/1/file/
      ALFRESCO_ACCEPTEDCONTENTMEDIATYPESCACHE_BASEURL: http://transform-core-aio:8090/transform/config

Step 2: Start the stack

Run both files together and suppress the bundled all-in-one service by scaling it to zero:

docker compose \
  -f docker-compose.yml \
  -f docker-compose.live-indexing-split.yml \
  --env-file .env \
  up -d \
  --scale live-indexing=0

Step 3: Scale metadata and content replicas

docker compose \
  -f docker-compose.yml \
  -f docker-compose.live-indexing-split.yml \
  --env-file .env \
  up -d \
  --scale live-indexing=0 \
  --scale live-indexing-metadata=2 \
  --scale live-indexing-content=2

A sensible starting configuration for a busy repository:

Service	Replicas
`live-indexing-mediation`	1 (fixed)
`live-indexing-path`	1 (fixed)
`live-indexing-metadata`	2
`live-indexing-content`	2
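
If you generate the command from a script, it is worth encoding the fixed-replica rule so a typo cannot scale mediation or path. This is a minimal sketch: `scale_args` and `FIXED_SERVICES` are hypothetical helper names, and the service names are the ones defined in the override file above.

```python
FIXED_SERVICES = {"live-indexing-mediation", "live-indexing-path"}


def scale_args(replicas):
    """Turn {service: count} into `--scale` arguments, rejecting unsafe plans."""
    args = ["--scale", "live-indexing=0"]  # keep the bundled service off
    for service, count in sorted(replicas.items()):
        if service in FIXED_SERVICES and count != 1:
            raise ValueError(f"{service} must stay at exactly 1 replica")
        if count < 1:
            raise ValueError(f"{service} needs at least 1 replica")
        if service not in FIXED_SERVICES:
            # Fixed services run one container by default, so no flag is needed.
            args += ["--scale", f"{service}={count}"]
    return args


plan = {
    "live-indexing-mediation": 1,
    "live-indexing-path": 1,
    "live-indexing-metadata": 2,
    "live-indexing-content": 2,
}
print(" ".join(scale_args(plan)))
```

The printed arguments can be appended to the `docker compose ... up -d` command from Step 3.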

Step 4: Tune concurrent consumers per instance

Each metadata and content instance opens up to 10 consumer threads against its queue by default. That is the concurrentConsumers=10 parameter baked into the Camel channel URI:

# metadata.properties
in.alfresco.metadata.event.channel=consumer-sjms:org.alfresco.search.metadata.event?concurrentConsumers=10

# content.properties
in.alfresco.content.event.channel=consumer-sjms:org.alfresco.search.content.event?concurrentConsumers=10
in.alfresco.content.availability.channel=consumer-sjms:org.alfresco.search.contentstore.event?concurrentConsumers=10
in.alfresco.content.refresh.event.channel=consumer-sjms:org.alfresco.search.contentrefresh.event?concurrentConsumers=10

Because concurrentConsumers is embedded in the URI string rather than a standalone property, you override it by redefining the whole channel via a Spring environment variable. Add this to the relevant service environment block:

# Raise metadata consumers to 20 threads per instance
live-indexing-metadata:
  environment:
    IN_ALFRESCO_METADATA_EVENT_CHANNEL: >-
      consumer-sjms:org.alfresco.search.metadata.event?concurrentConsumers=20

# Raise content consumers to 20 threads per instance
live-indexing-content:
  environment:
    IN_ALFRESCO_CONTENT_EVENT_CHANNEL: >-
      consumer-sjms:org.alfresco.search.content.event?concurrentConsumers=20
    IN_ALFRESCO_CONTENT_AVAILABILITY_CHANNEL: >-
      consumer-sjms:org.alfresco.search.contentstore.event?concurrentConsumers=20
    IN_ALFRESCO_CONTENT_REFRESH_EVENT_CHANNEL: >-
      consumer-sjms:org.alfresco.search.contentrefresh.event?concurrentConsumers=20

Add replicas first; only raise concurrentConsumers when extra replicas alone stop reducing queue depth.
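
The environment variable names above follow Spring Boot's relaxed binding: dots become underscores, dashes are dropped, and the name is upper-cased. The sketch below derives the variable name from a property key and rewrites the `concurrentConsumers` value in a channel URI. `to_env_var` and `with_concurrent_consumers` are hypothetical helpers for illustration, not part of the connector.

```python
def to_env_var(property_name):
    """Spring Boot relaxed binding: dots -> underscores, dashes dropped, upper case."""
    return property_name.replace(".", "_").replace("-", "").upper()


def with_concurrent_consumers(channel_uri, consumers):
    """Return the channel URI with concurrentConsumers set to `consumers`."""
    base, _, query = channel_uri.partition("?")
    # Keep any other query parameters, drop the old concurrentConsumers value.
    params = [
        p for p in query.split("&")
        if p and not p.startswith("concurrentConsumers=")
    ]
    params.append(f"concurrentConsumers={consumers}")
    return base + "?" + "&".join(params)


prop = "in.alfresco.metadata.event.channel"
uri = "consumer-sjms:org.alfresco.search.metadata.event?concurrentConsumers=10"

print(to_env_var(prop))
print(with_concurrent_consumers(uri, 20))

# Upper bound on consumer threads for one queue: replicas x concurrentConsumers.
replicas, consumers_per_instance = 2, 20
print(replicas * consumers_per_instance)
```

The last line is why replicas and `concurrentConsumers` should be tuned together: two instances at 20 consumers each can open up to 40 consumer threads against the same queue, all of them writing to Elasticsearch.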

Step 5: Scale ATS and SFS alongside content

More content consumers only move the bottleneck downstream to the Transform Service (ATS) and the Shared File Store (SFS). When the acs-repo-transform-request queue starts growing, scale transform-core-aio alongside content:

docker compose \
  -f docker-compose.yml \
  -f docker-compose.live-indexing-split.yml \
  --env-file .env \
  up -d \
  --scale live-indexing=0 \
  --scale live-indexing-metadata=2 \
  --scale live-indexing-content=3 \
  --scale transform-core-aio=2

Note on SFS in Compose: The distribution mounts SFS on a named tmpfs volume on a single Docker host. Multiple SFS containers sharing that volume on the same host works, but this is not a distributed filesystem; it is only viable for single-host Compose deployments. For multi-host setups, use Kubernetes with a ReadWriteMany storage class.

When to scale in Compose

Open the ActiveMQ web console at http://localhost:8161 (default credentials admin/admin) and navigate to Queues. Scale the component whose queue is growing:

Queue	Growing means	Scale this
`org.alfresco.search.metadata.event`	Metadata consumers are behind	`--scale live-indexing-metadata=N` and/or raise `concurrentConsumers`
`org.alfresco.search.content.event`	Content consumers are behind	`--scale live-indexing-content=N` and/or raise `concurrentConsumers`
`acs-repo-transform-request`	Transform Service is the bottleneck	`--scale transform-core-aio=N`
`org.alfresco.search.contentstore.event`	SFS reads or content indexing backlogged	Scale content replicas; review SFS if latency is high
`org.alfresco.search.contentrefresh.event`	Transform retries piling up	Scale `transform-core-aio` and check SFS throughput
`org.alfresco.search.path.event`	Path processor is behind	Cannot scale. Check CPU/memory headroom on the path container, or investigate whether a bulk folder-move triggered the spike.
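
The table translates directly into a lookup. The sketch below compares two snapshots of queue depths and returns the suggested action for each queue that grew. How you collect the depths is up to you; the numbers here are made up for illustration, and `advise` is a hypothetical helper.

```python
SCALE_ACTION = {
    "org.alfresco.search.metadata.event": "scale live-indexing-metadata",
    "org.alfresco.search.content.event": "scale live-indexing-content",
    "acs-repo-transform-request": "scale transform-core-aio",
    "org.alfresco.search.contentstore.event": "scale live-indexing-content; review SFS",
    "org.alfresco.search.contentrefresh.event": "scale transform-core-aio; check SFS",
    "org.alfresco.search.path.event": "cannot scale; check path container resources",
}


def advise(earlier, later):
    """Return {queue: action} for every known queue whose depth increased."""
    return {
        queue: SCALE_ACTION[queue]
        for queue, depth in later.items()
        if queue in SCALE_ACTION and depth > earlier.get(queue, 0)
    }


# Example snapshots taken a few minutes apart (illustrative values only).
earlier = {"org.alfresco.search.metadata.event": 120, "acs-repo-transform-request": 40}
later = {"org.alfresco.search.metadata.event": 900, "acs-repo-transform-request": 35}

advice = advise(earlier, later)
print(advice)
```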

Kubernetes with Helm

The acs-deployment repo provides a production-ready Helm chart (alfresco-content-services) that already ships all four split live-indexing services via the alfresco-search-enterprise subchart. There is no all-in-one image in the Helm deployment; the split is already the default.

Chart structure

alfresco-content-services/
├── values.yaml
└── subcharts
    ├── alfresco-search-enterprise   ← mediation, metadata, content, path
    ├── alfresco-transform-service   ← transform router, renderers, SFS
    ├── elastic                      ← Elasticsearch
    └── activemq                     ← ActiveMQ broker

Within alfresco-search-enterprise, each live-indexing service maps to a Kubernetes resource:

Component	Kind	Default replicas
Mediation	StatefulSet	1
Metadata	Deployment	1
Content	Deployment	1
Path	Deployment	1

Mediation is a StatefulSet (not a Deployment) because its durable JMS subscription is tied to a stable pod identity. Do not change its replica count.

Scaling metadata and content

Override replicaCount for the scalable services in your own values file:

# my-values.yaml
alfresco-search-enterprise:
  liveIndexing:
    metadata:
      replicaCount: 3
    content:
      replicaCount: 3
    # mediation and path intentionally omitted — keep at 1

Apply with:

helm upgrade --install acs alfresco/alfresco-content-services \
  -f my-values.yaml \
  --namespace alfresco

No HPA for live-indexing services: The upstream chart does not define a HorizontalPodAutoscaler for any live-indexing component. Scaling is manual via replicaCount. If you want to automate it based on ActiveMQ queue depth, KEDA with an ActiveMQ trigger is the right tool, but that is outside the scope of the upstream chart.

Raising concurrent consumers per pod

Same mechanism as Compose: override the full channel URI via the service environment block in values:

alfresco-search-enterprise:
  liveIndexing:
    metadata:
      replicaCount: 3
      environment:
        IN_ALFRESCO_METADATA_EVENT_CHANNEL: >-
          consumer-sjms:org.alfresco.search.metadata.event?concurrentConsumers=20
    content:
      replicaCount: 3
      environment:
        IN_ALFRESCO_CONTENT_EVENT_CHANNEL: >-
          consumer-sjms:org.alfresco.search.content.event?concurrentConsumers=20
        IN_ALFRESCO_CONTENT_AVAILABILITY_CHANNEL: >-
          consumer-sjms:org.alfresco.search.contentstore.event?concurrentConsumers=20
        IN_ALFRESCO_CONTENT_REFRESH_EVENT_CHANNEL: >-
          consumer-sjms:org.alfresco.search.contentrefresh.event?concurrentConsumers=20

Default resource budgets

The chart ships conservative defaults. Review these against your node capacity before raising replica counts:

# alfresco-search-enterprise defaults (applies to all four live-indexing pods)
resources:
  requests:
    cpu: "0.5"
    memory: 256Mi
  limits:
    cpu: "2"
    memory: 2048Mi

With three metadata replicas and three content replicas you are requesting 3 CPU and 1.5 Gi just for those six pods. Tune requests to match observed usage before committing to a replica count.
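
The arithmetic is simple enough to script when you are comparing replica plans. This sketch uses the default requests and limits quoted above; `total_requests` is a hypothetical helper and assumes every pod uses the same values.

```python
def total_requests(pods, cpu_per_pod=0.5, memory_mi_per_pod=256):
    """Sum CPU (cores) and memory (Mi) for `pods` identical pods."""
    return pods * cpu_per_pod, pods * memory_mi_per_pod


# Three metadata replicas plus three content replicas, default requests.
cpu, memory_mi = total_requests(pods=3 + 3)
print(cpu, "CPU and", memory_mi / 1024, "Gi requested")

# The same six pods at their default limits (2 CPU, 2048Mi each).
limit_cpu, limit_mi = total_requests(6, cpu_per_pod=2, memory_mi_per_pod=2048)
print(limit_cpu, "CPU and", limit_mi / 1024, "Gi at the limits")
```

The gap between the requested total and the limit total is the headroom your nodes need if all six pods get busy at once.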

Scaling ATS (Transform Service)

The upstream chart already defines HPA for every transform worker (pdfrenderer, imagemagick, libreoffice, tika, transformmisc) with this default policy:

# defaults in alfresco-transform-service
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 75
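
To get a feel for what this policy does, the core of the Kubernetes HPA calculation is `ceil(currentReplicas * currentMetric / targetMetric)`, clamped between the minimum and maximum. The sketch below applies that rule with the defaults shown above. It deliberately ignores the tolerance band and stabilization behaviour of the real controller, so treat it as an approximation rather than a prediction.

```python
import math


def desired_replicas(current, cpu_percent, target=75, min_replicas=1, max_replicas=3):
    """Simplified HPA rule: ceil(current * metric / target), clamped to the bounds."""
    wanted = math.ceil(current * cpu_percent / target)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(current=1, cpu_percent=150))  # load is twice the target
print(desired_replicas(current=2, cpu_percent=150))  # capped by maxReplicas
print(desired_replicas(current=2, cpu_percent=150, max_replicas=6))
print(desired_replicas(current=3, cpu_percent=20))   # scales back down
```

The second and third calls show the effect of the ceiling: with `maxReplicas: 3` the workers stay saturated, which is when `acs-repo-transform-request` starts to grow.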

The Transform Router defaults to 2 replicas. If you are scaling content consumers aggressively, raise the HPA ceiling or the router replica count:

alfresco-transform-service:
  transformrouter:
    replicaCount: 3
  pdfrenderer:
    autoscaling:
      maxReplicas: 6
  tika:
    autoscaling:
      maxReplicas: 6
  libreoffice:
    autoscaling:
      maxReplicas: 6

Scaling SFS (Shared File Store)

SFS defaults to 1 replica with a Recreate deployment strategy and a ReadWriteOnce PVC. To run more than one replica you must provide a storage class that supports ReadWriteMany (NFS, Azure Files, Amazon EFS, etc.):

alfresco-transform-service:
  filestore:
    replicaCount: 2
    persistence:
      accessModes:
        - ReadWriteMany
      storageClass: "your-rwx-storage-class"

Without ReadWriteMany, a second SFS pod will fail to mount the same PVC. In practice, a single well-resourced SFS pod is rarely the bottleneck; start by scaling transform workers and revisit SFS only if org.alfresco.search.contentstore.event continues to grow after ATS is adequately scaled.

When to scale in Kubernetes

Reach the ActiveMQ web console with a port-forward:

kubectl port-forward svc/alfresco-activemq 8161:8161 -n alfresco
# then open http://localhost:8161 → Queues

Use the same decision table as for Compose:

Queue	Growing → scale this
`org.alfresco.search.metadata.event`	`liveIndexing.metadata.replicaCount`
`org.alfresco.search.content.event`	`liveIndexing.content.replicaCount`
`acs-repo-transform-request`	ATS worker `autoscaling.maxReplicas`
`org.alfresco.search.contentstore.event`	Content replicas and/or SFS
`org.alfresco.search.contentrefresh.event`	ATS and SFS
`org.alfresco.search.path.event`	Cannot scale. Check pod resources.

For automated alerting, scrape queue-depth metrics through the ActiveMQ Prometheus exporter and alert when any non-path queue depth exceeds your SLA threshold for a sustained period (e.g., >1,000 messages for >5 minutes).
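
The "sustained period" part matters, because short spikes are normal after bulk operations. The sketch below shows the logic on a list of samples. In Prometheus you would normally express the same thing with a `for:` duration on the alert rule; the function name, sample layout, and thresholds here are illustrative assumptions.

```python
def breaches_sla(samples, threshold=1000, sustained_seconds=300):
    """True if depth stayed above `threshold` for more than `sustained_seconds`.

    `samples` is a time-ordered list of (unix_seconds, queue_depth) pairs.
    """
    breach_start = None
    for timestamp, depth in samples:
        if depth > threshold:
            if breach_start is None:
                breach_start = timestamp  # first sample of this breach
            if timestamp - breach_start > sustained_seconds:
                return True
        else:
            breach_start = None  # queue recovered: reset the clock
    return False


# Depth sampled once a minute (illustrative values only).
steady_backlog = [(t, 1500) for t in range(0, 420, 60)]
short_spike = [(0, 1500), (60, 1500), (120, 200), (180, 1500), (240, 1500)]

print(breaches_sla(steady_backlog))
print(breaches_sla(short_spike))
```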

Summary: rules at a glance

mediation  → always 1 replica  (StatefulSet in k8s)
path       → always 1 replica
metadata   → start at 2; watch org.alfresco.search.metadata.event
content    → start at 2; watch org.alfresco.search.content.event
ATS        → HPA already handles this in Helm; raise maxReplicas if acs-repo-transform-request grows
SFS        → start at 1; needs ReadWriteMany PVC to go beyond 1 replica

Scale in this order when throughput is insufficient:

1. Add a replica to metadata or content (whichever queue is growing).
2. If queue depth still does not clear, raise concurrentConsumers on that service.
3. If acs-repo-transform-request or org.alfresco.search.contentstore.event then grows, scale ATS workers.
4. Only then consider scaling SFS, and only with a ReadWriteMany storage class.
5. If org.alfresco.search.path.event is growing, scaling is not the answer: investigate whether the path processor has enough CPU/memory, or whether a bulk folder operation triggered a temporary spike that will self-resolve.