OpenSearchCon Europe 2026 was relevant for teams building secure enterprise search and AI features on top of Alfresco, Nuxeo, and a shared Content Lake. The event was not just about generic vector search. Several sessions went directly into the problems we actually have in content platforms: permission-aware retrieval, multi-tenant collaboration, document parsing, chunking, embeddings, and operating a self-managed search stack without handing core data flows to external services.
This post summarizes the sessions that matter most for Alfresco and Nuxeo developers and maps them to practical design decisions. The emphasis is on what these talks mean for repository-backed content systems rather than a general conference recap.
The Hyland deck used at the event framed the platform around a few concrete capabilities: AFTS, CMIS, content metadata, path-aware search, permissions, permission-aware retrieval, indexing, chunking, multi-tenancy, and AI-ready search. That is exactly the boundary where OpenSearch becomes more than a search engine.
The most important operational session for repository-backed systems was "Upcoming Changes for the OpenSearch Index Authorization Mechanisms" by Nils Bandener.
The core problem described in the talk is familiar: users rarely think in terms of physical OpenSearch indexes, but security still has to be enforced at index resolution time. Historically, the do_not_fail_on_forbidden behavior tried to make this easier by dropping unauthorized indexes from requests. That made dashboards and broad search requests more usable, but it also introduced ambiguity and inconsistent semantics.
The talk outlined a replacement model that extends existing request options such as ignore_unavailable and allow_no_indices into security-aware index resolution. In practice, this means that a request naming an index the user is not authorized to see either fails explicitly with a 403 Forbidden or, when those options allow it, treats the forbidden index as unavailable, instead of silently dropping it.

For Alfresco and Nuxeo developers, the main takeaway is simple: if your content is partitioned across tenants, repositories, or derived indexes, index naming and index resolution are part of your security model. They are not just operational details.
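The resolution semantics described above can be sketched in a few lines. This is an illustrative model only, not the actual security-plugin implementation: `resolve_indexes` and the tenant index names are hypothetical, and the real plugin evaluates granted index patterns from role definitions.

```python
from fnmatch import fnmatch

def resolve_indexes(requested, granted_patterns, ignore_unavailable=False):
    """Sketch of security-aware index resolution: forbidden indexes either
    fail the request explicitly or are treated as unavailable."""
    resolved = []
    for index in requested:
        if any(fnmatch(index, pattern) for pattern in granted_patterns):
            resolved.append(index)
        elif not ignore_unavailable:
            # Explicit failure instead of silently dropping the index.
            raise PermissionError(f"403 Forbidden: no privilege on {index}")
        # With ignore_unavailable, the forbidden index is treated as unavailable.
    return resolved

granted = ["tenant-a-docs-*", "tenant-a-audit"]
print(resolve_indexes(["tenant-a-docs-2026"], granted))
print(resolve_indexes(["tenant-b-docs-2026"], granted, ignore_unavailable=True))
```

The point of the sketch is the design consequence: once index names encode tenants or repositories, the pattern-matching step above is a security decision, so index naming conventions have to be treated as part of the authorization model.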
The other major change in the talk was alias semantics. Under the new model, aliases must be granted privileges directly; OpenSearch no longer silently expands an alias into its backing indexes during evaluation. That matters for enterprise search designs where aliases are used to represent business collections, tenant views, or lifecycle-driven index groups. If we rely on aliases to express repository-facing abstractions, privilege assignment has to follow that abstraction as well.
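To make the alias point concrete, here is a minimal sketch of what "the alias carries the privilege" means for evaluation. The role structure loosely mirrors OpenSearch security role definitions (`index_permissions`, `index_patterns`, `allowed_actions`), but the alias and index names are hypothetical:

```python
# A role granting search on an alias name, not on its backing indexes.
role = {
    "index_permissions": [
        {"index_patterns": ["contracts-current"],  # an alias
         "allowed_actions": ["read"]}
    ]
}

def alias_authorized(name, role):
    # Evaluate against the requested name directly, with no expansion
    # of the alias into its backing indexes.
    return any(name in perm["index_patterns"]
               for perm in role["index_permissions"])

print(alias_authorized("contracts-current", role))  # the alias itself is granted
print(alias_authorized("contracts-2024", role))     # a backing index is not
```

The design consequence: if your repository-facing abstraction is the alias, grant on the alias, and do not assume access to a backing index falls out of that grant.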
Two sources covered this theme: the conference session "Enabling Secure, Fine-Grained Resource Sharing for Team Collaboration in OpenSearch" and the OpenSearch blog post "Introducing resource sharing: A new access control model for OpenSearch".
This topic is easy to misunderstand. Resource sharing is not a replacement for document-level security on Alfresco or Nuxeo content. It is a security model for higher-level OpenSearch resources created by plugins, such as ML models, anomaly detectors, reports, and other platform objects.
The model introduced in OpenSearch 3.3 adds explicit, owner-driven sharing of these plugin-created resources: a resource owner can share a resource with specific users or roles at defined access levels, rather than relying on broad index-level or backend-role permissions.
For repository and Content Lake work, this matters in two places.
First, it gives a cleaner control-plane model for shared AI and analytics assets. If multiple teams reuse the same embedding model, search workflow, or monitoring resource, that sharing should not depend on broad backend-role leakage. It should be explicit.
Second, it draws a useful boundary: content authorization and platform-resource authorization are different problems. Repository ACLs determine who can see a document. Resource sharing determines who can use the OpenSearch-side artifacts that process, analyze, or present that content.
That separation is healthy. It avoids the common mistake of collapsing every access rule into one oversized role model.
The most directly useful AI-ingestion session was "Integrating Docling With OpenSearch for Advanced RAG and Agentic Applications" by Cesar Berrospi Ramis.
The talk made the case that enterprise RAG fails early when document conversion is weak. PDFs are especially problematic because plain text extraction loses structure, page reading order, table meaning, image meaning, and section boundaries. Once that structure is lost, chunking quality drops, citations degrade, and hallucinations become more likely because the retrieval layer is feeding the model corrupted context.
Docling's contribution is a richer document-conversion pipeline built around a unified DoclingDocument representation that keeps layout, reading order, tables, and section structure attached to the extracted text.

This is highly relevant to Alfresco and Nuxeo because a large share of enterprise content is not clean HTML or normalized JSON. It is scanned PDF, office documents, compound documents, and records with structure hidden in layout. If the Content Lake wants to be AI-ready, the ingestion layer must preserve that structure before chunking starts.
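What "preserving structure" means downstream can be sketched as a chunk record that carries provenance from the conversion step into the index. The field names are illustrative, not the DoclingDocument schema, and the node reference format is just an Alfresco-style example:

```python
from dataclasses import dataclass, field

@dataclass
class ContentChunk:
    """A chunk that stays attributable to its source after conversion."""
    text: str
    source_node_id: str                  # repository node the chunk came from
    page: int                            # source page, needed for citations
    section_path: list = field(default_factory=list)  # e.g. ["7 Termination"]
    acl_id: str = ""                     # ACL context to filter on at query time

chunk = ContentChunk(
    text="Termination requires 30 days written notice.",
    source_node_id="workspace://SpacesStore/abc-123",
    page=4,
    section_path=["7 Termination"],
    acl_id="acl-991",
)
print(chunk.section_path, chunk.page)
```

If conversion flattens the document to plain text first, none of these fields can be populated reliably, which is exactly the failure mode the talk described.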
The talk also showed a practical target architecture in which Docling handles conversion ahead of chunking, embedding, and indexing into OpenSearch.
One especially important point for enterprise deployments is locality. Docling can run locally, which is much easier to justify for regulated content than shipping raw documents to external parsing services. That aligns well with Alfresco and Nuxeo installations where data residency and controlled processing paths are hard requirements.
Alfresco Transform Services and DocFilters provide comparable conversion capabilities to Docling and are tightly integrated with Hyland products.
The session "Building RAG With OpenSearch ML Plugins: From PDFs To Voice-Enabled Search" by Kushagra Sharma covered a more end-to-end implementation path.
The material was straightforward and practical: parse PDFs, chunk the text, generate embeddings with the OpenSearch ML plugins, and store the vectors in a knn_vector index for retrieval.

The strongest point for enterprise developers was not voice search. It was containment. The talk argued for a self-contained system where embedding generation and retrieval stay inside the OpenSearch environment rather than depending on an external embedding API for every request.
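The contained setup typically rests on two request bodies: an ingest pipeline with a text_embedding processor that calls an ML Commons-hosted model at index time, and a knn_vector mapping. The sketch below shows their shape as plain dicts; the model id, pipeline name, and dimension are assumptions that depend on which model you register:

```python
MODEL_ID = "my-local-minilm"  # hypothetical; returned by ML Commons registration

# Ingest pipeline: embed the "text" field into "embedding" at index time,
# so no external embedding API is called per request.
ingest_pipeline = {
    "description": "Embed chunk text inside OpenSearch",
    "processors": [
        {"text_embedding": {
            "model_id": MODEL_ID,
            "field_map": {"text": "embedding"}  # source field -> vector field
        }}
    ]
}

# Index body: enable k-NN and map the vector field.
index_body = {
    "settings": {"index.knn": True, "default_pipeline": "embed-chunks"},
    "mappings": {"properties": {
        "text": {"type": "text"},
        "embedding": {"type": "knn_vector", "dimension": 384,
                      "method": {"name": "hnsw", "engine": "lucene"}}
    }}
}
print(index_body["mappings"]["properties"]["embedding"]["type"])
```

These bodies would be sent with PUT `_ingest/pipeline/embed-chunks` and a create-index call; the point is that once the pipeline is in place, every indexed document is embedded in-cluster.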
The talk used simple chunking values such as 500-word chunks with 50-word overlap. That is a good starting point, but repository content usually needs smarter boundaries than plain word counts. For real content systems, chunking should respect structural boundaries such as headings, sections, tables, and page breaks, along with repository semantics like node identity, tenant, and ACL context.
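The baseline word-window chunker from the talk is a few lines; treat it as a starting sketch that structural boundaries should later override:

```python
def chunk_words(text, size=500, overlap=50):
    """Fixed word windows with overlap, per the talk's baseline values."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = " ".join(f"w{i}" for i in range(1100))
chunks = chunk_words(doc)
print(len(chunks))                 # 1100 words -> windows starting at 0, 450, 900
print(chunks[1].split()[0])        # second chunk starts inside the first's tail
```

A structure-aware version would replace the fixed `step` with breaks at headings, tables, and page boundaries, which is exactly where the Docling-style conversion output earns its keep.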
That is where the Docling talk and the ML Commons talk fit together: better parsing should drive better chunking, and better chunking should drive better retrieval.
The session "From Embeddings To Index: A Practitioner's Guide To Domain-Adapted Neural Search" by Samuel Herman was one of the most relevant titles on the schedule, but its slide deck was not among the materials available to us. The summary in this section is therefore an informed interpretation based on the session title, the official event schedule, the OpenSearch neural-search documentation, and Samuel Herman's public repository and publication profile.
The important idea is that enterprise retrieval does not fail because vectors exist. It fails because generic embeddings are often too generic for the domain being searched.
In other words, producing embeddings is only the start. The hard part is how those embeddings are aligned with the domain, indexed, retrieved, and combined with metadata and keyword signals.
For teams building a Content Lake, this usually points toward domain-adapted embedding models, hybrid retrieval that combines dense vectors with keyword and metadata signals, and evaluation against real domain queries rather than generic benchmarks.
If I had to compress the lesson for repository developers into one sentence, it would be this: vector search is not a feature you "switch on"; it is a retrieval design exercise that starts with your domain model and security model.
Taken together, these talks describe a coherent direction for enterprise content search.
Do not reduce ingestion to binary extraction plus plain text. Repository documents should go through a conversion stage that preserves layout, tables, headings, and provenance whenever possible. That is the baseline for better chunking and better citations.
Chunk boundaries should reflect document structure and repository semantics. A chunk should remain attributable to a source node, page or section, tenant, and ACL context.
Dense retrieval is useful, but enterprise search still benefits from metadata filters, keyword matching, exact identifiers, path constraints, and permissions. The practical target is hybrid retrieval, not a vector-only stack.
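The hybrid target can be illustrated as a single query body that combines a keyword clause, a dense clause, and an ACL filter. This is a sketch of the shape only: the `model_id`, field names, and ACL values are assumptions, and a real deployment also needs a search pipeline that normalizes and combines the sub-query scores:

```python
hybrid_query = {
    "query": {
        "hybrid": {
            "queries": [
                # Keyword signal: exact terms, identifiers, BM25 ranking.
                {"match": {"text": "termination notice period"}},
                # Dense signal: in-cluster embedding of the query text.
                {"neural": {"embedding": {
                    "query_text": "termination notice period",
                    "model_id": "my-local-minilm",  # hypothetical
                    "k": 20
                }}}
            ]
        }
    },
    # Permission-aware retrieval: restrict hits to ACLs the caller holds.
    "post_filter": {"terms": {"acl_id": ["acl-991", "acl-204"]}}
}
print(sorted(hybrid_query["query"]["hybrid"].keys()))
```

The important property is that the ACL filter is applied inside the search request, so unauthorized chunks never reach ranking or the answer-generation stage.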
There is document access and there is platform resource access. Repository ACLs, index authorization, and alias semantics belong to the first category. Sharing dashboards, models, and detectors belongs to the second. Mixing them carelessly will produce operational pain.
Most of the quality gains described in these talks happen before answer generation: parsing, enrichment, chunking, indexing, filtering, and retrieval. That is good news for repository teams because these are controllable engineering concerns.
Thanks to the OpenSearchCon Europe organizers, the OpenSearch Software Foundation, and the Linux Foundation events team for putting together a conference that was genuinely useful for practitioners. It was a strong event for teams working on real enterprise search and AI problems, and we appreciated the chance to participate, learn, and represent Hyland in that conversation.