Privacy-first collaboration, document transformation, and self-hosted AI: practical takeaways from Europe's largest open source conference
Another FOSDEM is in the books. After two days of navigating the ULB campus in Brussels, attending sessions across multiple devrooms, and having countless hallway conversations, we have distilled the most relevant ideas for anyone building on Alfresco or thinking about AI-ready content infrastructure.
Note that this is not a comprehensive FOSDEM recap; with 65 devrooms and over 1,000 talks, that would be impossible. Instead, I have grouped the sessions I attended by theme and framed each one specifically for our community: What does this mean for content management? How does it connect to Content Lake thinking? What can you actually try?
All links point to official FOSDEM session pages where you will find slides, recordings (when available), and project links.
One clear thread at FOSDEM 2026 was the maturation of self-hosted, privacy-preserving collaboration tools. These are not just alternatives to Google Docs anymore; they are becoming platforms with APIs, embedding capabilities, and federation patterns that content management architects should understand.
Session: CryptPad Updates: Latest in Private Real-Time Collaboration
CryptPad continues to evolve as a privacy-by-design collaboration suite built on zero-knowledge cryptography.
Embedding collaborative editors inside content applications is a recurring requirement (think of existing integrations like OnlyOffice or Collabora). CryptPad's approach demonstrates how collaboration can be decoupled from the repository while remaining tightly integrated at the UI and API level.
Project: La Suite numérique
While not a single FOSDEM session, La Suite numérique was a recurring reference point in conversations about European digital sovereignty. This is the French government's open-source digital workspace, built by DINUM and ANCT, and it is now a collaborative effort with the Netherlands and Germany.
The suite bundles several workspace components. The Docs component is particularly relevant: a modern collaborative wiki that competes directly with commercial alternatives while remaining fully self-hosted.
Session: Politics in Collaboration? I Don’t Care, Give Me Features
The title says it all. While European digital sovereignty provides context, the real message is that users adopt platforms that work well, regardless of political narratives. The talk focused on feature delivery, UX polish, and integration depth.
Backend capabilities matter, but visible, end-user-oriented features drive adoption. This applies equally to community deployments and enterprise stacks. If your content platform does not feel modern and capable at the UI level, no amount of architectural elegance underneath will save it.
Content does not exist in isolation. It lives alongside tasks, work packages, and project timelines. Two sessions highlighted how this integration continues to deepen.
Session: Taiga, Tenzu, and the Small Story of Sustainability in Open Source
This was not a feature announcement; it was a candid look at how a well-known project evolves, changes ownership, and continues as a community-driven effort. The focus on governance, sustainability, and long-term maintenance was refreshing.
Many Alfresco deployments coexist with project management tools. When evaluating integrations, tool longevity and governance matter as much as features. A tool that might disappear or fork unpredictably is a liability in enterprise content architectures.
Session: OpenProject Updates
OpenProject continues adding capabilities, with explicit mention of real-time text collaboration and deeper editor integration.
Tasks, work packages, and documents increasingly overlap. Repository content is becoming part of structured workflows, not just attachments bolted onto tasks. If you are building content-centric applications, consider how work management integration might evolve beyond simple linking.
Before any LLM can work with your enterprise content, that content needs to be extracted, structured, and normalized. Two sessions addressed this critical and often underestimated layer.
Session: Get Your Docs in a Row with Docling
Docling focuses on converting PDFs and Word documents into structured, searchable data. It handles complex layouts, tables, code blocks, and formulas: the kind of content that breaks naive extraction tools.
High-quality extraction dramatically improves everything downstream: search relevance, metadata enrichment, chunking for embeddings, and RAG pipeline accuracy. Garbage in, garbage out applies with particular force to AI workloads.
Docling fits naturally as a pre-ingestion or enrichment step. Content enters the repository, gets processed through Docling, and the structured output feeds into vector indexing, classification, or other AI-ready pipelines.
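As a rough sketch of that step, here is what a minimal Docling pass could look like, based on its published Python API; the file name, the heading-based chunking, and the indexing hook are illustrative placeholders, not part of any specific Alfresco integration.

```python
from docling.document_converter import DocumentConverter


def index_for_search(chunk: str) -> None:
    """Stand-in for a real vector-indexing or enrichment call."""
    print(f"indexed chunk of {len(chunk)} characters")


# Convert an incoming document into structured output before indexing.
converter = DocumentConverter()
result = converter.convert("contract.pdf")  # placeholder file name

# Markdown export preserves headings and tables, which keeps downstream
# chunking far more faithful than raw text extraction.
markdown = result.document.export_to_markdown()

# Illustrative chunking: split on second-level headings so each chunk
# stays semantically coherent before it is embedded.
for chunk in markdown.split("\n## "):
    if chunk.strip():
        index_for_search(chunk.strip())
```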
Project: Duckling
If you want to evaluate Docling without writing code, Duckling provides a modern graphical interface. Built with React and Flask, it wraps Docling's capabilities in a user-friendly web application.
Session: Document Interoperability and Conversion: It Shouldn’t Be That Hard
DocSpec takes a different angle: document conversion modeled as a specification, producing an AST (abstract syntax tree) and relying on queue-based microservices for processing.
This cleanly separates format parsing from downstream processing. Convert once, reuse everywhere. The queue-based microservice model aligns well with scalable ingestion pipelines where conversion, enrichment, and indexing are independent, horizontally-scalable steps.
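To make the pattern concrete, here is a hypothetical sketch of that separation (not DocSpec's actual API): a worker parses each document once into an AST, and enrichment or indexing consumers pick up the result from a queue. Python's in-process queue stands in for a real broker such as RabbitMQ or Kafka.

```python
import json
import queue
import threading

# Hypothetical job queues decoupling conversion from enrichment/indexing.
conversion_jobs: queue.Queue = queue.Queue()
ast_results: queue.Queue = queue.Queue()


def convert_worker() -> None:
    """Parse each document once into an AST; downstream consumers reuse it."""
    while True:
        doc_path = conversion_jobs.get()
        # Placeholder parse step: a real converter would emit a full AST.
        ast = {"source": doc_path,
               "blocks": [{"type": "paragraph", "text": "..."}]}
        ast_results.put(json.dumps(ast))
        conversion_jobs.task_done()


threading.Thread(target=convert_worker, daemon=True).start()
conversion_jobs.put("policy.docx")  # placeholder document
conversion_jobs.join()
print(ast_results.get())  # enrichment and indexing consume the same AST
```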
The most forward-looking sessions I attended dealt with running LLMs in production: not as demos or experiments, but as reliable infrastructure. For content platforms handling sensitive data, this is where the action is.
Session: Taming the LLM Zoo with Docker Model Runner
The core idea: treat models as OCI artifacts, distributed and versioned like container images. This decouples model distribution from inference engines (llama.cpp, vLLM, etc.).
Why this matters for enterprise content + AI: model distribution, versioning, rollback, and provenance can ride on the registries, signing, and promotion workflows organizations already run for container images, making self-hosted model deployment auditable rather than ad hoc.
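As an illustration of the decoupling, here is a minimal client sketch against the OpenAI-compatible endpoint Docker Model Runner exposes once a model has been pulled; the port, path, and model name below are assumptions to verify against your local setup. The point is that swapping the inference engine behind the model does not change this client code.

```python
import requests

# Assumed local endpoint for Docker Model Runner's OpenAI-compatible API;
# check your installation's docs for the actual host, port, and path.
BASE_URL = "http://localhost:12434/engines/v1"

response = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "ai/smollm2",  # placeholder: any model pulled as an OCI artifact
        "messages": [
            {"role": "user", "content": "Summarize this contract clause: ..."}
        ],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```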
Session: Accelerating vLLM Inference with Quantization and Speculative Decoding
vLLM has become the community-standard open-source engine for LLM inference, and this session provided a practical blueprint for scaling it in production. The speaker from Neural Magic (now part of Red Hat) presented data-backed guidance on two complementary optimization techniques.
Key techniques covered: quantization, which stores model weights in lower-precision formats to cut memory and bandwidth requirements, and speculative decoding, where a small draft model proposes tokens that the larger model verifies in parallel to increase throughput.
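For a feel of the developer experience, here is a minimal sketch of loading a pre-quantized checkpoint with vLLM's Python API; the model name is illustrative, and speculative-decoding parameters vary across vLLM releases, so they are left as a comment rather than guessed.

```python
from vllm import LLM, SamplingParams

# Load a pre-quantized checkpoint; the model name is a placeholder.
# Speculative decoding is also configured at load time (a small draft
# model plus a speculative token count), but the exact parameter names
# differ across vLLM releases, so consult the docs for your version.
llm = LLM(
    model="TheBloke/Llama-2-13B-AWQ",  # hypothetical AWQ-quantized model
    quantization="awq",
)

params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(
    ["Extract the parties named in this agreement: ..."], params
)
print(outputs[0].outputs[0].text)
```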
The fact that Red Hat (through Neural Magic) is investing heavily in vLLM optimization signals where enterprise AI infrastructure is heading. These are not academic techniques; they are production-ready optimizations that can dramatically reduce the hardware requirements for self-hosted LLMs.
For content-centric AI, this translates directly to cost and feasibility. If you can run a capable model on fewer GPUs (or smaller GPUs), sovereign AI deployment becomes accessible to more organizations.
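To make that concrete with back-of-the-envelope arithmetic: an 8B-parameter model needs roughly 16 GB for weights alone in FP16 (2 bytes per parameter), but only around 4 to 5 GB at 4-bit quantization, which is the difference between requiring a datacenter GPU and fitting on a commodity card.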
Session: Building Open and Reproducible AI Practices for LMICs (and Beyond)
This session from the Open Research devroom offered a different perspective: how do you build reproducible AI practices when infrastructure is limited? The speaker, drawing from work with Data Science Without Borders and The Turing Way, outlined approaches for Low and Middle Income Countries (LMICs) that are surprisingly relevant for any resource-constrained environment.
The constraints faced in LMICs (limited compute, unreliable connectivity, need for efficiency) mirror what many organizations face when trying to run AI on-premises. The emphasis on the Open Source AI Definition (OSAID) and its freedom to study principle connects directly to reproducibility and auditability requirements in regulated industries.
Open source AI is not just about avoiding vendor lock-in. It is about being able to study, verify, and reproduce what your AI systems do. For content platforms where AI makes decisions about documents, metadata, or access, this transparency is increasingly a compliance requirement, not just a nice-to-have.
Session: From Infrastructure to Production: A Year of Self-Hosted LLMs
This was a pragmatic, experience-driven talk about what actually works when running LLMs in production. No hype, just operational patterns learned the hard way.
Content-centric AI systems live or die by operational reliability. Your RAG pipeline is only as good as your uptime. The lessons from this session (around GPU management, model loading, request routing, and failure handling) apply directly when embedding LLMs into repository-centric workflows.
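To illustrate one of those patterns, here is a hypothetical failover sketch for routing requests across two self-hosted, OpenAI-compatible inference endpoints; the URLs, model name, and timeout are placeholders of mine, not anything presented in the session.

```python
import requests

# Hypothetical inference nodes exposing OpenAI-compatible APIs.
ENDPOINTS = [
    "http://gpu-node-1:8000/v1/chat/completions",
    "http://gpu-node-2:8000/v1/chat/completions",
]


def generate(prompt: str) -> str:
    """Try each endpoint in order; fail fast so a hung node cannot stall the pipeline."""
    last_error: Exception | None = None
    for url in ENDPOINTS:
        try:
            resp = requests.post(
                url,
                json={
                    "model": "local-model",  # placeholder model name
                    "messages": [{"role": "user", "content": prompt}],
                },
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()["choices"][0]["message"]["content"]
        except requests.RequestException as exc:
            last_error = exc  # try the next node before giving up
    raise RuntimeError("all inference endpoints failed") from last_error


# Usage (requires live endpoints):
# print(generate("Classify this document: ..."))
```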
One observation from navigating FOSDEM this year: the sessions most relevant to Content Lake thinking were scattered across multiple devrooms (Collaboration and Content Management, AI Plumbers, Search, Databases, Open Source Research...). There is no single place where preparing enterprise content for AI lives as a topic.
Document transformation, permission-aware retrieval, sovereign LLM deployment, RAG architecture... these topics do not have a natural home yet. It is something I am thinking about for future FOSDEMs.
But that is a topic for another post.
Privacy-First Collaboration: CryptPad Updates: Latest in Private Real-Time Collaboration; La Suite numérique; Politics in Collaboration? I Don’t Care, Give Me Features
Project Management: Taiga, Tenzu, and the Small Story of Sustainability in Open Source; OpenProject Updates
Document Transformation: Get Your Docs in a Row with Docling; Duckling; Document Interoperability and Conversion: It Shouldn’t Be That Hard
Self-Hosted LLM Infrastructure: Taming the LLM Zoo with Docker Model Runner; Accelerating vLLM Inference with Quantization and Speculative Decoding; Building Open and Reproducible AI Practices for LMICs (and Beyond); From Infrastructure to Production: A Year of Self-Hosted LLMs
Were you at FOSDEM 2026? Did I miss a session that Alfresco developers should know about? Let me know in the comments or reach out directly.