Privacy-first collaboration, document transformation, and self-hosted AI: practical takeaways from Europe's largest open source conference
Another FOSDEM is in the books. After two days of navigating the ULB campus in Brussels, attending sessions across multiple devrooms, and having countless hallway conversations, we have distilled the most relevant ideas for anyone building on Alfresco or thinking about AI-ready content infrastructure.
Note that this is not a comprehensive FOSDEM recap; with 65 devrooms and over 1,000 talks, that would be impossible. Instead, I have grouped the sessions I attended by theme and framed each one specifically for our community: What does this mean for content management? How does it connect to Content Lake thinking? What can you actually try?
All links point to official FOSDEM session pages where you will find slides, recordings (when available), and project links.
One clear thread at FOSDEM 2026 was the maturation of self-hosted, privacy-preserving collaboration tools. These are not just alternatives to Google Docs anymore; they are becoming platforms with APIs, embedding capabilities, and federation patterns that content management architects should understand.
Session: CryptPad Updates: Latest in Private Real-Time Collaboration
CryptPad continues to evolve as a privacy-by-design collaboration suite built on zero-knowledge cryptography.
Embedding collaborative editors inside content applications is a recurring requirement (think of existing integrations like OnlyOffice or Collabora). CryptPad's approach demonstrates how collaboration can be decoupled from the repository while remaining tightly integrated at the UI and API level.
Project: La Suite numérique
While not a single FOSDEM session, La Suite numérique was a recurring reference point in conversations about European digital sovereignty. This is the French government's open-source digital workspace, built by DINUM and ANCT, and it is now a collaborative effort with the Netherlands and Germany.
The suite bundles several workspace components. The Docs component is particularly relevant: a modern collaborative wiki that competes directly with commercial alternatives while remaining fully self-hosted.
Session: Politics in Collaboration? I Don’t Care, Give Me Features
The title says it all. While European digital sovereignty provides context, the real message is that users adopt platforms that work well, regardless of political narratives. The talk focused on feature delivery, UX polish, and integration depth.
Backend capabilities matter, but visible, end-user-oriented features drive adoption. This applies equally to community deployments and enterprise stacks. If your content platform does not feel modern and capable at the UI level, no amount of architectural elegance underneath will save it.
Content does not exist in isolation. It lives alongside tasks, work packages, and project timelines. Two sessions highlighted how this integration continues to deepen.
Session: Taiga, Tenzu, and the Small Story of Sustainability in Open Source
This was not a feature announcement; it was a candid look at how a well-known project evolves, changes ownership, and continues as a community-driven effort. The focus on governance, sustainability, and long-term maintenance was refreshing.
Many Alfresco deployments coexist with project management tools. When evaluating integrations, tool longevity and governance matter as much as features. A tool that might disappear or fork unpredictably is a liability in enterprise content architectures.
Session: OpenProject Updates
OpenProject continues adding capabilities, with explicit mention of real-time text collaboration and deeper editor integration.
Tasks, work packages, and documents increasingly overlap. Repository content is becoming part of structured workflows, not just attachments bolted onto tasks. If you are building content-centric applications, consider how work management integration might evolve beyond simple linking.
Before any LLM can work with your enterprise content, that content needs to be extracted, structured, and normalized. Two sessions addressed this critical and often underestimated layer.
Session: Get Your Docs in a Row with Docling
Docling focuses on converting PDFs and Word documents into structured, searchable data. It handles complex layouts, tables, code blocks, and formulas: the kind of content that breaks naive extraction tools.
High-quality extraction dramatically improves everything downstream: search relevance, metadata enrichment, chunking for embeddings, and RAG pipeline accuracy. Garbage in, garbage out applies with particular force to AI workloads.
Docling fits naturally as a pre-ingestion or enrichment step. Content enters the repository, gets processed through Docling, and the structured output feeds into vector indexing, classification, or other AI-ready pipelines.
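As a rough sketch of that step, here is what a minimal Docling pass could look like, based on its published Python API; the file name, the heading-based chunking, and the indexing hook are illustrative placeholders, not part of any specific Alfresco integration.

```python
from docling.document_converter import DocumentConverter


def index_for_search(chunk: str) -> None:
    """Stand-in for a real vector-indexing or enrichment call."""
    print(f"indexed chunk of {len(chunk)} characters")


# Convert an incoming document into structured output before indexing.
converter = DocumentConverter()
result = converter.convert("contract.pdf")  # placeholder file name

# Markdown export preserves headings and tables, which keeps downstream
# chunking far more faithful than raw text extraction.
markdown = result.document.export_to_markdown()

# Illustrative chunking: split on second-level headings so each chunk
# stays semantically coherent before it is embedded.
for chunk in markdown.split("\n## "):
    if chunk.strip():
        index_for_search(chunk.strip())
```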
Project: Duckling
If you want to evaluate Docling without writing code, Duckling provides a modern graphical interface. Built with React and Flask, it wraps Docling's capabilities in a user-friendly web application.
Session: Document Interoperability and Conversion: It Shouldn’t Be That Hard
DocSpec takes a different angle: document conversion modeled as a specification, producing an AST (abstract syntax tree) and relying on queue-based microservices for processing.
This cleanly separates format parsing from downstream processing. Convert once, reuse everywhere. The queue-based microservice model aligns well with scalable ingestion pipelines where conversion, enrichment, and indexing are independent, horizontally-scalable steps.
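To make the pattern concrete, here is a hypothetical sketch of that separation (not DocSpec's actual API): a worker parses each document once into an AST, and enrichment or indexing consumers pick up the result from a queue. Python's in-process queue stands in for a real broker such as RabbitMQ or Kafka.

```python
import json
import queue
import threading

# Hypothetical job queues decoupling conversion from enrichment/indexing.
conversion_jobs: queue.Queue = queue.Queue()
ast_results: queue.Queue = queue.Queue()


def convert_worker() -> None:
    """Parse each document once into an AST; downstream consumers reuse it."""
    while True:
        doc_path = conversion_jobs.get()
        # Placeholder parse step: a real converter would emit a full AST.
        ast = {"source": doc_path,
               "blocks": [{"type": "paragraph", "text": "..."}]}
        ast_results.put(json.dumps(ast))
        conversion_jobs.task_done()


threading.Thread(target=convert_worker, daemon=True).start()
conversion_jobs.put("policy.docx")  # placeholder document
conversion_jobs.join()
print(ast_results.get())  # enrichment and indexing consume the same AST
```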
The most forward-looking sessions I attended dealt with running LLMs in production: not as demos or experiments, but as reliable infrastructure. For content platforms handling sensitive data, this is where the action is.
Session: Taming the LLM Zoo with Docker Model Runner
The core idea: treat models as OCI artifacts, distributed and versioned like container images. This decouples model distribution from inference engines (llama.cpp, vLLM, etc.).
Why this matters for enterprise content + AI: model distribution, versioning, rollback, and provenance can ride on the registries, signing, and promotion workflows organizations already run for container images, making self-hosted model deployment auditable rather than ad hoc.
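As an illustration of the decoupling, here is a minimal client sketch against the OpenAI-compatible endpoint Docker Model Runner exposes once a model has been pulled; the port, path, and model name below are assumptions to verify against your local setup. The point is that swapping the inference engine behind the model does not change this client code.

```python
import requests

# Assumed local endpoint for Docker Model Runner's OpenAI-compatible API;
# check your installation's docs for the actual host, port, and path.
BASE_URL = "http://localhost:12434/engines/v1"

response = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "ai/smollm2",  # placeholder: any model pulled as an OCI artifact
        "messages": [
            {"role": "user", "content": "Summarize this contract clause: ..."}
        ],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```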
Session: Accelerating vLLM Inference with Quantization and Speculative Decoding
vLLM has become the community-standard open-source engine for LLM inference, and this session provided a practical blueprint for scaling it in production. The speaker from Neural Magic (now part of Red Hat) presented data-backed guidance on two complementary optimization techniques.
Key techniques covered: quantization, which stores model weights in lower-precision formats to cut memory and bandwidth requirements, and speculative decoding, where a small draft model proposes tokens that the larger model verifies in parallel to increase throughput.
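For a feel of the developer experience, here is a minimal sketch of loading a pre-quantized checkpoint with vLLM's Python API; the model name is illustrative, and speculative-decoding parameters vary across vLLM releases, so they are left as a comment rather than guessed.

```python
from vllm import LLM, SamplingParams

# Load a pre-quantized checkpoint; the model name is a placeholder.
# Speculative decoding is also configured at load time (a small draft
# model plus a speculative token count), but the exact parameter names
# differ across vLLM releases, so consult the docs for your version.
llm = LLM(
    model="TheBloke/Llama-2-13B-AWQ",  # hypothetical AWQ-quantized model
    quantization="awq",
)

params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(
    ["Extract the parties named in this agreement: ..."], params
)
print(outputs[0].outputs[0].text)
```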
The fact that Red Hat (through Neural Magic) is investing heavily in vLLM optimization signals where enterprise AI infrastructure is heading. These are not academic techniques; they are production-ready optimizations that can dramatically reduce the hardware requirements for self-hosted LLMs.
For content-centric AI, this translates directly to cost and feasibility. If you can run a capable model on fewer GPUs (or smaller GPUs), sovereign AI deployment becomes accessible to more organizations.
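To make that concrete with back-of-the-envelope arithmetic: an 8B-parameter model needs roughly 16 GB for weights alone in FP16 (2 bytes per parameter), but only around 4 to 5 GB at 4-bit quantization, which is the difference between requiring a datacenter GPU and fitting on a commodity card.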
Session: Building Open and Reproducible AI Practices for LMICs (and Beyond)
This session from the Open Research devroom offered a different perspective: how do you build reproducible AI practices when infrastructure is limited? The speaker, drawing from work with Data Science Without Borders and The Turing Way, outlined approaches for Low and Middle Income Countries (LMICs) that are surprisingly relevant for any resource-constrained environment.
The constraints faced in LMICs (limited compute, unreliable connectivity, need for efficiency) mirror what many organizations face when trying to run AI on-premises. The emphasis on the Open Source AI Definition (OSAID) and its freedom to study principle connects directly to reproducibility and auditability requirements in regulated industries.
Open source AI is not just about avoiding vendor lock-in. It is about being able to study, verify, and reproduce what your AI systems do. For content platforms where AI makes decisions about documents, metadata, or access, this transparency is increasingly a compliance requirement, not just a nice-to-have.
Session: From Infrastructure to Production: A Year of Self-Hosted LLMs
This was a pragmatic, experience-driven talk about what actually works when running LLMs in production. No hype, just operational patterns learned the hard way.
Content-centric AI systems live or die by operational reliability. Your RAG pipeline is only as good as your uptime. The lessons from this session (around GPU management, model loading, request routing, and failure handling) apply directly when embedding LLMs into repository-centric workflows.
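To illustrate one of those patterns, here is a hypothetical failover sketch for routing requests across two self-hosted, OpenAI-compatible inference endpoints; the URLs, model name, and timeout are placeholders of mine, not anything presented in the session.

```python
import requests

# Hypothetical inference nodes exposing OpenAI-compatible APIs.
ENDPOINTS = [
    "http://gpu-node-1:8000/v1/chat/completions",
    "http://gpu-node-2:8000/v1/chat/completions",
]


def generate(prompt: str) -> str:
    """Try each endpoint in order; fail fast so a hung node cannot stall the pipeline."""
    last_error: Exception | None = None
    for url in ENDPOINTS:
        try:
            resp = requests.post(
                url,
                json={
                    "model": "local-model",  # placeholder model name
                    "messages": [{"role": "user", "content": prompt}],
                },
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()["choices"][0]["message"]["content"]
        except requests.RequestException as exc:
            last_error = exc  # try the next node before giving up
    raise RuntimeError("all inference endpoints failed") from last_error


# Usage (requires live endpoints):
# print(generate("Classify this document: ..."))
```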
One observation from navigating FOSDEM this year: the sessions most relevant to Content Lake thinking were scattered across multiple devrooms (Collaboration and Content Management, AI Plumbers, Search, Databases, Open Source Research...). There is no single place where preparing enterprise content for AI lives as a topic.
Document transformation, permission-aware retrieval, sovereign LLM deployment, RAG architecture... these topics do not have a natural home yet. It is something I am thinking about for future FOSDEMs.
But that is a topic for another post.
Privacy-First Collaboration: CryptPad Updates: Latest in Private Real-Time Collaboration; La Suite numérique; Politics in Collaboration? I Don’t Care, Give Me Features
Project Management: Taiga, Tenzu, and the Small Story of Sustainability in Open Source; OpenProject Updates
Document Transformation: Get Your Docs in a Row with Docling; Duckling; Document Interoperability and Conversion: It Shouldn’t Be That Hard
Self-Hosted LLM Infrastructure: Taming the LLM Zoo with Docker Model Runner; Accelerating vLLM Inference with Quantization and Speculative Decoding; Building Open and Reproducible AI Practices for LMICs (and Beyond); From Infrastructure to Production: A Year of Self-Hosted LLMs
Were you at FOSDEM 2026? Did I miss a session that Alfresco developers should know about? Let me know in the comments or reach out directly.