Hyland Connect

angelborroy · 02-26-2026

FOSDEM has been one of my fixed points in the year for a long time. It is volunteer-organised, non-commercial, and free to attend. That mix creates something rare: an event where communities self-organise around what they actually build, without vendor noise. FOSDEM is explicit about that mission in the public charter and the official event description.

After FOSDEM 2026, I did something I had never done before: I reviewed the devroom landscape as if it were an infrastructure problem. Not “which talks did I like”, but “which engineering layers have a clear home, and which do not”. The conclusion was uncomfortable and very concrete: there is a devroom for AI model plumbing, but no devroom for AI data plumbing.

This post is a public argument for a new FOSDEM devroom proposal for 2027: AI-Ready Data Infrastructure (name is negotiable; the missing layer is not).

What I mean by “AI data infrastructure”

Many people hear “AI at FOSDEM” and immediately think about models: runtimes, GPUs, quantization, compilation, serving stacks. That work is real and valuable. It also has a clear home today.

What I am talking about is the layer that sits between real-world enterprise data and inference: the systems that turn messy content into something usable by retrieval, agents, and LLM applications, under security and compliance constraints.

Content preparation: extraction, OCR, layout preservation, tables, emails, attachments, images, diagrams
Chunking and embedding lifecycle: incremental updates, re-embedding, deletions, provenance, evaluation
Permission-aware retrieval: ACL indexing, security trimming architectures, auditability
Sovereign deployment patterns: on-prem and private deployments for regulated industries
RAG as infrastructure: observability, testing, failure modes, operational reality

In other words: the unglamorous middle where most enterprise AI projects succeed or fail.
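To make the "embedding lifecycle" item less abstract, here is a minimal sketch of incremental re-embedding driven by content hashes. Everything in it is an assumption for illustration: the chunk identifiers, the `plan_sync` helper, and the idea that a hash is stored next to each embedding are hypothetical, not the API of any particular vector database.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text (assumed to be stored with its embedding)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(indexed: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare the index state with the current content.

    indexed: chunk_id -> hash recorded when the chunk was last embedded
    current: chunk_id -> chunk text as it exists in the source system now
    Returns (chunk ids to embed or re-embed, chunk ids to delete from the index).
    """
    to_embed = sorted(
        cid for cid, text in current.items()
        if indexed.get(cid) != content_hash(text)  # new chunk or changed text
    )
    to_delete = sorted(cid for cid in indexed if cid not in current)
    return to_embed, to_delete


# Hypothetical state: one unchanged chunk, one edited, one removed, one new.
indexed = {
    "doc1#0": content_hash("Intro paragraph"),
    "doc1#1": content_hash("Old pricing table"),
    "doc2#0": content_hash("Retired policy"),
}
current = {
    "doc1#0": "Intro paragraph",
    "doc1#1": "New pricing table",
    "doc3#0": "Fresh onboarding guide",
}

to_embed, to_delete = plan_sync(indexed, current)
print("embed:", to_embed)
print("delete:", to_delete)
```

Only the edited and the new chunk are sent to the embedding model, and the removed chunk is dropped from the index. A real pipeline would also have to track the embedding model version, provenance, and partial failures, which is exactly the kind of operational detail a devroom could cover.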

Why existing devrooms do not solve it (and why that is nobody's fault)

I went through the most relevant devrooms and looked for this layer. Pieces exist, but ownership does not. The existing rooms are optimized for different layers, and their boundaries are reasonable.

AI-focused rooms concentrate on inference infrastructure (model-side plumbing)
Search rooms concentrate on retrieval and ranking (query-side mechanics)
Database rooms concentrate on storage engines and query execution
Collaboration and content rooms concentrate on collaboration tooling, and often avoid AI topics by design

The gap persists because AI data infrastructure sits between these communities. That in-between space is precisely where a lot of open source engineering is happening right now.

This analysis, with concrete examples from the 2026 landscape and the kinds of topics that did not fit anywhere cleanly, is in the supporting notes I prepared after the event.

Why FOSDEM is the right place to fix this gap

FOSDEM is not only a conference. It is a mechanism: communities propose devrooms, run CFPs, curate schedules, and create a yearly focal point for their layer of the stack.

That mechanism is documented, with clear expectations and key dates. For FOSDEM 2026, the Devroom Managers Manual lists the developer room proposal deadline as 12 October, and the accepted developer rooms announcement as 26 October.

New devrooms appear when a topic becomes mature enough to justify it. One recent example: FOSDEM 2025 welcomed a dedicated eBPF devroom for the first time after years of eBPF talks appearing across other rooms.

AI data infrastructure is at that stage now: too important to be scattered, too cross-layer to be owned by a single existing room.

What this devroom would be (and what it would not be)

This devroom should treat "RAG over real data" as systems engineering, not as prompt craft. The goal is a home for the builders of pipelines, indices, ACL models, evaluation harnesses, and sovereign deployments.

Proposed scope

Document transformation pipelines and extraction quality engineering
Chunking strategies by content type, and embedding lifecycle management
Permission-aware retrieval patterns and ACL indexing models
Hybrid retrieval architectures and evaluation at scale
Sovereign AI patterns: on-prem inference, private deployments, air-gapped constraints
Agentic workflows over private data, with real operational constraints
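As an illustration of what "permission-aware retrieval" can mean in practice, here is a minimal sketch of query-time security trimming. It assumes a hypothetical `Chunk` record that carries the ACL of its source document, copied at index time, and a hypothetical `security_trim` step applied to retriever candidates before anything reaches the LLM. It is not the design of any specific search engine or RAG framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    score: float                    # similarity score from a hypothetical retriever
    allowed_groups: frozenset[str]  # ACL copied from the source document at index time


def security_trim(candidates: list[Chunk], user_groups: set[str], top_k: int = 3) -> list[Chunk]:
    """Drop chunks the user may not read, then return the top_k by score."""
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]


# Hypothetical candidates returned by a vector or hybrid search.
candidates = [
    Chunk("hr-001", "Salary bands for 2026", 0.92, frozenset({"hr"})),
    Chunk("eng-007", "Deployment runbook", 0.88, frozenset({"engineering", "sre"})),
    Chunk("pub-003", "Public holiday calendar", 0.75, frozenset({"everyone"})),
]

results = security_trim(candidates, user_groups={"engineering", "everyone"})
trimmed_ids = [c.doc_id for c in results]
print(trimmed_ids)
```

The highest-scoring chunk is excluded because the user is not in the `hr` group. Filtering after retrieval is the simplest pattern to show, but it can leave too few results when many candidates are trimmed; pushing the ACL filter into the index query is a common alternative, and the trade-off between the two is one of the topics this scope item is meant to attract.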

Non-goals

Pure inference optimisation and GPU plumbing without the data layer story
General "chatbot demos" without pipeline, lifecycle, or security substance
Storage engine internals when the topic is not AI data lifecycle

The real goal of publishing this: make the need visible enough that a coalition forms

A devroom proposal is not a blog post. A devroom proposal is evidence of a community that can deliver a track responsibly. Strong proposals have multiple co-organisers from different projects, plus visible community interest.

I drafted a concrete campaign plan, aligned with the FOSDEM timeline: publish while the event is fresh, engage for a couple of months, form a coalition by early summer, and arrive in October with proof.

Call to action

If you build any part of this missing layer and you have ever felt your FOSDEM talk was homeless because it was not inference, not search ranking, not database internals, and not collaboration tooling, I want to talk.

Document processing and extraction toolchains
Vector databases and embedding infrastructure
Permission-aware retrieval and security trimming
RAG frameworks that focus on production systems
Search engines and content platforms implementing AI ingestion
On-prem and private deployments with real enterprise constraints

I am exploring a FOSDEM 2027 devroom proposal around AI-Ready Data Infrastructure, and I am looking for co-organisers across communities. If you are interested in shaping scope, CFP, or simply validating demand, reach out.

Acknowledgement: devrooms exist because volunteers invest huge effort in them. This post is not a complaint about any individual room. It is a proposal to evolve the devroom map as the open source stack evolves.