AI isn’t failing because models aren’t good enough. It’s failing because the data isn’t.
Today, we’re launching Knowledge Enrichment, a breakthrough API that transforms fragmented, unstructured enterprise content into deeply structured, semantically rich, AI-ready data. Powered by Hyland’s proven Document Filters and AI, it delivers what developers and architects have been missing: clean, contextual, vectorized output across 600+ file types, ready for direct integration into your AI stack.
If you're building AI Agents, retrieval-augmented generation (RAG) systems, or enterprise LLMs, stop what you’re doing. This changes everything.
Large language models are powerful, but they work best when the content they see is accurate, complete, and grounded in the original source.
That’s not how most enterprise content looks. It’s buried in PDFs, spreadsheets, presentations, and proprietary formats. It lives across ECMs, network drives, and cloud repositories. And even when you extract it, traditional methods often strip away layout, context, and structure, leaving behind only raw text.
Some teams try to solve this by prompting LLMs to infer structure, meaning, or metadata from that text. But LLMs don’t actually know what was in the file; they’re guessing. Their outputs are based on pattern recognition and probability, not ground truth.
Knowledge Enrichment does the opposite. It pulls structure, semantics, and content directly from the file itself using deterministic extraction, not AI hallucination. Backed by Hyland Document Filters, it knows what was a table, what was a heading, what was a footer, and what content belonged together, with no inference required.
The result is structured outputs you can trust, and clean, contextual content your LLMs don’t have to guess about.
Knowledge Enrichment is built on two core components:
At the foundation of Knowledge Enrichment is Hyland Document Filters, the same high-performance SDK trusted by leading security tools, compliance platforms, and search products. It processes over 600 file formats across documents, emails, CAD files, and more, with zero reliance on external services or cloud APIs.
Document Filters doesn’t just extract text. It:
This is the raw substrate Knowledge Enrichment builds on: a consistent, portable, and accurate foundation for AI structuring.
On top of that, Knowledge Enrichment performs deep analysis and transforms content into structured outputs using AI where needed:
| Capability | Description |
| --- | --- |
| Entity Extraction | Identifies named entities (people, orgs, places) with labeling and context |
| Table Structuring | Extracts and normalizes tables with headers, rows, and data types |
| Contextual Chunking | Segments content into meaningfully grouped sections based on structure and semantics |
| Summarization | Generates document-level summaries |
| Classification | Tags documents by type, topic, or intent using AI classifiers |
| Contextual Metadata | Derives rich metadata by interpreting document meaning, layout, and semantics, far beyond basic file properties |
| Embeddings Generation | Creates vector representations for content, enabling RAG and clustering |
These outputs are returned in Markdown or JSON formats and are ready for direct ingestion into:
Scanned documents, handwritten notes, and damage photos are all enriched into structured JSON with extracted metadata (claim number, policy ID, repair estimate) and chunked content for routing or summarization. LLMs consume the cleaned data to assist adjusters or validate claims.
Contracts, pleadings, and emails are parsed into structured sections with semantic labeling and summaries. The output can be ingested into eDiscovery tools or used to fine-tune LLMs that assist with clause comparison or timeline construction.
Annual reports, prospectuses, and policies are vectorized and embedded with layout-preserved markup. RAG agents can then retrieve precisely the right section for question answering or summarization with full traceability back to the source.
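The retrieval step in that last scenario can be sketched in a few lines. This is a minimal illustration, not the product's API: the `Chunk` record shape (`text`, `page`, `embedding`) and the `top_k` helper are hypothetical, and the three-dimensional vectors stand in for real embeddings. It ranks chunks by cosine similarity to a query vector and keeps the page number so each hit can be traced back to its source.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical record for one enriched chunk (not the real response schema)."""
    text: str
    page: int
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return the k chunks most similar to the query, best match first."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c.embedding), reverse=True)
    return ranked[:k]


# Toy data: made-up text and tiny vectors, for illustration only.
chunks = [
    Chunk("Revenue grew year over year.", page=4, embedding=[0.9, 0.1, 0.0]),
    Chunk("Risk factors include interest rates.", page=12, embedding=[0.1, 0.9, 0.1]),
    Chunk("Board of directors biographies.", page=30, embedding=[0.0, 0.1, 0.9]),
]

query_vector = [0.8, 0.2, 0.0]  # stand-in for an embedded question about revenue
for hit in top_k(query_vector, chunks):
    print(f"page {hit.page}: {hit.text}")
```

In a real pipeline the query would be embedded with the same model as the chunks, and a vector database would usually replace the in-memory sort.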
Knowledge Enrichment is designed as a developer-first API. Every feature is exposed via simple REST endpoints.
curl -L 'https://knowledge-enrichment.ai.experience.hyland.com/latest/api/data-curation/presign' \
-H 'Content-Type: application/json' \
-H 'Accept: text/json' \
-H 'Authorization: Bearer <token>' \
-d '{
"normalization": {
"quotations": true
},
"chunking": true,
"embedding": true
}'
The response contains structured content in Markdown, as well as contextually chunked content and embeddings. Each element is aligned by page and coordinate so you can trace insights back to the original document with full fidelity. See more information on this endpoint in the documentation.
Note: The above UI was created to showcase the capabilities of Knowledge Enrichment and is not part of the Knowledge Enrichment product.
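For teams calling the API from application code rather than the shell, the same request might be built in Python as sketched below. The URL, headers, and body fields are copied from the curl example; the `build_presign_request` helper is a hypothetical name, and the sketch only constructs the request without sending it. Serializing the body with `json.dumps` ensures the payload is valid JSON.

```python
import json
import urllib.request

# URL copied from the curl example in this post.
PRESIGN_URL = (
    "https://knowledge-enrichment.ai.experience.hyland.com"
    "/latest/api/data-curation/presign"
)


def build_presign_request(
    token: str, chunking: bool = True, embedding: bool = True
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper: only the fields that appear in the curl example are used.
    """
    body = {
        "normalization": {"quotations": True},
        "chunking": chunking,
        "embedding": embedding,
    }
    return urllib.request.Request(
        PRESIGN_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "text/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_presign_request("<token>")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it: urllib.request.urlopen(request), which needs a valid token
# and network access. The response shape is described in the documentation.
```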
curl -L 'https://cin-context-api.experience.hyland.com/context/api/content/process' \
-H 'Content-Type: application/json' \
-H 'Accept: text/plain' \
-H 'Authorization: Bearer <token>' \
-d '{
"objectKeys": [
"string"
],
"actions": [
"image-description",
"image-metadata-generation"
],
"kSimilarMetadata": [
{
"estimate_details": {
"job_number": "R-2024-0568",
"creation_date": "2024-06-15",
"expiration_date": "2024-07-15",
"estimate_total": "8,750.00",
"status": "pending"
},
"property": {
"address": "123 Main Street",
"city": "Springfield",
"state": "IL",
"zip": "62701",
"year_built": "1995",
"roof_size_sqft": "2,400"
},
"damage_assessment": {
"damage_cause": "hail_storm",
"date_of_damage": "2024-05-20",
"affected_areas": "southwest_slope|ridge_caps|flashing",
"severity": "moderate",
"roof_condition": "significantly damaged",
"potential_cause": "recent storm or wind event",
"damage_types": "water intrusion|shingle displacement|structural exposure",
"additional_risk": "potential hidden damage in surrounding roof areas",
"urgency_level": "moderate to high"
},
"repair_scope": {
"materials": "asphalt_shingles|underlayment|flashing",
"warranty_period": "15 years",
"estimated_completion_time": "3 days",
"repair_recommendations": [
"replace damaged shingles",
"inspect and repair flashing",
"clean gutters"
]
}
}
]
}'
The response contains a JSON representation of the information we asked for from the document, including a summary and additional metadata interpreted from the document. See more information on this endpoint in the documentation.
Note: The above UI was created to showcase the capabilities of Knowledge Enrichment and is not part of the Knowledge Enrichment product.
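A payload this large is easier to maintain when it is assembled in code. The sketch below builds the same three top-level fields used in the curl example (`objectKeys`, `actions`, `kSimilarMetadata`) from Python data structures. The `build_process_body` helper and the `example-object-key` placeholder are hypothetical, the metadata example is trimmed for brevity, and the sketch assumes each action is passed as its own array element.

```python
import json

# URL copied from the curl example in this post.
PROCESS_URL = "https://cin-context-api.experience.hyland.com/context/api/content/process"


def build_process_body(object_keys, actions, similar_metadata) -> str:
    """Serialize a request body with the three fields from the curl example.

    Hypothetical helper; the service may accept fields not shown here.
    """
    body = {
        "objectKeys": list(object_keys),
        "actions": list(actions),
        "kSimilarMetadata": list(similar_metadata),
    }
    return json.dumps(body, indent=2)


# A trimmed version of the example metadata from the curl request.
example_metadata = {
    "estimate_details": {"job_number": "R-2024-0568", "status": "pending"},
    "damage_assessment": {"damage_cause": "hail_storm", "severity": "moderate"},
}

payload = build_process_body(
    object_keys=["example-object-key"],  # placeholder, not a real key
    actions=["image-description", "image-metadata-generation"],
    similar_metadata=[example_metadata],
)
print(payload)
```

The resulting string can be sent as the request body with the same headers as the curl example.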
Whether you’re building an LLM-powered copilot or modernizing legacy document automation, Knowledge Enrichment unlocks new possibilities.
Knowledge Enrichment doesn't require content centralization to deliver value. It's built to operate across:
Its lightweight API model allows you to point it at content wherever it lives, without migration.
Enterprise content is messy. That’s why Knowledge Enrichment inherits the full breadth of file support from Document Filters, including:
Using services beyond Document Filters, Knowledge Enrichment can support additional formats, including:
Each format is parsed with layout, structure, and content zones preserved, giving AI systems a richer, more accurate context to reason with.
As enterprise AI shifts from model development to agent deployment, the need for structured, explainable, and semantically meaningful content becomes critical. Knowledge Enrichment creates the substrate that enables agents to:
Whether you’re building a smart assistant for internal operations or an external-facing customer support AI, the difference between helpful and harmful will be the data layer underneath.
Knowledge Enrichment is available now as part of the Content Innovation Cloud.
The future of enterprise AI won’t be defined by prompts; it’ll be defined by the data you feed the models. With Knowledge Enrichment, you finally have the ability to deliver content that’s not just extracted, but structured, enriched, and ready for reasoning.
Let your AI start smarter. Let your content speak with context.
Explore the full developer documentation or express interest in getting access.