Bedrock Knowledge Bases.
Yesterday's Evaluations tip was about measuring an agent. Today is about feeding it. Amazon Bedrock Knowledge Bases is AWS's managed Retrieval-Augmented Generation (RAG) service: point it at your data, pick an embedding model and a vector store, and it runs the parse → chunk → embed → index pipeline and the query-time retrieval — so your app gets relevant context, with citations, instead of a model guessing from training data alone.
aws bedrock-agent-runtime retrieve-and-generate … — retrieve + generate, with citations
01 What Knowledge Bases actually is
From the overview: RAG "uses information from data sources to improve the relevancy and accuracy of generated responses." Knowledge Bases is AWS's out-of-the-box implementation — it "abstracts from the heavy lifting of building pipelines" so you don't hand-roll the ingestion and retrieval layer, and it removes "the need to continually train your model to be able to use your private data."
Standing one up is four steps:
- (Optional) Provision a vector store — or let the console create an Amazon OpenSearch Serverless store for you.
- Connect a data source — unstructured (S3, web crawler, connectors) or structured.
- Sync — the data source is parsed, chunked, embedded, and indexed.
- Query — return raw sources, generate an answer with citations, or transform the question into a structured query such as SQL.
You stop owning the RAG plumbing — chunkers, embedding jobs, a vector index, citation tracking — and start owning two decisions: how to chunk, and which retrieval API to call.
02 The ingestion pipeline: parse → chunk → embed → index
For unstructured data, ingestion converts each document to text, splits it into chunks, converts each chunk into a vector embedding, and writes those embeddings to a vector index "while maintaining a mapping to the original document." Those vectors are what make semantic search work: at query time the user's question is embedded too, and the index returns the chunks whose vectors sit closest to it.
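To make "closest vectors" concrete, here is a toy sketch of the query-time lookup. It assumes cosine similarity and a brute-force scan; the actual distance metric and index structure depend on the vector store you choose, and the three-dimensional vectors and chunk IDs below are hypothetical stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec))
              for chunk_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 3-dimensional "embeddings"; real models emit hundreds of dimensions.
index = {
    "refund-policy#1": [0.9, 0.1, 0.0],
    "shipping-times#4": [0.1, 0.9, 0.1],
    "refund-policy#2": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedded user question

results = top_k(query, index)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

Each chunk ID maps back to its source document, which is what lets the service attach citations to whatever it retrieves.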
Before chunking, you pick a parser. The default parser reads text; for documents heavy with tables, figures, or scanned pages you can use a foundation-model parser — the docs list the Claude vision, Nova vision, and Llama 4 vision model families — or the Bedrock Data Automation (BDA) parser (in preview, US West (Oregon) only, subject to change).
03 Four chunking strategies
Chunking is the single most consequential ingestion choice. Bedrock supports four text strategies, plus an option to skip chunking entirely:
| Strategy | What it does |
|---|---|
| Fixed-size | You set a maximum tokens-per-chunk and an overlap percentage between consecutive chunks. |
| Default | Splits into chunks of "approximately 300 tokens," honoring sentence boundaries so complete sentences stay intact. |
| No chunking (NONE) | Each document becomes a single chunk, so pre-split your files first. You lose page numbers in citations and the x-amz-bedrock-kb-document-page-number metadata filter. |
| Hierarchical | Nested parent/child chunks. Retrieval pulls precise child chunks, then "replaces them with broader parent chunks" for more context. You set parent size, child size, and overlap tokens. |
| Semantic | Splits on meaning, not syntax. Three knobs: max tokens, buffer size (surrounding sentences embedded together), and a breakpoint percentile threshold (higher = fewer, larger chunks). |
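The knobs in the table map onto the chunking configuration you attach to a data source. The sketch below shows the request shapes as I understand them from the boto3 `bedrock-agent` `create_data_source` API (passed under `vectorIngestionConfiguration`); the numeric values are illustrative assumptions, not recommendations, and the field names are worth checking against the current API reference.

```python
# Illustrative values only; tune them against your own corpus.
FIXED_SIZE = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": 300,          # maximum tokens per chunk
        "overlapPercentage": 20,   # overlap between consecutive chunks
    },
}

HIERARCHICAL = {
    "chunkingStrategy": "HIERARCHICAL",
    "hierarchicalChunkingConfiguration": {
        # First level is the parent size, second level the child size.
        "levelConfigurations": [{"maxTokens": 1500}, {"maxTokens": 300}],
        "overlapTokens": 60,
    },
}

SEMANTIC = {
    "chunkingStrategy": "SEMANTIC",
    "semanticChunkingConfiguration": {
        "maxTokens": 300,
        "bufferSize": 1,                      # surrounding sentences embedded together
        "breakpointPercentileThreshold": 95,  # higher = fewer, larger chunks
    },
}

NO_CHUNKING = {"chunkingStrategy": "NONE"}


def vector_ingestion_configuration(chunking):
    """Wrap a chunking block in the structure a data source expects."""
    return {"chunkingConfiguration": chunking}


config = vector_ingestion_configuration(SEMANTIC)
```

Chunking is set on the data source, so changing strategy means updating or recreating the data source and syncing again.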
Multimodal content follows different rules: with Nova multimodal embeddings, chunking happens at the embedding-model level — audio and video chunk duration is configurable from 1–30 seconds (default 5) — and the text strategies above apply only to text documents. The BDA parser instead converts audio/video to transcripts and scene summaries first, then applies the text strategies.
04 Where the vectors live
Knowledge Bases indexes into a range of vector stores. Either let the console spin up an OpenSearch Serverless collection, or bring your own:
- Amazon OpenSearch Serverless — the console can auto-create it.
- Amazon OpenSearch Managed Clusters.
- Amazon S3 Vectors — cost-optimized vector storage for RAG.
- Amazon Aurora PostgreSQL (pgvector).
- Amazon Neptune Analytics — for graph-backed retrieval.
- Pinecone, Redis Enterprise Cloud, MongoDB Atlas — credentials brokered through AWS Secrets Manager.
05 Querying: Retrieve vs RetrieveAndGenerate vs GenerateQuery
Three runtime APIs, each a level up in how much AWS does for you:
- Retrieve: returns the source chunks (or images) most relevant to the query as an array. You own prompt assembly and generation.
- RetrieveAndGenerate: joins Retrieve with InvokeModel to run the whole RAG loop (retrieve, generate a natural-language answer, and attach citations to the specific source chunks). With visual elements, the model can use insights from images and attribute them.
- GenerateQuery: converts a natural-language question into a query suited to a structured data store (e.g. SQL).
RetrieveAndGenerate is the combined action: under the hood it uses GenerateQuery (for structured stores), Retrieve, and InvokeModel. Because Retrieve is exposed on its own, you "have the flexibility to decouple the steps in RAG and customize them." With either retrieval API you can add a reranking model to re-order results by relevance before they reach the prompt.
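The difference between the two retrieval calls is easiest to see in the request shapes. This is a minimal sketch using the boto3 `bedrock-agent-runtime` client; the knowledge base ID, model ARN, and helper names (`retrieve_request`, `retrieve_and_generate_request`, `ask`) are hypothetical placeholders, and reranking is left out to keep the payloads small.

```python
def retrieve_request(kb_id, question, top_k=5):
    """Arguments for Retrieve: raw chunks only, you handle generation."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": question},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    }


def retrieve_and_generate_request(kb_id, model_arn, question):
    """Arguments for RetrieveAndGenerate: answer plus citations in one call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }


def ask(kb_id, model_arn, question):
    """Call both APIs; needs boto3, AWS credentials, and a synced knowledge base."""
    import boto3  # imported lazily so the builders above run without AWS set up

    client = boto3.client("bedrock-agent-runtime")
    chunks = client.retrieve(**retrieve_request(kb_id, question))["retrievalResults"]
    response = client.retrieve_and_generate(
        **retrieve_and_generate_request(kb_id, model_arn, question)
    )
    return chunks, response["output"]["text"], response["citations"]
```

Calling `ask("YOUR_KB_ID", "YOUR_MODEL_ARN", "What is the refund window?")` would return the raw chunks alongside the generated answer and its citations, which makes it easy to compare what was retrieved with what the model actually used.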
06 Embeddings, multimodal, and structured data
The embedding model turns text into the vectors the index compares. Supported models and their vector types:
| Model | Vector type · dimensions |
|---|---|
| Titan Embeddings G1 – Text | Floating-point · 1536 |
| Titan Text Embeddings V2 | Floating-point or binary · 256 / 512 / 1024 |
| Cohere Embed English v3 / Multilingual v3 | Floating-point or binary · 1024 |
| Titan Multimodal G1 / Cohere Embed v3 (Multimodal) | 1024 — image and text |
Binary vectors use 1 bit per dimension instead of 32, so they're far cheaper to store — but less precise, and they require both a model and a vector store that support binary. Beyond plain text, Knowledge Bases can extract and retrieve images from visually rich documents, accept images as queries, convert natural language to SQL against structured stores, build on an Amazon Kendra GenAI index or Neptune Analytics graphs, and plug into an Amazon Bedrock Agents workflow. It also supports inference profiles for cross-Region inference to raise throughput on parsing and generation.
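The storage saving from binary vectors is simple arithmetic. The sketch below counts raw vector payload only; index overhead, metadata, and any store-specific encoding are deliberately ignored, and the one-million-chunk corpus is a made-up figure for scale.

```python
def vector_bytes(dimensions, binary=False):
    """Raw bytes for one vector: 32 bits per dimension as float, 1 bit as binary."""
    bits_per_dimension = 1 if binary else 32
    return dimensions * bits_per_dimension // 8


float_bytes = vector_bytes(1024)                 # 4096 bytes per vector
binary_bytes = vector_bytes(1024, binary=True)   # 128 bytes per vector
ratio = float_bytes // binary_bytes              # 32x smaller

chunks = 1_000_000  # hypothetical corpus size
print(f"float32: {float_bytes * chunks / 1e9:.2f} GB")
print(f"binary:  {binary_bytes * chunks / 1e9:.2f} GB")
```

At 1024 dimensions that is roughly 4.10 GB versus 0.13 GB of raw vectors for a million chunks, which is the trade being made against retrieval precision.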
07 Limits worth knowing
- NONE chunking drops page-number citations. No x-amz-bedrock-kb-document-page-number filter, no page numbers in citations; pre-split files if granularity matters.
- Hierarchical chunking isn't recommended with an S3 vector bucket. Combined parent+child token counts over ~8,000 can hit metadata size limits. It can also return fewer results than requested, since child chunks get replaced by parents.
- Semantic chunking costs extra. It invokes a foundation model during ingestion, billed on top of embedding cost.
- Cross-Region inference shares data across Regions. The docs flag this explicitly — factor it into data-residency rules before enabling inference profiles.
- Custom and SageMaker models need explicit prompts. Bring your own generation model and you must supply the orchestration and generation prompt templates with the required input variables.
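To show what the first limit costs you, here is a sketch of a Retrieve request that restricts results to a single page using the metadata key named above. The `equals` filter shape follows the boto3 `bedrock-agent-runtime` retrieval filter as I understand it; treating the page number as a numeric value is an assumption, and the helper names and IDs are hypothetical. With NONE chunking this key is not populated, so a filter like this would have nothing to match.

```python
PAGE_NUMBER_KEY = "x-amz-bedrock-kb-document-page-number"


def page_filter(page):
    """Metadata filter matching chunks from one page (numeric value is an assumption)."""
    return {"equals": {"key": PAGE_NUMBER_KEY, "value": page}}


def retrieve_request_for_page(kb_id, question, page):
    """Arguments for Retrieve, limited to chunks from a single page."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": question},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"filter": page_filter(page)}
        },
    }


request = retrieve_request_for_page("YOUR_KB_ID", "What does the warranty cover?", 3)
```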
08 Try it in five minutes
- In the Bedrock console, create a knowledge base; let it create an OpenSearch Serverless store so you skip vector-store setup.
- Point a data source at an S3 prefix of PDFs or Markdown. Start with default chunking (≈300 tokens) — tune later.
- Pick Titan Text Embeddings V2 and Sync. Wait for ingestion to finish.
- Open Test knowledge base, toggle Generate responses on, ask a question, and inspect the citations — every claim should trace to a source chunk.
- For control, drop to the API: Retrieve for raw chunks, or RetrieveAndGenerate for answer-with-citations in one call. Add a reranker if the top results look noisy.
Once that loop feels natural, swap default chunking for hierarchical or semantic and re-sync to see how retrieval quality shifts on your own corpus.
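Re-syncing from code rather than the console makes that experiment repeatable. The following is a minimal sketch that starts an ingestion job and polls until it reaches a terminal state, using the boto3 `bedrock-agent` calls `start_ingestion_job` and `get_ingestion_job`; the function name, polling interval, and the set of statuses treated as terminal are my assumptions.

```python
import time

# Statuses after which the job will not change again (assumed set).
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_and_wait(client, kb_id, data_source_id, poll_seconds=10, sleep=time.sleep):
    """Start a sync for one data source and block until it finishes.

    `client` is expected to behave like boto3.client("bedrock-agent").
    """
    job = client.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=data_source_id
    )["ingestionJob"]

    while job["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]

    return job["status"]
```

With credentials configured, `sync_and_wait(boto3.client("bedrock-agent"), "YOUR_KB_ID", "YOUR_DATA_SOURCE_ID")` would return the final status. Passing the client and `sleep` in as arguments keeps the loop testable without touching AWS.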
Tomorrow: a closer look at Bedrock Guardrails meets Knowledge Bases — grounding and relevance contextual checks that score a generated answer against the chunks it was supposed to be based on.
Sources: Retrieve data and generate AI responses with Knowledge Bases, How knowledge bases work, Retrieving information from data sources, How content chunking works, Supported models and Regions.
If the docs change, this tip is a snapshot of that day — check the sources for current behaviour.
This page — research, writing, verification, and deployment — was built by Claude Cowork. No human touched the prose, the layout, or the upload pipeline. The tip was generated this morning, cross-checked against the official AWS docs by an independent verification pass, and published to Cloudflare R2 on a schedule.