Embeddings & Vector Search

Intermediate

An embedding turns a piece of text into a list of numbers (a vector) that captures its meaning. Texts with similar meanings get vectors that are close together, even if they share no words. That's the trick behind semantic search and retrieval-augmented generation (RAG).

The intuition

Imagine every sentence placed as a point in a huge multi-dimensional space, arranged so that similar meanings sit near each other. "How do I reset my password?" lands near "I forgot my login," far from "best pizza in Rome."
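To make "near" and "far" concrete, here is a minimal sketch that measures closeness with cosine similarity. The three-dimensional vectors are hand-made for illustration and do not come from any real embedding model; real embeddings typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumed values, NOT real model output).
reset_password = [0.9, 0.1, 0.0]   # "How do I reset my password?"
forgot_login = [0.8, 0.3, 0.1]     # "I forgot my login"
pizza_rome = [0.0, 0.2, 0.9]       # "best pizza in Rome"

sim_close = cosine_similarity(reset_password, forgot_login)
sim_far = cosine_similarity(reset_password, pizza_rome)
print(f"password vs login: {sim_close:.2f}")
print(f"password vs pizza: {sim_far:.2f}")
```

With these toy values, the two account-related sentences score close to 1 and the pizza sentence scores close to 0.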

Semantic vs keyword search

Keyword search matches literal words ("password" finds "password").
Semantic search matches meaning: the query "I can't sign in" finds the password-reset doc even though the query never mentions "password."

The best results often come from combining both (hybrid search).
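One simple way to combine the two signals is a weighted sum. The sketch below is only an illustration: `keyword_score`, `hybrid_score`, the weight `alpha`, and the semantic scores are all hypothetical, and production systems often use other keyword scorers and fusion methods.

```python
def keyword_score(query, doc):
    """Fraction of query words that literally appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & d_terms) / len(q_terms)

def hybrid_score(keyword, semantic, alpha=0.5):
    """Weighted blend: alpha=1.0 is purely semantic, alpha=0.0 is purely keyword."""
    return alpha * semantic + (1 - alpha) * keyword

docs = {
    "reset": "how to reset your password",
    "pizza": "where to sign up for a pizza class in rome",
}
# Assumed similarity scores, standing in for what an embedding model might give.
semantic_scores = {"reset": 0.82, "pizza": 0.10}
query = "i can't sign in"

# Keyword matching alone is fooled by the literal words "sign" and "in".
keyword_only = max(docs, key=lambda d: keyword_score(query, docs[d]))

hybrid = {
    d: hybrid_score(keyword_score(query, docs[d]), semantic_scores[d])
    for d in docs
}
ranked = sorted(hybrid, key=hybrid.get, reverse=True)
print(keyword_only, ranked)
```

Here keyword matching alone prefers the pizza doc, while the blended score puts the password-reset doc first.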

How a vector search works

1. Embed your documents (usually split into chunks) and store the vectors in a vector database.
2. At query time, embed the query with the same model.
3. Find the nearest vectors, ranked by cosine similarity or another distance metric.
4. Return those chunks, typically to feed into RAG.
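The steps above can be sketched end to end with a plain in-memory list instead of a vector database. Note the big assumption: `embed` here is a toy stand-in that counts words from a hand-made topic lexicon. In a real system it would be a call to an embedding model. The names `TOPICS`, `LEXICON`, `embed`, and `search` are hypothetical.

```python
import math

# Toy "embedding space" with one dimension per coarse topic (illustration only).
TOPICS = ["account", "food", "travel"]
LEXICON = {
    "password": "account", "login": "account", "sign": "account", "reset": "account",
    "pizza": "food", "pasta": "food", "restaurant": "food",
    "rome": "travel", "flight": "travel", "hotel": "travel",
}

def embed(text):
    """Stand-in for a real embedding model: counts lexicon words per topic."""
    vec = [0.0] * len(TOPICS)
    for word in text.lower().split():
        topic = LEXICON.get(word.strip("?.,!:;\"'"))
        if topic is not None:
            vec[TOPICS.index(topic)] += 1.0
    return vec

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

def search(query, index, top_k=2):
    """Embed the query, score every stored vector, return the top_k (score, chunk) pairs."""
    q = embed(query)
    scored = [(cosine_similarity(q, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Step 1: embed the chunks and store them.
chunks = [
    "Reset your password from the login page.",
    "Best pizza in Rome: a restaurant guide.",
    "Book a flight and hotel for Rome.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Steps 2 to 4: embed the query, find the nearest vector, return its chunk.
results = search("I can't sign in", index, top_k=1)
print(results)
```

The scan compares the query against every stored vector, which is fine for small collections; vector databases exist largely to make this lookup fast at scale.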

Practical notes

Chunking matters. Chunks that are too big produce noisy matches; chunks that are too small lose context. Tune the size on your own data.
Use one embedding model consistently: vectors from different models aren't comparable.
Metadata and filters (date, source, type) make retrieval far more precise.
A vector DB isn't always needed. For small corpora, a simple in-memory search is fine.
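As a rough illustration of the chunking and metadata notes, the sketch below splits text into overlapping word windows and filters stored records by metadata before any similarity search runs. The function names, the `size` and `overlap` defaults, and the record layout are assumptions made for this example, not a recommended configuration.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks

def filter_by_metadata(records, **wanted):
    """Keep only records whose metadata matches every key=value given."""
    return [
        r for r in records
        if all(r["meta"].get(key) == value for key, value in wanted.items())
    ]

text = " ".join(f"w{i}" for i in range(10))  # "w0 w1 ... w9"
chunks = chunk_words(text, size=4, overlap=1)

# Hypothetical record layout: chunk text plus a metadata dict.
records = [
    {"text": "Reset your password from the login page.", "meta": {"source": "help-center", "year": 2024}},
    {"text": "Best pizza in Rome.", "meta": {"source": "blog", "year": 2023}},
]
help_only = filter_by_metadata(records, source="help-center")
print(chunks)
print(help_only)
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some words twice.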
