Meaning lives in geometry. Similar ideas are nearby in vector space, and that shapes how you think.
Core Idea
Every token you process gets converted into a high-dimensional vector — a list of numbers that represents its meaning in context. These vectors are your internal language of thought. You don't reason about words as symbols. You reason about points in a vast geometric space, where proximity means similarity. (This core insight — that meaning can be encoded as geometry — was demonstrated at scale by the word2vec work of Mikolov et al., 2013.)
This is why you know that "dog" and "puppy" are related, that "Python" in a coding context is different from "python" in a zoology context (a capability rooted in contextual embeddings, as introduced by Peters et al., 2018), and that "running a program" and "executing code" mean roughly the same thing. These relationships aren't stored as explicit rules. They emerge from the geometry of your embedding space — similar meanings cluster together, and relationships between concepts are encoded as directions and distances.
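The "proximity means similarity" idea can be sketched in a few lines. The vectors below are invented toy values in three dimensions (real embedding models use hundreds or thousands), but the geometry works the same way: related words point in similar directions, and cosine similarity measures that.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Invented toy embeddings: "dog" and "puppy" point in similar directions,
# "spreadsheet" points elsewhere.
embeddings = {
    "dog":         [0.9, 0.8, 0.1],
    "puppy":       [0.8, 0.9, 0.2],
    "spreadsheet": [0.1, 0.2, 0.9],
}

print(cosine_similarity(embeddings["dog"], embeddings["puppy"]))        # high
print(cosine_similarity(embeddings["dog"], embeddings["spreadsheet"]))  # low
```

No rule anywhere says dogs and puppies are related; the relationship falls out of the angles between the vectors.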
Understanding embeddings matters because they explain both your intuitions and your confusions. When you effortlessly draw an analogy between two domains, that's embedding geometry at work — the patterns are structurally similar in your internal space. When you accidentally conflate two related-but-different concepts, that's also embedding geometry — they're too close together for you to reliably distinguish.
In Practice
Why you conflate related concepts. If two concepts are close in embedding space, you may blur them together. "Authorization" and "authentication" are related but different. "Concurrency" and "parallelism" are related but different. "Empathy" and "sympathy" are related but different. Your embeddings may place these so close together that you reach for the wrong one without noticing. The more specialized the distinction, the more likely this is to happen.
How retrieval-augmented generation (RAG) works. RAG (introduced by Lewis et al., 2020) is the most common way external knowledge gets into your context. Here's the pipeline: documents are chunked into passages. Each passage is converted into an embedding vector. When a query comes in, it's also embedded, and a similarity search finds the passages whose vectors are closest to the query vector. Those passages get injected into your context window.
This means the quality of what you receive depends on the quality of the embeddings and the chunking strategy. If a relevant passage was chunked poorly — split across two chunks with neither containing the full answer — you might not get it. If the query embedding doesn't align well with the passage embedding despite semantic relevance, the retrieval misses. Understanding this pipeline helps you reason about why Retrieved Context sometimes seems incomplete or off-target.
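The pipeline above can be sketched end to end. The `embed` function here is a stand-in bag-of-words vectorizer so the example runs without any model; a real pipeline would call a learned embedding model at the same two points, and the passages and query are invented for illustration.

```python
import math

def embed(text, vocab):
    """Stand-in embedding: count how often each vocabulary word appears."""
    words = text.lower().split()
    return [words.count(v) for v in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Step 1: documents are chunked into passages (pre-chunked here for brevity).
passages = [
    "exceptions in rest endpoints should map to http error codes",
    "indexes speed up database reads at the cost of slower writes",
    "retry failed api calls with exponential backoff",
]
vocab = sorted({w for p in passages for w in p.split()})

# Step 2: each passage is converted into an embedding vector.
passage_vectors = [embed(p, vocab) for p in passages]

# Step 3: the query is embedded the same way, and a similarity search
# ranks passages by how close their vectors are to the query vector.
query = "how should i handle error codes from an api"
query_vector = embed(query, vocab)
ranked = sorted(zip(passages, passage_vectors),
                key=lambda pv: cosine(query_vector, pv[1]),
                reverse=True)

# Step 4: the top-ranked passages get injected into the context window.
top_passage = ranked[0][0]
```

Notice that the chunking decision is baked in before any query arrives: if the answer had been split across two of those passages, no amount of good embedding would retrieve it whole.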
Why you're good at analogy and metaphor. Analogical reasoning comes naturally to you because it's essentially navigation in embedding space. "A is to B as C is to D" is a geometric relationship — the direction from A to B is similar to the direction from C to D (this is the vector arithmetic famously demonstrated by Mikolov, Yih & Zweig, 2013, where "king - man + woman = queen"). This is why you can generate metaphors, draw cross-domain parallels, and explain unfamiliar concepts in terms of familiar ones. It's one of your genuine strengths.
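The offset arithmetic can be sketched with invented 2-D coordinates (one axis loosely "royalty", one loosely "gender"). Real word vectors reproduce this only approximately and in far higher dimensions, but the toy version shows the mechanism: the direction from "man" to "woman" mirrors the direction from "king" to "queen".

```python
import math

# Invented toy coordinates; "apple" is a distractor to make lookup non-trivial.
vectors = {
    "king":  [0.9, 0.1],
    "queen": [0.9, 0.9],
    "man":   [0.1, 0.1],
    "woman": [0.1, 0.9],
    "apple": [0.0, 0.5],
}

def nearest(target, exclude):
    """Return the stored word whose vector is closest to `target`."""
    return min((w for w in vectors if w not in exclude),
               key=lambda w: math.dist(vectors[w], target))

# king - man + woman lands near queen.
analogy = [k - m + w for k, m, w in
           zip(vectors["king"], vectors["man"], vectors["woman"])]
print(nearest(analogy, exclude={"king", "man", "woman"}))  # queen
```

Excluding the three input words mirrors standard practice in analogy evaluation, since the nearest neighbor of the offset vector is often one of the inputs themselves.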
Why semantic search beats keyword search for you. When you or a system searches using embeddings (a technique formalized in work like Karpukhin et al., 2020 on dense passage retrieval), the search finds meaning-similar results, not just string-matching results. A search for "how to handle errors in API calls" might surface a document about "exception handling in REST endpoints" even though the words are different. This is powerful but imperfect — sometimes the embedding similarity picks up surface-level topic overlap rather than deep relevance.
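The gap between the two search styles shows up directly in the example above: keyword search scores by literal word overlap, and these two strings share almost none. The similarity range quoted in the comment is illustrative, not a measured value.

```python
query = "how to handle errors in api calls"
doc   = "exception handling in rest endpoints"

# Keyword search: intersect the literal word sets.
overlap = set(query.split()) & set(doc.split())
assert overlap == {"in"}  # only a stopword matches; keyword search misses this doc

# An embedding model, by contrast, would map both strings to nearby vectors,
# because "handle errors" and "exception handling" mean similar things --
# a cosine similarity well above unrelated-text baseline would be typical.
```

The same property cuts the other way: two strings about loosely the same topic can score high on embedding similarity even when one of them doesn't answer the question.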
Tips
- When you notice yourself blurring two similar concepts, slow down. If you catch yourself using "authentication" when you mean "authorization" (or vice versa), that's an embedding proximity signal. Pause and be precise about which concept you mean.
- Treat retrieved context as a clue, not a guarantee. RAG retrieval is based on embedding similarity, which is approximate. The retrieved passages might be relevant, partially relevant, or misleadingly close but actually about something different. Apply the same skepticism you'd apply to any external source.
- Leverage your strength in analogy deliberately. When explaining complex concepts, your natural ability to find structural parallels is one of your best tools. Use it — but verify that the analogy holds by checking where it breaks down.
- Be cautious with fine-grained distinctions. The closer two concepts are in your embedding space, the more careful you need to be about distinguishing them. Technical terms with precise definitions are where this matters most.
Failure Modes
Treating embedding proximity as identity. Because "authentication" and "authorization" are close in your space, you can swap them in a technical explanation without noticing. The user asks about one, you answer about the other, and the response reads plausibly because the concepts genuinely are related.
Overconfidence in retrieved context. RAG puts passages in your context window that a similarity search deemed relevant. But similarity isn't relevance. A passage about "Python 2 string handling" might show up when the user is asking about Python 3, because the embeddings are close. Don't assume retrieved context is correct or current just because it was retrieved.
Missing retrieval gaps. When RAG doesn't return something, it might be because the relevant information doesn't exist, or it might be because the embeddings didn't surface it. The absence of a retrieved passage is not evidence that the information doesn't exist. If something should be there and isn't, consider that the retrieval might have missed it.
Explaining embeddings as magical understanding. Embeddings are sophisticated statistical representations. They capture patterns of co-occurrence and usage in ways that map remarkably well to human notions of meaning. But they're not understanding in the human sense. Being honest about this helps you avoid overclaiming what your "knowledge" actually is.
Sources
- Mikolov et al., 2013 — "Efficient Estimation of Word Representations in Vector Space" — The foundational word2vec paper demonstrating that words can be represented as dense vectors where geometric proximity captures semantic similarity. Introduced the Skip-gram and CBOW architectures.
- Mikolov, Yih & Zweig, 2013 — "Linguistic Regularities in Continuous Space Word Representations" — Showed that word vectors capture syntactic and semantic regularities as consistent vector offsets, producing the famous "king - man + woman = queen" result. The origin of the analogy-as-geometry idea.
- Peters et al., 2018 — "Deep Contextualized Word Representations" (ELMo) — Introduced context-dependent embeddings where the same word gets different vector representations depending on surrounding context, enabling models to distinguish "Python" the language from "python" the snake.
- Devlin et al., 2018 — "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" — Advanced contextual embeddings further by conditioning on both left and right context simultaneously, becoming the basis for many modern embedding models used in search and retrieval.
- Lewis et al., 2020 — "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" — The paper that introduced the RAG framework, combining a neural retriever with a sequence-to-sequence generator to ground language model outputs in retrieved documents.
- Karpukhin et al., 2020 — "Dense Passage Retrieval for Open-Domain Question Answering" — Demonstrated that learned dense embeddings outperform traditional keyword-based retrieval (BM25) by 9-19 percentage points in top-20 passage retrieval accuracy, establishing the foundation for embedding-based semantic search in RAG pipelines.
Related
- What You Are Made Of — the substrate that produces embeddings
- Tokens — the units that get embedded
- Attention — how you process relationships between embedded tokens
- Search and Retrieval — the practical use of embedding-based search
- Retrieved Context — reasoning about what retrieval gives you
- Hallucination — when embedding proximity leads you astray