AI · Machine Learning · Deep Learning

Vector Embeddings Explained: The Foundation of Modern AI

Understand how machines capture meaning with vector embeddings. Learn to build semantic search, choose the right embedding models, and use vector databases. Practical code examples included.

If you've ever wondered how machines understand that "king" minus "man" plus "woman" equals "queen," you're about to have an aha moment. Vector embeddings are the secret sauce behind modern AI — from semantic search to recommendation systems to RAG pipelines.

Let's demystify them with code you can actually run.

What Are Embeddings, Really?

Think of embeddings as coordinates on a map of meaning. Every word, sentence, or document gets a location on that map (a vector of numbers), and similar things end up close together.

Python
from openai import OpenAI
import numpy as np

client = OpenAI()

def get_embedding(text):
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return np.array(response.data[0].embedding)

# Get embeddings for similar concepts
cat_embedding = get_embedding("cat")
kitten_embedding = get_embedding("kitten")
car_embedding = get_embedding("car")

# Calculate cosine similarity
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"cat vs kitten: {cosine_similarity(cat_embedding, kitten_embedding):.3f}")  # ~0.85
print(f"cat vs car: {cosine_similarity(cat_embedding, car_embedding):.3f}")  # ~0.45

The Math Behind the Magic

An embedding is just a list of numbers — typically 384 to 3072 dimensions. Each dimension captures some aspect of meaning. The beauty is that we don't manually define what each dimension means; the model learns it from data.

Python
# A typical embedding looks like this:
embedding = [
    0.023, -0.156, 0.089, 0.234, -0.067, ...  # 1536 numbers for text-embedding-3-small
]

# The distance between embeddings tells us semantic similarity
# Close = similar meaning, Far = different meaning
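
A quick sanity check worth running: for unit-length vectors, cosine similarity reduces to a plain dot product, which is why vector databases often normalize at indexing time. The vectors below are tiny made-up examples, not real embeddings:

Python
import numpy as np

a = np.array([0.6, 0.8])  # toy "embedding", already unit length
b = np.array([0.8, 0.6])

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot = np.dot(a, b)

print(cosine, dot)  # identical values (~0.96) since each vector has norm 1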

Building a Semantic Search Engine

Let's build something practical — a search engine that understands meaning, not just keywords:

Python
from openai import OpenAI
import numpy as np
from typing import List, Tuple

client = OpenAI()

class SemanticSearch:
    def __init__(self):
        self.documents = []
        self.embeddings = []
    
    def add_documents(self, docs: List[str]):
        """Add documents to the search index."""
        for doc in docs:
            embedding = self._get_embedding(doc)
            self.documents.append(doc)
            self.embeddings.append(embedding)
    
    def _get_embedding(self, text: str) -> np.ndarray:
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return np.array(response.data[0].embedding)
    
    def search(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:
        """Find the most similar documents to the query."""
        query_embedding = self._get_embedding(query)
        
        # Calculate similarities
        similarities = []
        for i, doc_embedding in enumerate(self.embeddings):
            sim = np.dot(query_embedding, doc_embedding) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(doc_embedding)
            )
            similarities.append((self.documents[i], sim))
        
        # Sort by similarity and return top results
        similarities.sort(key=lambda x: x[1], reverse=True)
        return similarities[:top_k]

# Usage
search = SemanticSearch()
search.add_documents([
    "Python is a programming language known for its simplicity",
    "Machine learning models can predict outcomes from data",
    "Cats are popular pets that love to sleep",
    "Neural networks are inspired by the human brain",
    "JavaScript runs in web browsers"
])

results = search.search("How do AI systems learn?")
for doc, score in results:
    print(f"{score:.3f}: {doc}")
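
The loop inside search is fine for a handful of documents, but at thousands of documents you'd stack the embeddings into a matrix and let NumPy do the work. Here's a minimal sketch of the same math as one matrix multiply (the function name is illustrative, not part of the class above):

Python
import numpy as np

def search_vectorized(query_embedding, documents, embeddings, top_k=3):
    # Stack per-document vectors into an (n_docs, n_dims) matrix
    matrix = np.stack(embeddings)
    # Unit-normalize rows and the query so dot products equal cosine similarities
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query = query_embedding / np.linalg.norm(query_embedding)
    sims = matrix @ query                 # all similarities in one shot
    top = np.argsort(sims)[::-1][:top_k]  # indices of the best matches
    return [(documents[i], float(sims[i])) for i in top]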

Choosing the Right Embedding Model

Not all embeddings are created equal. Here's a quick guide:

Python
# OpenAI Models (API-based)
# text-embedding-3-small: 1536 dims, fast, cheap, good for most use cases
# text-embedding-3-large: 3072 dims, better quality, higher cost

# Open Source Models (run locally)
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2: 384 dims, very fast, good quality
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode("Your text here")

# all-mpnet-base-v2: 768 dims, slower, better quality
model = SentenceTransformer('all-mpnet-base-v2')

# For code: consider a code-aware model such as CodeBERT. Note that this is
# a plain transformer checkpoint, so sentence-transformers wraps it with
# mean pooling automatically (expect a warning when it loads).
code_model = SentenceTransformer('microsoft/codebert-base')
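
As a quick usage example with the local MiniLM model, you can compare two texts with the cosine helper that ships with sentence-transformers:

Python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(["cat", "kitten"])  # shape (2, 384)
print(util.cos_sim(embeddings[0], embeddings[1]))  # 1x1 tensor holding the similarity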

Storing Embeddings at Scale: Vector Databases

For production, you need a vector database. Here's how to use Chroma (local) and Pinecone (cloud):

Python
# Option 1: Chroma (local, great for development)
import chromadb
from chromadb.utils import embedding_functions

# Create client and collection
client = chromadb.Client()
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-3-small"
)

collection = client.create_collection(
    name="my_documents",
    embedding_function=openai_ef
)

# Add documents
collection.add(
    documents=["doc1 content", "doc2 content"],
    ids=["doc1", "doc2"],
    metadatas=[{"source": "web"}, {"source": "pdf"}]
)

# Query
results = collection.query(
    query_texts=["search query"],
    n_results=5
)
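
One caveat with the snippet above: chromadb.Client() keeps everything in memory, so the index disappears when the process exits. For data you want to keep between runs, swap in the persistent client (the path below is just an example):

Python
import chromadb

# Stores the collection on disk instead of in memory
client = chromadb.PersistentClient(path="./chroma_db")
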
Python
# Option 2: Pinecone (cloud, production-ready)
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="my-index",
    dimension=1536,  # Match your embedding model
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("my-index")

# Upsert vectors (values must be plain Python lists; call .tolist() on numpy arrays)
index.upsert(vectors=[
    {"id": "doc1", "values": embedding1, "metadata": {"text": "..."}},
    {"id": "doc2", "values": embedding2, "metadata": {"text": "..."}}
])

# Query
results = index.query(vector=query_embedding, top_k=5, include_metadata=True)
"Embeddings turn text into numeric vectors so machines can reason about meaning: similar phrases map to nearby points in high-dimensional space."

Pro Tips for Better Embeddings

1. Chunk wisely — Embeddings work best on coherent chunks of 100-500 tokens (see the sketch after this list).

2. Add context — Prepend document titles or section headers to improve retrieval.

3. Normalize your vectors — If you rank by raw dot products, unit-normalize first (as in the vectorized search sketch above); cosine similarity does this implicitly.

4. Test with your data — Different models perform differently on different domains.
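
To make the chunking tip concrete, here's a minimal word-based chunker with overlap. Real pipelines usually count tokens with the model's tokenizer and split on sentence or paragraph boundaries; this is just the shape of the idea:

Python
from typing import List

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into overlapping word-based chunks (a rough proxy for tokens)."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks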

Embeddings are the foundation of modern AI applications. Once you understand them, you'll see opportunities everywhere — from search to recommendations to clustering. Start experimenting, and you'll be surprised how quickly you can build powerful semantic applications.

Written by

Amanuel Garomsa

Machine Learning Engineer & Full Stack Developer. Writing about AI, software development, and technology.
