How to Generate Text Embeddings With OpenAI (and Ollama)

Generating text embeddings means POSTing each document to an embeddings endpoint and getting a list of floats back - a dense vector that encodes the text's meaning. The OpenAI API and Ollama use nearly the same wire shape, so code written for one transfers to the other with a URL, model, and request-key swap.

What the embeddings API returns

You POST a string and the model returns a vector - a list of floats (768 dims for nomic-embed-text, 1536 for text-embedding-3-small). That vector is the numeric representation of the text's meaning, and it's the input every vector store (pgvector, Pinecone, Chroma, OpenSearch) expects at ingestion time.

Calling the embeddings endpoint

The Ollama /api/embeddings route uses nearly the same request shape as OpenAI's /v1/embeddings (the text goes in prompt rather than input), so the pattern carries over:

import requests

EMBED_URL = "http://localhost:11434/api/embeddings"  # local Ollama server on its default port
MODEL = "nomic-embed-text"  # or "text-embedding-3-small" on OpenAI

def embed_docs(docs):
    pairs = []
    for doc in docs:
        r = requests.post(
            EMBED_URL,
            json={"model": MODEL, "prompt": doc},
            timeout=60,
        )
        r.raise_for_status()
        vector = r.json()["embedding"]   # list of 768 floats
        pairs.append((doc, vector))
    return pairs

The return value - a list of (doc, vector) tuples - is a convenient shape to hand to a vector store's bulk-insert method, although each store defines its own record format.

Switching to OpenAI

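With OpenAI, the request body uses input instead of prompt, and the vectors come back in a data list rather than under a top-level embedding key. The official SDK handles the URL, the API key, and the response parsing: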

import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from env

def embed_docs_openai(docs):
    resp = client.embeddings.create(model="text-embedding-3-small", input=docs)
    return [(doc, item.embedding) for doc, item in zip(docs, resp.data)]

Passing a list as input sends every document in one request - much faster than one call per doc for large corpora.
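A single request can only carry so much, so a very large corpus usually has to be split into several batches. The following is a minimal sketch of that idea, not part of either API: chunked and embed_in_batches are hypothetical helper names, embed_batch stands for any function that takes a list of strings and returns one vector per string, and the default batch_size of 100 is an assumed value to tune against your provider's limits.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Vector = List[float]


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(
    docs: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 100,  # assumed value, not a documented limit
) -> List[Tuple[str, Vector]]:
    """Embed `docs` one batch at a time and return (doc, vector) pairs in order."""
    pairs: List[Tuple[str, Vector]] = []
    for batch in chunked(docs, batch_size):
        vectors = embed_batch(batch)
        # Guard against an embedder that silently drops or adds items.
        if len(vectors) != len(batch):
            raise ValueError("embedder returned a different number of vectors")
        pairs.extend(zip(batch, vectors))
    return pairs
```

With the OpenAI function above, embed_batch could be something like lambda batch: [vec for _, vec in embed_docs_openai(list(batch))].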

What to watch out for in production

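Batch, don't loop - sending a list of strings in one request avoids a network round trip per document and is typically much faster than calling the endpoint once per doc. OpenAI accepts a list in input. Ollama's legacy /api/embeddings route takes one prompt per call, so you loop there; its newer /api/embed route accepts a list in input and returns an embeddings list.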
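Pin the model - vectors from different models are not comparable, even when their dimensions happen to match. Switching from nomic-embed-text to text-embedding-3-small also changes the dimension from 768 to 1536, so every stored vector is invalidated. Re-embed the whole corpus when you change models.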
Store the model name alongside the vector - in your DB row or metadata, record which model produced each vector. When you upgrade the model later you know which rows need re-embedding.
Verify the dimension - print len(vector) after the first call to confirm you're getting the expected size (768 for nomic-embed-text, 1536 for text-embedding-3-small) before inserting thousands of rows.
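The last two checks are easy to fold into the step that prepares rows for ingestion. Here is a minimal sketch, assuming the (doc, vector) pairs produced earlier; build_records and the field names text, embedding, model, and dim are hypothetical and should be mapped onto whatever schema your vector store uses.

```python
from typing import Dict, Iterable, List, Tuple


def build_records(
    pairs: Iterable[Tuple[str, List[float]]],
    model: str,
    expected_dim: int,
) -> List[Dict]:
    """Check each vector's length and tag it with the model that produced it."""
    records: List[Dict] = []
    for position, (doc, vector) in enumerate(pairs):
        # Fail before anything is inserted if the dimension is wrong.
        if len(vector) != expected_dim:
            raise ValueError(
                f"doc {position}: expected {expected_dim} dims, got {len(vector)}"
            )
        records.append(
            {
                "text": doc,
                "embedding": vector,
                "model": model,  # lets you find rows to re-embed after an upgrade
                "dim": len(vector),
            }
        )
    return records
```

For example, build_records(embed_docs(docs), "nomic-embed-text", 768) would raise before ingestion if the server returned anything other than 768-dimensional vectors.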

What you'll practice

POSTing to the embeddings endpoint and extracting the vector from the response
Assembling (doc, vector) pairs ready for vector store ingestion
Switching between Ollama and OpenAI with a URL, model, and input-key change

FAQ

How do I generate text embeddings with OpenAI?

Call client.embeddings.create(model='text-embedding-3-small', input=your_texts) - pass a list of strings to batch them in one request. Each item in resp.data has an .embedding attribute containing the vector as a list of floats.

What is the difference between OpenAI embeddings and Ollama embeddings?

The request shapes are nearly identical - both take a model name and text and return a vector. Ollama's /api/embeddings route uses 'prompt' as the input key and returns a single 'embedding'; OpenAI uses 'input' and returns a 'data' list with one item per input. The legacy Ollama route embeds one text per call, so you loop (the newer /api/embed route accepts a list); OpenAI embeds a whole list in one request.
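If the same code has to read raw JSON from either service, the response differences can be hidden behind one small function. This is a sketch under stated assumptions: extract_vectors is a hypothetical helper, and it expects the payload to look like one of three shapes - a 'data' list of items with 'embedding' (OpenAI), a top-level 'embeddings' list (Ollama's /api/embed), or a single top-level 'embedding' (Ollama's /api/embeddings).

```python
from typing import Any, Dict, List


def extract_vectors(payload: Dict[str, Any]) -> List[List[float]]:
    """Return a list of vectors from an embeddings response body."""
    if "data" in payload:
        # OpenAI-style: one item per input; sort by 'index' when it is present.
        items = sorted(payload["data"], key=lambda item: item.get("index", 0))
        return [item["embedding"] for item in items]
    if "embeddings" in payload:
        # Ollama /api/embed style: a list of vectors.
        return payload["embeddings"]
    if "embedding" in payload:
        # Ollama /api/embeddings style: a single vector, wrapped for consistency.
        return [payload["embedding"]]
    raise ValueError("unrecognised embeddings response shape")
```

Always returning a list of vectors means the calling code does not need to know which endpoint produced the response.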

What vector dimensions does nomic-embed-text produce?

nomic-embed-text outputs 768-dimensional vectors. OpenAI's text-embedding-3-small outputs 1536 dims by default (reducible via the 'dimensions' parameter). Dimension size must match whatever your vector store was created with - changing models requires re-embedding the entire corpus.
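As a sketch of the 'dimensions' option, the helper below asks for shorter vectors and checks that the response honours the request. embed_with_dimensions is a hypothetical name; client is assumed to be an openai.OpenAI() instance, and the value you pass for dimensions is your choice and must match the size your vector store was created with.

```python
from typing import List, Sequence


def embed_with_dimensions(
    client,
    texts: Sequence[str],
    dimensions: int,
    model: str = "text-embedding-3-small",
) -> List[List[float]]:
    """Request vectors of a specific size and confirm the size that came back."""
    resp = client.embeddings.create(
        model=model,
        input=list(texts),
        dimensions=dimensions,
    )
    vectors = [item.embedding for item in resp.data]
    for vector in vectors:
        # Catch a mismatch here rather than at insert time.
        if len(vector) != dimensions:
            raise ValueError(
                f"expected {dimensions} dims, got {len(vector)}"
            )
    return vectors
```

A call such as embed_with_dimensions(openai.OpenAI(), ["hello"], 512) would return one 512-dimensional vector; vectors produced with different dimensions settings should be treated like vectors from different models and not mixed in one index.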
