Semantic Search with pgvector

A user searches "something woody and masculine for a gift". A keyword search returns nothing — no product has those exact words in its name. Semantic search understands the intent and returns relevant results. This article covers a complete implementation using PostgreSQL's pgvector extension and OpenAI's embedding API. This is the implementation used in Riggs London Kenya.

When to use semantic search

Semantic search is the right tool when:

Users search by intent or description rather than exact product names
Your content has rich descriptive text (fragrance notes, product descriptions, article summaries)
Zero-result searches are a problem worth solving

Keyword search is still better when:

Users know exactly what they want ("iPhone 15 Pro 256GB")
Your dataset is small enough that exact matching works
You need deterministic, auditable search results

Use both — semantic search as primary with keyword fallback when vector similarity is below threshold.

PostgreSQL setup

Enable the pgvector extension:

CREATE EXTENSION IF NOT EXISTS vector;

Schema design

CREATE TABLE products_embeddings (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  product_id UUID NOT NULL REFERENCES products(id) ON DELETE CASCADE,
  content TEXT NOT NULL,        -- the text that was embedded
  embedding vector(1536),       -- OpenAI text-embedding-3-small dimension
  created_at TIMESTAMPTZ DEFAULT NOW()
);

Store the original content alongside the vector — you will need it to debug why certain results are returned.

Generating embeddings

// src/lib/embeddings.ts
import OpenAI from 'openai'
 
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
 
export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text.replace(/\n/g, ' '), // embeddings work better without newlines
  })
 
  const embedding = response.data[0]?.embedding
  if (!embedding) throw new Error('No embedding returned')
 
  return embedding
}
 
export async function indexProduct(productId: string, product: {
  name: string
  description: string
  scentNotes: string[]
}): Promise<void> {
  // Combine fields into a single string for embedding
  const content = [
    product.name,
    product.description,
    product.scentNotes.join(', ')
  ].join('. ')
 
  const embedding = await generateEmbedding(content)
 
  await db.execute(sql`
    INSERT INTO products_embeddings (product_id, content, embedding)
    VALUES (${productId}, ${content}, ${JSON.stringify(embedding)}::vector)
    ON CONFLICT (product_id) DO UPDATE
      SET content = EXCLUDED.content,
          embedding = EXCLUDED.embedding
  `)
}

Cosine similarity search

// src/lib/search.ts
export async function semanticSearch(
  query: string,
  limit = 8,
  threshold = 0.75
): Promise<SearchResult[]> {
  const queryEmbedding = await generateEmbedding(query)
 
  const results = await db.execute<SearchResult>(sql`
    SELECT
      p.id,
      p.name,
      p.description,
      1 - (pe.embedding <=> ${JSON.stringify(queryEmbedding)}::vector) AS similarity
    FROM products_embeddings pe
    JOIN products p ON p.id = pe.product_id
    WHERE 1 - (pe.embedding <=> ${JSON.stringify(queryEmbedding)}::vector) > ${threshold}
    ORDER BY pe.embedding <=> ${JSON.stringify(queryEmbedding)}::vector
    LIMIT ${limit}
  `)
 
  return results.rows
}

The <=> operator is cosine distance — lower is more similar. 1 - distance converts to cosine similarity — higher is more similar. The threshold of 0.75 filters out weak matches.

Indexing strategy

Without an index, pgvector performs exact nearest-neighbour search — accurate but slow at scale. Two index types are available:

IVFFlat (Inverted File with Flat quantisation):

CREATE INDEX ON products_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

lists is the number of clusters. Rule of thumb: sqrt(number_of_rows) for up to 1M rows.
Requires at least lists * 30 rows to be effective — do not create the index on an empty table.
Faster to build, slightly less accurate than HNSW.
Good for datasets under 1M vectors.

HNSW (Hierarchical Navigable Small World):

CREATE INDEX ON products_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Better recall (accuracy) than IVFFlat at the cost of more memory and slower index build time.
m controls the number of connections per node — higher means better recall, more memory.
Better for production datasets where recall matters more than index build time.

For a product catalogue under 100k items, IVFFlat is sufficient. For a large content corpus where search quality is critical, use HNSW.

Fallback strategy

When vector search returns fewer than 3 results, fall back to PostgreSQL full-text search:

export async function searchWithFallback(query: string): Promise<SearchResult[]> {
  const vectorResults = await semanticSearch(query, 8, 0.75)
 
  if (vectorResults.length >= 3) return vectorResults
 
  // Fall back to full-text search
  const ftsResults = await db.execute<SearchResult>(sql`
    SELECT id, name, description, ts_rank(search_vector, query) AS similarity
    FROM products, plainto_tsquery('english', ${query}) query
    WHERE search_vector @@ query
    ORDER BY similarity DESC
    LIMIT 8
  `)
 
  // Merge, deduplicate by id, vector results take priority
  const merged = [...vectorResults]
  for (const result of ftsResults.rows) {
    if (!merged.find(r => r.id === result.id)) {
      merged.push(result)
    }
  }
 
  return merged.slice(0, 8)
}

Cost considerations

OpenAI text-embedding-3-small costs $0.02 per 1M tokens. A typical product description is ~100 tokens. Embedding a catalogue of 10,000 products costs ~$0.02. Re-embedding on product updates is negligible. The search query embedding (~10 tokens per query) costs fractions of a cent at any realistic query volume.

When to use semantic search

Semantic search is the right tool when:

Users search by intent or description rather than exact product names
Your content has rich descriptive text (fragrance notes, product descriptions, article summaries)
Zero-result searches are a problem worth solving

Keyword search is still better when:

Users know exactly what they want ("iPhone 15 Pro 256GB")
Your dataset is small enough that exact matching works
You need deterministic, auditable search results

Use both — semantic search as primary with keyword fallback when vector similarity is below threshold.

PostgreSQL setup

Enable the pgvector extension:

CREATE EXTENSION IF NOT EXISTS vector;

Schema design

CREATE TABLE products_embeddings (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  product_id UUID NOT NULL REFERENCES products(id) ON DELETE CASCADE,
  content TEXT NOT NULL,        -- the text that was embedded
  embedding vector(1536),       -- OpenAI text-embedding-3-small dimension
  created_at TIMESTAMPTZ DEFAULT NOW()
);

Store the original content alongside the vector — you will need it to debug why certain results are returned.

Generating embeddings

// src/lib/embeddings.ts
import OpenAI from 'openai'
 
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
 
export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text.replace(/\n/g, ' '), // embeddings work better without newlines
  })
 
  const embedding = response.data[0]?.embedding
  if (!embedding) throw new Error('No embedding returned')
 
  return embedding
}
 
export async function indexProduct(productId: string, product: {
  name: string
  description: string
  scentNotes: string[]
}): Promise<void> {
  // Combine fields into a single string for embedding
  const content = [
    product.name,
    product.description,
    product.scentNotes.join(', ')
  ].join('. ')
 
  const embedding = await generateEmbedding(content)
 
  await db.execute(sql`
    INSERT INTO products_embeddings (product_id, content, embedding)
    VALUES (${productId}, ${content}, ${JSON.stringify(embedding)}::vector)
    ON CONFLICT (product_id) DO UPDATE
      SET content = EXCLUDED.content,
          embedding = EXCLUDED.embedding
  `)
}

Cosine similarity search

// src/lib/search.ts
export async function semanticSearch(
  query: string,
  limit = 8,
  threshold = 0.75
): Promise<SearchResult[]> {
  const queryEmbedding = await generateEmbedding(query)
 
  const results = await db.execute<SearchResult>(sql`
    SELECT
      p.id,
      p.name,
      p.description,
      1 - (pe.embedding <=> ${JSON.stringify(queryEmbedding)}::vector) AS similarity
    FROM products_embeddings pe
    JOIN products p ON p.id = pe.product_id
    WHERE 1 - (pe.embedding <=> ${JSON.stringify(queryEmbedding)}::vector) > ${threshold}
    ORDER BY pe.embedding <=> ${JSON.stringify(queryEmbedding)}::vector
    LIMIT ${limit}
  `)
 
  return results.rows
}

The <=> operator is cosine distance — lower is more similar. 1 - distance converts to cosine similarity — higher is more similar. The threshold of 0.75 filters out weak matches.

Indexing strategy

Without an index, pgvector performs exact nearest-neighbour search — accurate but slow at scale. Two index types are available:

IVFFlat (Inverted File with Flat quantisation):

CREATE INDEX ON products_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

lists is the number of clusters. Rule of thumb: sqrt(number_of_rows) for up to 1M rows.
Requires at least lists * 30 rows to be effective — do not create the index on an empty table.
Faster to build, slightly less accurate than HNSW.
Good for datasets under 1M vectors.

HNSW (Hierarchical Navigable Small World):

CREATE INDEX ON products_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Better recall (accuracy) than IVFFlat at the cost of more memory and slower index build time.
m controls the number of connections per node — higher means better recall, more memory.
Better for production datasets where recall matters more than index build time.

For a product catalogue under 100k items, IVFFlat is sufficient. For a large content corpus where search quality is critical, use HNSW.

Fallback strategy

When vector search returns fewer than 3 results, fall back to PostgreSQL full-text search:

export async function searchWithFallback(query: string): Promise<SearchResult[]> {
  const vectorResults = await semanticSearch(query, 8, 0.75)
 
  if (vectorResults.length >= 3) return vectorResults
 
  // Fall back to full-text search
  const ftsResults = await db.execute<SearchResult>(sql`
    SELECT id, name, description, ts_rank(search_vector, query) AS similarity
    FROM products, plainto_tsquery('english', ${query}) query
    WHERE search_vector @@ query
    ORDER BY similarity DESC
    LIMIT 8
  `)
 
  // Merge, deduplicate by id, vector results take priority
  const merged = [...vectorResults]
  for (const result of ftsResults.rows) {
    if (!merged.find(r => r.id === result.id)) {
      merged.push(result)
    }
  }
 
  return merged.slice(0, 8)
}

Semantic Product Search with pgvector and OpenAI Embeddings

When to use semantic search

PostgreSQL setup

Schema design

Generating embeddings

Cosine similarity search

Indexing strategy

Fallback strategy

Cost considerations

Semantic Product Search with pgvector and OpenAI Embeddings

When to use semantic search

PostgreSQL setup

Schema design

Generating embeddings

Cosine similarity search

Indexing strategy

Fallback strategy

Cost considerations