The problem
Riggs London had no Kenyan digital presence. The brief was straightforward: build a commerce experience that works for Kenyan buyers on mobile, on variable network conditions, paying with M-Pesa.
The constraint that shaped every architecture decision: international card processors fail 15–30% of the time for Kenyan cardholders. Cross-border fraud controls, bank declines, card-not-present friction — the failure modes are well-documented and largely outside the merchant's control. M-Pesa had to be the primary payment method, not an afterthought.
The second constraint was product discovery. Fragrance buyers don't search by product name. They search by intent — "something for my dad who likes the outdoors", "a gift that smells expensive but isn't". Keyword search returns nothing for those queries. The platform needed to understand what the customer meant, not just what they typed.
Architecture
The repository is a Turborepo monorepo with two independent deploy targets:
- apps/web: Next.js 14 App Router storefront, deployed to Vercel
- apps/api: Fastify backend handling payments, webhooks, and embedding generation, deployed to Railway
- packages/db: shared Prisma schema and generated client, consumed by both apps
- packages/ui: shared React component library
Keeping the API separate from the storefront was a deliberate call. Payment webhook handlers need to be reliable and isolated — a storefront deploy should never affect payment processing. The monorepo gives shared TypeScript contracts across the boundary without coupling the deployment lifecycle.
Why Fastify over Express for the payments API
Fastify's schema-driven request validation means malformed webhook payloads are rejected at the framework level before they reach business logic. For a payments backend where a partial state update is worse than a rejected request, that matters. The TypeScript ergonomics are also significantly better than Express — no @types/express gymnastics.
PostgreSQL with pgvector over a dedicated vector database
The product catalogue is under 500 items. Adding Pinecone or Qdrant would mean another API key, another billing account, another service to monitor, and another failure point — for a problem that three SQL statements solve. pgvector's IVFFlat index gives acceptable approximate nearest-neighbour latency at this scale. The embeddings live in the same database as the product data, which simplifies backups, monitoring, and local development.
M-Pesa STK Push integration
The payment flow: customer initiates checkout → API calls Daraja to send STK Push to customer's handset → customer approves on their phone → Daraja calls the registered callback → API verifies and confirms the order.
The implementation decisions that made this reliable in production:
Idempotency keys stored in Redis. Every STK Push request generates a unique key. Duplicate requests within the 30-second STK expiry window return the existing payment record without triggering a second charge. This matters because mobile network conditions cause users to tap "Pay" more than once.
Callback verification before state mutation. The callback payload is validated against the expected Daraja format before any order state changes. Unverifiable callbacks are logged and discarded — they don't cause partial updates.
Timeout reconciliation. STK Push expires after 30 seconds if the user doesn't approve. A background job marks pending payments older than 35 seconds as expired and releases reserved inventory. Without this, inventory stays locked indefinitely on abandoned checkouts.
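The three decisions above can be sketched in a few small functions. This is an illustration under stated assumptions, not the production code: an in-memory map stands in for Redis, the idempotency key is assumed to be derived from the order (the text does not say how it is built), and the callback check only covers the top-level `Body.stkCallback` fields. All type and function names are hypothetical.

```typescript
type PaymentStatus = "pending" | "confirmed" | "expired";

interface Payment {
  id: string;
  status: PaymentStatus;
  createdAt: number; // epoch milliseconds
}

const STK_EXPIRY_MS = 30_000; // STK Push expiry window described above
const RECONCILE_AFTER_MS = 35_000; // pending payments older than this are expired

// In-memory stand-in for Redis (assumption); a real store would set a key with a TTL.
class IdempotencyStore {
  private entries = new Map<string, { payment: Payment; expiresAt: number }>();

  // Returns the existing payment while the key is live, otherwise creates one.
  getOrCreate(key: string, now: number, create: () => Payment): Payment {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > now) return hit.payment;
    const payment = create();
    this.entries.set(key, { payment, expiresAt: now + STK_EXPIRY_MS });
    return payment;
  }
}

interface StkCallback {
  MerchantRequestID: string;
  CheckoutRequestID: string;
  ResultCode: number;
  ResultDesc: string;
}

// Returns the parsed callback, or null when the payload lacks the expected shape.
function parseStkCallback(payload: unknown): StkCallback | null {
  const cb = (payload as { Body?: { stkCallback?: Record<string, unknown> } } | null)
    ?.Body?.stkCallback;
  if (typeof cb !== "object" || cb === null) return null;
  const { MerchantRequestID, CheckoutRequestID, ResultCode, ResultDesc } = cb;
  if (
    typeof MerchantRequestID !== "string" ||
    typeof CheckoutRequestID !== "string" ||
    typeof ResultCode !== "number" ||
    typeof ResultDesc !== "string"
  ) {
    return null;
  }
  return { MerchantRequestID, CheckoutRequestID, ResultCode, ResultDesc };
}

// Marks stale pending payments as expired and returns them,
// so the caller can release the inventory reserved for each one.
function expireStalePayments(payments: Payment[], now: number): Payment[] {
  const expired: Payment[] = [];
  for (const p of payments) {
    if (p.status === "pending" && now - p.createdAt > RECONCILE_AFTER_MS) {
      p.status = "expired";
      expired.push(p);
    }
  }
  return expired;
}
```

Passing `now` in explicitly keeps the expiry logic testable without real timers; a background job would call `expireStalePayments` with the current time on each tick.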
Semantic search pipeline
At product creation, the API generates a 1536-dimension embedding from the concatenation of product name, description, and scent notes using OpenAI text-embedding-3-small. The vector is stored in a products_embeddings table with an IVFFlat index for approximate nearest-neighbour queries.
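A minimal sketch of what the pgvector side could look like, held as SQL strings in TypeScript. Only the `products_embeddings` table name, the 1536 dimension, and the IVFFlat index come from the description above; the column names, index name, `lists = 10`, the newline separator, and both helper functions are assumptions for illustration.

```typescript
// Hypothetical schema: table name, dimension, and index type are from the write-up,
// everything else (columns, index name, lists value) is assumed.
const SETUP_SQL: string[] = [
  "CREATE EXTENSION IF NOT EXISTS vector",
  `CREATE TABLE IF NOT EXISTS products_embeddings (
     product_id TEXT PRIMARY KEY,
     embedding  vector(1536) NOT NULL
   )`,
  // IVFFlat clusters existing rows, so it is usually created after data is loaded.
  `CREATE INDEX IF NOT EXISTS products_embeddings_ivfflat
     ON products_embeddings USING ivfflat (embedding vector_cosine_ops)
     WITH (lists = 10)`,
];

// `<=>` is pgvector's cosine distance operator; similarity = 1 - distance.
const NEAREST_SQL = `
  SELECT product_id, 1 - (embedding <=> $1::vector) AS similarity
  FROM products_embeddings
  ORDER BY embedding <=> $1::vector
  LIMIT $2`;

// Builds the text sent to the embedding model (separator is an assumption).
function embeddingInput(name: string, description: string, scentNotes: string[]): string {
  return [name, description, scentNotes.join(", ")].join("\n");
}

// pgvector accepts vectors as a bracketed, comma-separated text literal.
function toVectorLiteral(values: number[]): string {
  return `[${values.join(",")}]`;
}
```

The query returns a similarity score per row so the caller can apply its own cut-off rather than baking the threshold into SQL.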
For a user search, the query string is embedded with the same model and a cosine-similarity query runs against the index with a threshold of 0.75. If fewer than 3 results clear the threshold, a PostgreSQL full-text search runs as fallback and results are merged with deduplication. This hybrid approach keeps recall high for intent-based queries while still surfacing exact matches when the user knows what they want.
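The merge step described above can be sketched as a pure function. The 0.75 threshold and the minimum of 3 results come from the text; the `Hit` shape, the function name, the inclusive comparison, and the choice to append fallback results after semantic ones are assumptions.

```typescript
interface Hit {
  productId: string;
  similarity: number; // cosine similarity in [-1, 1]
}

const SIMILARITY_THRESHOLD = 0.75;
const MIN_SEMANTIC_RESULTS = 3;

// Returns product IDs: semantic hits that clear the threshold first, then,
// only when there are too few of them, full-text results not already present.
function mergeSearchResults(semanticHits: Hit[], runFullText: () => string[]): string[] {
  const strong = semanticHits
    .filter((h) => h.similarity >= SIMILARITY_THRESHOLD)
    .sort((a, b) => b.similarity - a.similarity)
    .map((h) => h.productId);

  if (strong.length >= MIN_SEMANTIC_RESULTS) return strong;

  const seen = new Set(strong);
  const merged = [...strong];
  for (const id of runFullText()) {
    if (!seen.has(id)) {
      seen.add(id);
      merged.push(id);
    }
  }
  return merged;
}
```

Taking the full-text search as a callback means the fallback query only runs when the semantic results fall short.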
AI Scent Advisor
The assistant uses Claude 3.5 Haiku — chosen for latency, not capability. A chatbot that takes 4 seconds to respond in a shopping context loses the user. Haiku returns first tokens in under 300ms. The domain is narrow enough that a smaller, faster model constrained to a tight system prompt outperforms a larger model given a vague brief.
The system prompt includes a lightweight product index and explicit constraints: never fabricate product names, prices, or availability; if no product matches, say so and suggest the closest option. Rate limits are enforced at two levels — 10 messages per session and 50 per IP per day — using Redis. Without rate limiting, an unprotected chat endpoint is an open billing liability.
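The two-level limit can be illustrated with an in-memory counter. This is a sketch only: the limits (10 per session, 50 per IP per day) are from the text, while the class name, the fixed 24-hour window, and the use of maps instead of Redis are assumptions. With Redis, one common approach is an `INCR` on a per-session or per-IP key combined with `EXPIRE` for the daily window.

```typescript
const SESSION_LIMIT = 10;
const IP_DAILY_LIMIT = 50;
const DAY_MS = 24 * 60 * 60 * 1000;

type RateDecision = "ok" | "session_limit" | "ip_limit";

// In-memory stand-in for the Redis counters (assumption).
class ChatRateLimiter {
  private sessionCounts = new Map<string, number>();
  private ipWindows = new Map<string, { count: number; windowStart: number }>();

  // Counts the message only when both limits allow it.
  check(sessionId: string, ip: string, now: number): RateDecision {
    const sessionCount = this.sessionCounts.get(sessionId) ?? 0;
    if (sessionCount >= SESSION_LIMIT) return "session_limit";

    let ipWindow = this.ipWindows.get(ip);
    if (ipWindow === undefined || now - ipWindow.windowStart >= DAY_MS) {
      ipWindow = { count: 0, windowStart: now }; // start a fresh daily window
    }
    if (ipWindow.count >= IP_DAILY_LIMIT) return "ip_limit";

    this.sessionCounts.set(sessionId, sessionCount + 1);
    this.ipWindows.set(ip, { count: ipWindow.count + 1, windowStart: ipWindow.windowStart });
    return "ok";
  }
}
```

Checking both limits before incrementing either counter means a rejected message does not consume quota at the other level.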
Notification pipeline
Order confirmations trigger multi-channel delivery with fallback semantics. Primary channel is WhatsApp Business API with a templated confirmation message. If WhatsApp delivery fails after a 30-second retry window, Africa's Talking SMS fires as fallback. A SendGrid transactional email with full line items goes out regardless of the other channels.
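The fallback semantics above could be expressed roughly as follows. The channel order and the 30-second retry window are from the text; the `Sender` type, the 5-second retry delay, and the function names are assumptions, and each sender is a placeholder for the real WhatsApp, Africa's Talking, or SendGrid call.

```typescript
// A sender resolves to true when the provider accepted the message (assumption).
type Sender = () => Promise<boolean>;

interface Channels {
  whatsapp: Sender;
  sms: Sender;
  email: Sender;
}

// Retries a sender until it succeeds or the window would be exceeded.
async function withinRetryWindow(send: Sender, windowMs: number, delayMs: number): Promise<boolean> {
  const deadline = Date.now() + windowMs;
  for (;;) {
    const ok = await send().catch(() => false);
    if (ok) return true;
    if (Date.now() + delayMs > deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
}

// Returns the channels that delivered, in the order they were confirmed.
async function notifyOrderConfirmed(
  channels: Channels,
  windowMs = 30_000,
  delayMs = 5_000,
): Promise<string[]> {
  const delivered: string[] = [];
  // Email goes out regardless, so start it before the WhatsApp retries.
  const email = channels.email().catch(() => false);

  if (await withinRetryWindow(channels.whatsapp, windowMs, delayMs)) {
    delivered.push("whatsapp");
  } else if (await channels.sms().catch(() => false)) {
    delivered.push("sms"); // fallback only after WhatsApp has failed
  }

  if (await email) delivered.push("email");
  return delivered;
}
```

Starting the email send first keeps it independent of the WhatsApp retry window, so a slow primary channel does not delay the receipt.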
Shopify is the inventory source of truth. A webhook on inventory_levels/update batches updates into a reconciliation job that upserts stock levels into PostgreSQL. The storefront reads from PostgreSQL — not Shopify directly — to keep read latency stable during traffic spikes.
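A sketch of how such a reconciliation step might stay safe when webhook deliveries arrive duplicated or out of order. The event fields, the use of an event ID for deduplication, and the last-write-wins rule on `updatedAt` are assumptions; maps and sets stand in for the PostgreSQL tables.

```typescript
// Hypothetical, simplified view of one inventory update event.
interface InventoryEvent {
  eventId: string; // unique per webhook delivery (assumption)
  inventoryItemId: string;
  available: number;
  updatedAt: number; // epoch milliseconds
}

interface StockRow {
  available: number;
  updatedAt: number;
}

// Applies a batch of events and returns how many rows were actually written.
// Replaying the same batch is a no-op, and stale events never overwrite newer stock.
function reconcileInventory(
  stock: Map<string, StockRow>,
  seenEventIds: Set<string>,
  batch: InventoryEvent[],
): number {
  let applied = 0;
  for (const event of batch) {
    if (seenEventIds.has(event.eventId)) continue; // duplicate delivery
    seenEventIds.add(event.eventId);

    const current = stock.get(event.inventoryItemId);
    if (current !== undefined && current.updatedAt >= event.updatedAt) continue; // stale

    stock.set(event.inventoryItemId, {
      available: event.available,
      updatedAt: event.updatedAt,
    });
    applied++;
  }
  return applied;
}
```

In PostgreSQL the same rule could be written as an upsert whose update only fires when the incoming timestamp is newer than the stored one.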
Infrastructure
| Service | Provider | Monthly cost |
|---|---|---|
| Next.js frontend | Vercel | ~$20 |
| Fastify API + PostgreSQL + Redis | Railway | ~$58 |
| OpenAI embeddings | OpenAI API | ~$15 |
| Claude AI chatbot | Anthropic API | ~$15 |
| Total | | ~$108 |
Railway keeps operational overhead low for a pre-revenue deployment. The stack is container-first and can migrate to Azure or AWS with minimal changes to deployment configuration.
What I learned
M-Pesa is not just a payment method; it is the payment method. The engineering effort in robust STK Push handling (idempotency, callback verification, timeout reconciliation) paid off directly in conversion. An 85% M-Pesa payment success rate, set against international card processors that fail 15–30% of the time for Kenyan cardholders, is the difference between a viable business and one that isn't.
pgvector eliminated a service boundary. Consolidating embeddings into PostgreSQL simplified backups, monitoring, and local development. At this catalogue scale the latency tradeoff was acceptable and the operational risk reduction was real.
Streaming matters more than accuracy for chatbot UX. Delivering first tokens quickly via streaming materially improved perceived responsiveness. Users who saw a blank screen for 2 seconds assumed the feature was broken. The same response delivered token-by-token felt instant.
Shopify sync complexity is underestimated. Webhook deliveries can arrive out of order or more than once, so handling them idempotently required careful design. A reconciliation job with idempotent upserts and event deduplication made the integration robust, but it took longer to get right than the payment integration.
For repository and implementation details: github.com/edogola4