Case Study: A Vector Knowledge Base and AI Chat for a Skincare Brand

A premium skincare brand sits on a large, careful body of knowledge: what is in each product, which ingredients suit which skin, how to layer a routine, what to say about a sensitive scalp or a reactive complexion. The problem is that this knowledge lives everywhere except where it is needed. It is in PDFs, in old emails, in the heads of a few experienced people, and scattered across a website. So when a customer or a support agent asks a precise question, the answer is slow to find, and a generic AI chatbot is worse than nothing, because it will confidently invent an answer the brand would never actually give.

We built a different kind of assistant for exactly this situation. This is not a story about a deflection rate; it is a story about how the system works and why we built it the way we did. We turned the brand's own content into a vector knowledge base and put an AI chat in front of it that answers only from that verified material, and shows its sources. To respect the client's confidentiality we keep them anonymous here, but the mechanics are exactly as built.

The problem we set out to solve

The problem was never a lack of knowledge. It was that the knowledge could not be retrieved precisely, and a normal chatbot fills that gap with guesses.

The goal was not a chatbot. It was to make the brand's own knowledge retrievable with precision, so answers are grounded in real content instead of generated from thin air.

So the brief was specific: build an assistant that is never allowed to make things up, that can only answer from the brand's approved material, and that proves it by citing where each answer came from.

One pipeline: how a question gets answered

The system is a retrieval-augmented generation (RAG) pipeline. The key idea is that the language model never answers from its own memory; it answers from passages we retrieve for it, in real time, out of the brand's knowledge base. So the work splits into two halves: turning content into searchable vectors ahead of time, and, at question time, finding the right passages and grounding the answer in them.

Figure: The RAG pipeline. Content is chunked, embedded and stored as vectors ahead of time; at question time the system retrieves the closest passages and the model answers strictly from them, with citations.

Turning content into a knowledge base

The quality of every answer is decided here, long before anyone asks a question. We ingest the brand's sources and split each document into chunks: passages small enough to be precise but large enough to keep their meaning, with a little overlap between neighbors so a thought is not cut off at a chunk boundary. Each chunk is then run through an embedding model that turns its meaning into a vector, a long list of numbers that places similar ideas near each other in space. Those vectors are stored with their original text and metadata in a vector store built on Postgres with the pgvector extension, so the knowledge base lives in the same database as everything else rather than in a separate service. Get the chunking right and retrieval is sharp; get it wrong and even a perfect model has nothing good to work with.
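As a rough illustration of overlapping chunks, here is a minimal TypeScript sketch. It is not the production ingestion code: the word-based window, the default sizes and the names Chunk and chunkText are all assumptions made for the example, and a real pipeline may well split on sentences, headings or tokens instead.

```typescript
// Hypothetical shape for a stored passage; field names are illustrative.
interface Chunk {
  sourceId: string;
  index: number;
  text: string;
}

// Split text into windows of `size` words. Each window starts
// `size - overlap` words after the previous one, so neighbors share
// `overlap` words and a thought that straddles a boundary appears whole
// in at least one chunk.
function chunkText(
  sourceId: string,
  text: string,
  size: number = 200, // assumed default, tune per content type
  overlap: number = 40 // assumed default
): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    const piece = words.slice(start, start + size);
    chunks.push({ sourceId, index: chunks.length, text: piece.join(" ") });
    // Stop once a window has reached the end of the document, so we do
    // not emit a trailing chunk made only of already-covered words.
    if (start + size >= words.length) break;
  }
  return chunks;
}
```

Each returned chunk would then be embedded and stored next to its text and metadata. The overlap trades a little index size for passages that read coherently on their own.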

Answering only from what is true

At question time, the user's question is embedded with the same model and used to search the vector store for the passages closest in meaning, not the ones that happen to share keywords. The top matches are handed to the language model with a strict instruction: answer using only this material, and if it does not contain the answer, say so. That last rule is the whole point. A grounded assistant that admits "I don't have that information" is far more valuable to a brand than a fluent one that invents a skincare claim. Every answer is returned with citations to the passages it used, so a customer can trust it and a support agent can verify it in one click.
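The question-time half can be sketched in a few lines of TypeScript. This is an in-memory illustration under stated assumptions, not the deployed code: in the real system the similarity search runs inside the vector store, and the names Passage, topK, buildGroundedPrompt and REFUSAL, the prompt wording and the optional minScore threshold are all invented for the example.

```typescript
// Hypothetical stored passage: text plus its precomputed embedding.
interface Passage {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Return the k passages closest in meaning to the query embedding.
// minScore is an assumed extra: it drops weak matches so the model
// sees nothing rather than something irrelevant.
function topK(
  query: number[],
  passages: Passage[],
  k: number,
  minScore: number = 0
): Array<{ passage: Passage; score: number }> {
  return passages
    .map((passage) => ({ passage, score: cosineSimilarity(query, passage.embedding) }))
    .filter((hit) => hit.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const REFUSAL = "I don't have that information.";

// Build the strict instruction: answer only from the numbered sources,
// cite them, and refuse when they do not contain the answer.
function buildGroundedPrompt(question: string, hits: Passage[]): string {
  const sources = hits.map((p, i) => `[${i + 1}] (${p.id}) ${p.text}`).join("\n");
  return [
    "Answer using only the numbered sources below.",
    "Cite the sources you used by number, for example [1].",
    `If the sources do not contain the answer, reply exactly: ${REFUSAL}`,
    "",
    "Sources:",
    sources.length > 0 ? sources : "(none)",
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Because each source carries its passage id, the numbered citations in the model's reply can be mapped back to the exact passages and shown to the customer or agent.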

Speaking the customer's language

A premium brand rarely sells in one country, so the assistant works across languages. Because retrieval happens in meaning-space rather than word-space, and as long as the embedding model is multilingual, a question asked in one language can still find the most relevant passage even when the source was written in another, and the answer is composed back in the language the person used. The brand's careful wording is preserved and simply made accessible to everyone who asks.

Keeping it current

A knowledge base is only trustworthy if it matches reality, so the index is not a one-time import. When source content changes, the affected passages are re-chunked and re-embedded, so the assistant stops citing outdated material and starts using the new version automatically. The knowledge base tracks the live content rather than a snapshot from the day it launched.
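One common way to decide which passages are "affected" is to store a content hash per source and compare it on each sync. The TypeScript sketch below assumes that approach; the source does not say how changes are detected, so planReindex, SourceDoc and the hash are hypothetical. The small djb2-style hash keeps the example dependency-free, and a real pipeline would more likely use a cryptographic hash such as SHA-256.

```typescript
// Illustrative djb2-style string hash (not cryptographic).
function contentHash(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    // hash * 33 + charCode, kept within 32 bits
    hash = ((hash << 5) + hash + text.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16);
}

// Hypothetical source document as seen by the sync job.
interface SourceDoc {
  id: string;
  content: string;
}

interface ReindexPlan {
  toEmbed: string[]; // new or changed: re-chunk and re-embed
  toDelete: string[]; // removed at the source: drop their chunks
  unchanged: string[]; // hash matches: leave the index alone
}

// Compare the live sources with the hashes recorded at the last index run.
function planReindex(
  current: SourceDoc[],
  indexedHashes: Record<string, string>
): ReindexPlan {
  const plan: ReindexPlan = { toEmbed: [], toDelete: [], unchanged: [] };
  const seen: Record<string, boolean> = {};
  for (const doc of current) {
    seen[doc.id] = true;
    if (indexedHashes[doc.id] === contentHash(doc.content)) {
      plan.unchanged.push(doc.id);
    } else {
      plan.toEmbed.push(doc.id);
    }
  }
  for (const id of Object.keys(indexedHashes)) {
    if (!seen[id]) plan.toDelete.push(id);
  }
  return plan;
}
```

When a document is re-embedded, its old chunks should be replaced in the same step, so the assistant never retrieves a stale passage alongside the new one.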

Figure: The chat sits in front of an assistant core that owns retrieval and grounding. Ingestion and embeddings build the index; retrieval, the language model and guardrails produce a cited answer at question time.

Under the hood

This is a custom application, not a generic chatbot wired to a website.

Next.js and PostgreSQL (Supabase) with the pgvector extension, so the knowledge base and the rest of the app share one database, with similarity search running right next to the data.
An ingestion pipeline that chunks each source into overlapping passages and stores the text alongside its vector and metadata.
An embedding model that maps both stored passages and incoming questions into the same meaning-space, so retrieval finds relevance, not keyword overlap.
A grounded generation step where the language model is given only the retrieved passages and is required to answer from them or admit it cannot.
Citations on every answer, linking back to the exact source passages so customers trust it and agents can verify it.
Cross-language retrieval and answering, because meaning-space search works across languages.
Re-indexing on content change, so the knowledge base tracks the live content instead of going stale.
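To make "similarity search running right next to the data" concrete, here is a hedged TypeScript sketch of what a pgvector query could look like. The table kb_chunks and its columns are hypothetical, as are the helper names; the only pgvector specifics relied on are the `<=>` cosine-distance operator and the bracketed text form of a vector. The function only builds the parameterized query, which could then be passed to a Postgres client that supports positional parameters.

```typescript
interface SqlQuery {
  text: string;
  values: Array<string | number>;
}

// pgvector accepts a vector as a text literal such as "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: number[]): string {
  for (const x of embedding) {
    if (!isFinite(x)) throw new Error("embedding contains a non-finite value");
  }
  return "[" + embedding.join(",") + "]";
}

// Build a nearest-neighbor query. `<=>` is cosine distance in pgvector,
// so ordering ascending returns the closest passages first and
// 1 - distance gives a cosine similarity score.
// kb_chunks, source_url and content are assumed names for illustration.
function buildSimilarityQuery(embedding: number[], limit: number): SqlQuery {
  if (Math.floor(limit) !== limit || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }
  return {
    text: [
      "SELECT id, source_url, content,",
      "       1 - (embedding <=> $1::vector) AS similarity",
      "FROM kb_chunks",
      "ORDER BY embedding <=> $1::vector",
      "LIMIT $2",
    ].join("\n"),
    values: [toVectorLiteral(embedding), limit],
  };
}
```

Keeping the embedding as a bound parameter rather than interpolating it into the SQL string means the same statement can be reused for every question.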

Why weeks, not quarters

A system this capable usually sounds like a long project. It is not, because the build follows the same process we use on every project: a tight scope, an agent-assisted build, and a deployment that is monitored from day one. We break the method down in "How we ship custom apps in weeks", and the build-versus-buy logic in "Build vs buy: when a custom app wins".

What we would tell anyone considering this

A few honest lessons from the build.

Grounding beats a bigger model. The win is not a cleverer chatbot; it is forbidding the model from answering without retrieved evidence. That single constraint is what makes the output safe to put in front of customers.
Chunking is the real work. How you split content decides retrieval quality more than almost anything else. Spend the time there; it is invisible and it matters most.
"I don't know" is a feature. An assistant that declines when the knowledge base is silent protects the brand far better than one that always has something to say.

If you have deep knowledge trapped in documents and people's heads, and you want an assistant that answers from it without making things up, talk to us. We will look at your content and show you what a grounded, cited assistant would do with it before we build anything.
