Business outcome. Support teams cut response time on policy and product questions from minutes to seconds while staying within their own approved content. Every answer carries a citation, so reviewers can audit each reply in a click — and the assistant refuses to fabricate when retrieval returns nothing, the failure mode that gets generic chatbots pulled from production.
Built for support, customer success, and internal-help-desk teams whose answers already live in their own documents but cost hours to dig out manually. The platform ships as the smallest RAG pipeline that survives production: ingest, retrieve, cite, refuse on empty.
Pipeline
What's built
- Drag-and-drop ingestion. Admin uploads a file; server parses, chunks, embeds, writes vectors to Pinecone with the source filename in metadata. Postgres tracks chunk counts; delete cascades both stores in one server action.
- Top-5 retrieval per message. Matching chunks land in the system prompt under `## Knowledge base`, prefixed `[1] (from: returns-policy.pdf)` so the model cites naturally.
- Streaming chat UI via the Vercel AI SDK.
- "Don't know" is the default. Empty retrieval strips the knowledge-base block; the persona's refusal rule takes over instead of letting the model confabulate.
- Admin console at `/admin` with chunk counts, delete, and upload-type validation.
Tradeoffs
- Top-5 retrieval, no rerank or query rewrite. Cheap to debug and predictable per question; gives up recall on multi-hop questions where the relevant chunks aren't surfaced by the user's exact phrasing.
- Filename citations, not page numbers. Fine for short policies, weaker on long PDFs where readers want a page jump. Page-level metadata is a one-line ingestion change when the docs justify it.
- Refusal is the default on empty retrieval, not a fallback. Hallucinated policy is worse than a polite "I don't know". This is chosen explicitly so the persona never quietly drops back to the model's general training.