What You'll Need
Before diving in, make sure you have the following ready:
- Node.js 20+ installed locally
- A Next.js 15 project (App Router)
- An OpenAI API key (GPT-4o recommended)
- A Supabase account for vector storage (pgvector)
- Basic familiarity with TypeScript and React Server Components
This tutorial builds a Retrieval-Augmented Generation (RAG) chatbot — a pattern that's become the standard approach for businesses that want an AI assistant grounded in their own data rather than generic model knowledge. Whether you're running a SaaS product in Canada, an e-commerce store in Australia, or a service business in Singapore, this pattern lets your chatbot answer questions accurately from your own documentation, FAQs, or product content.
Step 1: Set Up Your Next.js Project and Install Dependencies
Start with a fresh Next.js 15 project if you don't already have one:
npx create-next-app@latest rag-chatbot --typescript --app
cd rag-chatbot
Now install the core dependencies:
npm install langchain @langchain/core @langchain/openai @langchain/community
npm install @supabase/supabase-js ai @ai-sdk/react
npm install @langchain/textsplitters
The ai package here is Vercel's AI SDK 4.x, which provides excellent streaming primitives that integrate cleanly with Next.js Server Actions and Route Handlers. The companion @ai-sdk/react package supplies the useChat hook we'll use in Step 5 — in AI SDK 4.x the React hooks live in their own package rather than under ai/react.
Pro tip: Pin your LangChain version to at least 0.3.x. The 0.3 release introduced the updated LCEL (LangChain Expression Language) syntax that makes chain composition significantly cleaner than earlier versions.
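To see what that buys you, here's a minimal LCEL sketch — the model name and prompt are illustrative, not part of this tutorial's pipeline:

import { ChatPromptTemplate } from '@langchain/core/prompts';
import { ChatOpenAI } from '@langchain/openai';
import { StringOutputParser } from '@langchain/core/output_parsers';

// LCEL composes runnables with .pipe(): prompt -> model -> output parser.
const chain = ChatPromptTemplate.fromTemplate('Summarise in one sentence: {text}')
  .pipe(new ChatOpenAI({ model: 'gpt-4o-mini' }))
  .pipe(new StringOutputParser());

const summary = await chain.invoke({
  text: 'LangChain 0.3 favours .pipe() composition over legacy chain classes.',
});
console.log(summary);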
Step 2: Configure Supabase for Vector Storage
Supabase's pgvector extension is one of the most practical choices for RAG vector storage in 2026 — it removes the need for a separate vector database and works well at SMB scale.
2a. Enable pgvector in Supabase
In your Supabase SQL editor, run:
create extension if not exists vector;
create table documents (
id bigserial primary key,
content text,
metadata jsonb,
embedding vector(1536)
);
create index on documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);
The 1536 dimension matches OpenAI's text-embedding-3-small model output. If you switch to a different embedding model, adjust this accordingly.
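For example, text-embedding-3-large outputs 3072 dimensions by default, so you'd either widen the column to vector(3072) or keep the 1536-wide column by truncating the embedding. A sketch of the latter, assuming LangChain's dimensions option (which maps to OpenAI's dimensions parameter on the v3 embedding models):

// Truncate text-embedding-3-large output to 1536 dims to match the column.
const embeddings = new OpenAIEmbeddings({
  model: 'text-embedding-3-large',
  dimensions: 1536,
});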
2b. Add Environment Variables
Create a .env.local file:
OPENAI_API_KEY=sk-your-key-here
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
Common pitfall: Never expose your SUPABASE_SERVICE_ROLE_KEY to the client. It should only ever be used in server-side Route Handlers or Server Actions.
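One way to enforce that at build time is a server-only client module. This is a sketch rather than required setup — the server-only package (npm install server-only) makes the build fail if the file is ever imported from client code:

// lib/supabase-server.ts — hypothetical helper; import only from server code.
import 'server-only';
import { createClient } from '@supabase/supabase-js';

export const supabaseAdmin = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);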
Step 3: Build the Document Ingestion Pipeline
RAG only works as well as the data you feed it. This step creates a script that chunks your documents, embeds them, and stores them in Supabase.
Create scripts/ingest.ts:
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
import { OpenAIEmbeddings } from '@langchain/openai';
import { SupabaseVectorStore } from '@langchain/community/vectorstores/supabase';
import { createClient } from '@supabase/supabase-js';
import fs from 'fs';
const client = createClient(
process.env.NEXT_PUBLIC_SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY!
);
async function ingest() {
const rawText = fs.readFileSync('./data/knowledge-base.txt', 'utf-8');
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
const docs = await splitter.createDocuments([rawText]);
await SupabaseVectorStore.fromDocuments(
docs,
new OpenAIEmbeddings({ model: 'text-embedding-3-small' }),
{ client, tableName: 'documents', queryName: 'match_documents' }
);
console.log('Ingestion complete:', docs.length, 'chunks stored');
}
ingest();
You'll also need to create the match_documents function in Supabase. Run this in the SQL editor:
create or replace function match_documents(
  query_embedding vector(1536),
  match_count int default 5,
  filter jsonb default '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
) language sql stable as $$
  select documents.id, documents.content, documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where documents.metadata @> filter
  order by documents.embedding <=> query_embedding
  limit match_count;
$$;
Note the filter jsonb parameter: LangChain's SupabaseVectorStore passes a filter argument on every RPC call, so the function signature must accept it even if you never filter by metadata.
Place your source documents in ./data/knowledge-base.txt and run the ingestion script with npx tsx scripts/ingest.ts.
Pro tip: For production, use a chunking strategy matched to your content type. FAQs work well with smaller chunks (500–800 tokens). Long-form documentation benefits from larger chunks with more overlap.
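As a rough sketch — the numbers are illustrative starting points, and note that RecursiveCharacterTextSplitter counts characters by default, so reach for a token-based splitter such as TokenTextSplitter from @langchain/textsplitters if you need token-accurate sizes:

// Illustrative splitter configs per content type — tune against your own data.
const faqSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 800,    // FAQs: small, self-contained chunks
  chunkOverlap: 80,
});
const longFormSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1500,   // long-form docs: bigger chunks...
  chunkOverlap: 300, // ...with more overlap to preserve context across boundaries
});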
Step 4: Create the RAG API Route
Now build the Route Handler that powers the chatbot. Create app/api/chat/route.ts:
import { NextRequest } from 'next/server';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
import { SupabaseVectorStore } from '@langchain/community/vectorstores/supabase';
import { createClient } from '@supabase/supabase-js';
import { createStuffDocumentsChain } from 'langchain/chains/combine_documents';
import { createRetrievalChain } from 'langchain/chains/retrieval';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { LangChainAdapter } from 'ai';
export const runtime = 'nodejs';
const supabaseClient = createClient(
process.env.NEXT_PUBLIC_SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY!
);
export async function POST(req: NextRequest) {
const { messages } = await req.json();
const question = messages[messages.length - 1].content;
const vectorStore = new SupabaseVectorStore(
new OpenAIEmbeddings({ model: 'text-embedding-3-small' }),
{ client: supabaseClient, tableName: 'documents', queryName: 'match_documents' }
);
const retriever = vectorStore.asRetriever({ k: 5 });
const llm = new ChatOpenAI({ model: 'gpt-4o' });
const prompt = ChatPromptTemplate.fromTemplate(`
You are a helpful assistant. Answer the question using only the context provided.
If the answer is not in the context, say you don't have that information.
Context: {context}
Question: {input}`);
const documentChain = await createStuffDocumentsChain({ llm, prompt });
const retrievalChain = await createRetrievalChain({
retriever,
combineDocsChain: documentChain,
});
  // The retrieval chain streams partial result objects; forward only the
  // incremental answer text to the AI SDK's LangChain adapter.
  const chainStream = await retrievalChain.stream({ input: question });
  const textStream = new ReadableStream<string>({
    async start(controller) {
      for await (const chunk of chainStream) {
        if (chunk.answer) controller.enqueue(chunk.answer);
      }
      controller.close();
    },
  });
  return LangChainAdapter.toDataStreamResponse(textStream);
}
Common pitfall: Without export const runtime = 'nodejs', this route may be deployed to the Edge runtime, which doesn't support all LangChain dependencies. Keep it on the Node.js runtime.
Step 5: Build the Chat UI Component
Create a clean, streaming-capable chat interface using the Vercel AI SDK's useChat hook. Add components/ChatWidget.tsx:
'use client';
import { useChat } from '@ai-sdk/react';
export function ChatWidget() {
const { messages, input, handleInputChange, handleSubmit, isLoading } =
useChat({ api: '/api/chat' });
return (
<div className="flex flex-col h-[500px] border rounded-xl overflow-hidden">
<div className="flex-1 overflow-y-auto p-4 space-y-3">
{messages.map((m) => (
<div
key={m.id}
className={`p-3 rounded-lg max-w-[80%] ${
m.role === 'user' ? 'ml-auto bg-blue-600 text-white' : 'bg-gray-100'
}`}
>
{m.content}
</div>
))}
{isLoading && <div className="text-gray-400 text-sm">Thinking...</div>}
</div>
<form onSubmit={handleSubmit} className="border-t p-3 flex gap-2">
<input
value={input}
onChange={handleInputChange}
placeholder="Ask a question..."
className="flex-1 border rounded-lg px-3 py-2 text-sm"
/>
<button
type="submit"
disabled={isLoading}
className="bg-blue-600 text-white px-4 py-2 rounded-lg text-sm"
>
Send
</button>
</form>
</div>
);
}
Drop <ChatWidget /> into any page in your app and you'll have a streaming, RAG-powered chat interface connected to your own data.
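For example (the page path is just an illustration):

// app/page.tsx
import { ChatWidget } from '@/components/ChatWidget';

export default function Home() {
  return (
    <main className="max-w-xl mx-auto p-8">
      <h1 className="text-xl font-semibold mb-4">Ask our docs</h1>
      <ChatWidget />
    </main>
  );
}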
Step 6: Test and Validate Your RAG Pipeline
Before shipping, verify two things separately:
6a. Test Retrieval Quality
Write a quick test script that queries your vector store directly and inspects what documents are being returned for a sample question. Poor retrieval is the number one reason RAG chatbots give bad answers — it's almost never the LLM itself.
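A minimal sketch of such a script, reusing the same table and env vars as the ingestion step (the sample question is a placeholder):

// scripts/test-retrieval.ts — inspect what the vector store returns
// for a sample question, along with similarity scores.
import { OpenAIEmbeddings } from '@langchain/openai';
import { SupabaseVectorStore } from '@langchain/community/vectorstores/supabase';
import { createClient } from '@supabase/supabase-js';

const client = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

async function testRetrieval(question: string) {
  const store = new SupabaseVectorStore(
    new OpenAIEmbeddings({ model: 'text-embedding-3-small' }),
    { client, tableName: 'documents', queryName: 'match_documents' }
  );
  const results = await store.similaritySearchWithScore(question, 5);
  for (const [doc, score] of results) {
    console.log(score.toFixed(3), doc.pageContent.slice(0, 120));
  }
}

testRetrieval('How do I reset my password?');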
6b. Test for Hallucination
Ask your chatbot questions that are deliberately not in your knowledge base. Your system prompt should be forcing it to admit when it doesn't know. If it's still making things up, tighten the prompt and consider reducing the temperature to 0.2 for factual use cases.
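In the Step 4 route handler, that tweak is a one-liner (a sketch):

// Lower temperature makes answers more deterministic for factual use cases.
const llm = new ChatOpenAI({ model: 'gpt-4o', temperature: 0.2 });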
Pro tip: Add a similarity threshold filter to your retriever. If the best match scores below 0.75 cosine similarity, return a fallback message rather than passing weak context to the LLM.
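One way to sketch that inside the POST handler, assuming the match_documents function above (which returns cosine similarity) and the 0.75 cutoff suggested here:

// Retrieve with scores, then bail out if even the best match is weak.
const results = await vectorStore.similaritySearchWithScore(question, 5);
const strong = results.filter(([, score]) => score >= 0.75);
if (strong.length === 0) {
  return new Response("I don't have information on that yet. Try rephrasing, or contact support.");
}
// Pass the surviving documents straight to the document chain instead of the retriever:
// documentChain.stream({ input: question, context: strong.map(([doc]) => doc) })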
Step 7: Deploy to Vercel
RAG chatbots deployed on Vercel work well with the Node.js runtime. Push your project and set the environment variables in your Vercel project settings. If you expect more than a handful of concurrent users, plan for database connections early: route traffic through Supabase's connection pooler (Supavisor) rather than direct Postgres connections, and consider a paid Supabase tier for the higher connection limits.
At Lenka Studio, we've shipped RAG implementations for clients in healthcare, legal, and retail — and one thing consistently holds true: the ingestion pipeline and retrieval quality matter far more than which LLM you choose. Get your chunking and similarity thresholds right before optimising anything else.
Common Pitfalls to Avoid
- Over-chunking: Chunks that are too small lose context. Aim for 800–1200 tokens with overlap.
- Stale embeddings: If your source data changes, re-run ingestion for affected documents. Consider a scheduled job or webhook-triggered ingestion for live content.
- No source attribution: For business use cases, always return the source document name in the response metadata so users can verify answers.
- Ignoring conversation history: For multi-turn conversations, you'll need to include prior messages in context. Look into createHistoryAwareRetriever from LangChain for this (see the sketch after this list).
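A hedged sketch of that last point, assuming the llm and retriever from Step 4 (exact wiring can vary across LangChain 0.3.x releases):

import { createHistoryAwareRetriever } from 'langchain/chains/history_aware_retriever';
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts';

// Rephrase the latest question into a standalone search query using the
// chat history, so follow-ups like "what about pricing?" retrieve well.
const rephrasePrompt = ChatPromptTemplate.fromMessages([
  new MessagesPlaceholder('chat_history'),
  ['user', '{input}'],
  ['user', 'Given the conversation above, rewrite the last question as a standalone search query.'],
]);

const historyAwareRetriever = await createHistoryAwareRetriever({
  llm,
  retriever,
  rephrasePrompt,
});
// Pass this to createRetrievalChain in place of the plain retriever and
// include chat_history (prior messages) in the stream/invoke input.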
Next Steps
You now have a working RAG chatbot that retrieves answers from your own data and streams responses in real time. From here, you can extend it by adding conversation memory with BufferWindowMemory, building an admin UI to manage your knowledge base, or integrating with tools like Notion or Google Drive as live data sources.
If you're building this for a business application and want it production-ready — with proper auth, rate limiting, analytics, and a polished interface — the team at Lenka Studio can help you move from prototype to shipped product. Get in touch and let's talk through what you're building.