I Built an LLM Product, Not Just Another Chatbot

Most developers ask how. I ask why. Why does one site load in 0.5 seconds and another takes 4? Why does one decision make your app scale and another silently kills it? Why do users leave a page before it even finishes loading? I'm a full stack developer based in Kitchener, ON, building with Next.js, React, TypeScript, Tailwind CSS, MongoDB, and Stripe. But the tech stack is just the surface. What actually drives me is understanding how the web works at a level deep enough to make it genuinely better for every person who uses it. I don't write polished expert guides. I write what I learn the same day I learn it. Raw and real. If you have ever stared at your screen wondering why something works but not how to explain it, you'll feel at home here.

I spent a weekend building an AI tutor that refuses to give you direct answers.

You ask it about recursion, and it asks what you already think recursion means. You ask for the derivative of x², and it asks what the power rule does to exponents. You say your exam is in ten minutes and beg it to just tell you, and it still asks you a smaller question instead.

That constraint made the project interesting.

The goal was not to build another chatbot UI around an LLM API. The goal was to build a tutor with a specific behaviour: guide the learner without leaking the final answer too early.

Three things ended up mattering most:

  1. The system prompt

  2. Cost protection

  3. Choosing the right backend pattern

These are also the things that separate an LLM demo from something closer to a real product.

The system prompt is the product

What makes a Socratic tutor different from a generic chatbot is not only the model. It is the instruction layer around the model.

In my project, the most important file is not the API route. It is consts/prompt.ts.

That file controls the tutor’s behaviour. It tells the model not to give direct answers, how to respond when the user pressures it, how to ask smaller guiding questions, and how to keep the learner inside the problem instead of escaping to the final solution.

The prompt had to defend against patterns like authority pressure, emotional pressure, hypothetical framing, decomposition, and roleplay laundering.

A weak prompt says:

Do not give the answer.

A stronger prompt explains what to do instead:

Ask a smaller question.
Check the learner’s current understanding.
Give a hint only when needed.
Move one step at a time.
Refuse answer-seeking attempts without sounding robotic.

That was the first big lesson: prompt writing is not just wording. It is behaviour design.

The model is still the same model. But the product feels different because the prompt creates a decision path for each user message.

Three pages of prompt. A small API call. But the prompt is what turns the API call into a tutor.
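To make "a small API call" concrete, here is a minimal sketch of how a prompt constant can be attached to a request body. It assumes the Anthropic Messages API shape, where the system prompt is a top-level `system` field and the conversation goes in `messages`. The prompt text, `buildTutorRequest`, and the `max_tokens` value are hypothetical placeholders, not the contents of consts/prompt.ts.

```typescript
// Hypothetical stand-in for the constant exported from consts/prompt.ts.
// The real prompt is far longer; these lines only echo the rules listed above.
const SOCRATIC_SYSTEM_PROMPT = [
  "You are a Socratic tutor. Never state the final answer.",
  "Ask a smaller question.",
  "Check the learner's current understanding.",
  "Give a hint only when needed.",
  "Move one step at a time.",
].join("\n");

type ChatMessage = { role: "user" | "assistant"; content: string };

interface TutorRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: ChatMessage[];
}

// Builds the request body. The system prompt travels in its own field,
// so user messages never overwrite it.
function buildTutorRequest(history: ChatMessage[], model: string): TutorRequest {
  return {
    model,
    max_tokens: 512, // assumed value for illustration
    system: SOCRATIC_SYSTEM_PROMPT,
    messages: history,
  };
}
```

The point of the shape is that the behaviour lives in one constant, and the call site stays tiny no matter how long the prompt grows.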

Cost protection is real engineering

A naive LLM route is easy to write:

  1. Receive the message

  2. Call the model

  3. Return the answer

That is fine for a local experiment. It is not enough for a public AI app.

The moment the app is live, every request has a real cost attached to it. If someone abuses the route, it is not just a bug. It can become a bill.

So the API route became more than a wrapper around Anthropic. It became a control layer.

Before the model is called, the request goes through Zod validation, so malformed message history never reaches the API.
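The idea behind that validation step can be sketched without any dependency. The project uses Zod; the block below is a hand-written stand-in that checks the same kind of shape, so it runs on its own. The limits and the names `parseChatRequest` and `isChatMessage` are assumptions made for illustration.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

type ParseResult =
  | { ok: true; messages: ChatMessage[] }
  | { ok: false; error: string };

// Assumed limits, chosen only for this sketch.
const MAX_MESSAGES = 40;
const MAX_CONTENT_LENGTH = 4000;

// True only for objects with an allowed role and non-empty, bounded content.
function isChatMessage(value: unknown): value is ChatMessage {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    (candidate.role === "user" || candidate.role === "assistant") &&
    typeof candidate.content === "string" &&
    candidate.content.trim().length > 0 &&
    candidate.content.length <= MAX_CONTENT_LENGTH
  );
}

// Rejects anything that is not a bounded list of well-formed messages.
function parseChatRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Body must be a JSON object." };
  }
  const messages = (body as Record<string, unknown>).messages;
  if (!Array.isArray(messages) || messages.length === 0) {
    return { ok: false, error: "messages must be a non-empty array." };
  }
  if (messages.length > MAX_MESSAGES) {
    return { ok: false, error: "Too many messages." };
  }
  if (!messages.every(isChatMessage)) {
    return { ok: false, error: "Each message needs a valid role and content." };
  }
  return { ok: true, messages: messages as ChatMessage[] };
}
```

Restricting `role` to user and assistant also means a client cannot smuggle its own system-style message into the history.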

Then I identify the request source using the forwarded IP header:

const userIp = req.headers.get("x-forwarded-for")?.split(",")[0].trim();

That lets me create a daily Redis key per IP:

const ipUsageKey = `ipUsage:${userIp}:${today}`;
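The snippets above leave two details open: what happens when the header is missing, and how `today` is computed. Here is one possible sketch. The `"unknown"` fallback bucket, the UTC day format, and the helper names are assumptions, not the project's actual code.

```typescript
// Minimal shape of a headers object; the Fetch API's Headers satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

// Takes the first address in x-forwarded-for, which is the original client
// when the hosting platform sets the header itself.
function getClientIp(headers: HeaderReader): string {
  const forwarded = headers.get("x-forwarded-for");
  const first = forwarded?.split(",")[0]?.trim();
  // Assumed fallback: requests without the header share one bucket.
  return first ? first : "unknown";
}

// UTC calendar day as YYYY-MM-DD, so every server instance agrees on "today".
function utcDay(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// One counter key per IP per day.
function buildIpUsageKey(ip: string, now: Date): string {
  return `ipUsage:${ip}:${utcDay(now)}`;
}
```

One caveat worth keeping in mind: a client can send its own x-forwarded-for value, so this only identifies the source reliably when a trusted proxy or platform sets or overwrites the header.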

I used Redis because this state is temporary, fast-changing, and does not need to live in a permanent database.

The IP limiter uses INCR first:

const newIpCount = await redis.incr(ipUsageKey);

That matters because Redis increments are atomic.

A naive read-then-write limiter can break under concurrency. Two requests can read the same old value, both pass the check, and both move forward.

With INCR, Redis handles the count safely.

If the user goes over the daily limit, I refund the count:

if (newIpCount > DAILY_IP_REQUEST_LIMIT) {
  await redis.decr(ipUsageKey);
}

That DECR looks small, but it matters. Without it, rejected requests would keep pushing the counter past the limit, so the stored value would stop reflecting the requests that were actually served.
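Put together, the increment-first limiter can be sketched as below. To keep it runnable without a Redis server, it talks to a small `CounterStore` interface backed by an in-memory map; that interface, the limit value, and the one-day expiry on first use are assumptions for illustration. With a real Redis client, the same three commands (INCR, DECR, EXPIRE) would go over the network.

```typescript
// The three commands the limiter needs, as an interface so the sketch
// can run against an in-memory stand-in instead of Redis.
interface CounterStore {
  incr(key: string): Promise<number>;
  decr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<void>;
}

class MemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();

  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  async decr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) - 1;
    this.counts.set(key, next);
    return next;
  }

  // No-op here; Redis would delete the key after the TTL.
  async expire(_key: string, _seconds: number): Promise<void> {}

  peek(key: string): number {
    return this.counts.get(key) ?? 0;
  }
}

const DAILY_IP_REQUEST_LIMIT = 20; // assumed value
const ONE_DAY_SECONDS = 60 * 60 * 24;

// Increment first, then decide. Over-limit requests are refunded so the
// counter never climbs above the limit.
async function consumeDailyRequest(
  store: CounterStore,
  ipUsageKey: string,
  limit: number = DAILY_IP_REQUEST_LIMIT
): Promise<{ allowed: boolean; count: number }> {
  const newIpCount = await store.incr(ipUsageKey);
  if (newIpCount === 1) {
    // First request of the day: let yesterday's keys clean themselves up.
    await store.expire(ipUsageKey, ONE_DAY_SECONDS);
  }
  if (newIpCount > limit) {
    const refunded = await store.decr(ipUsageKey);
    return { allowed: false, count: refunded };
  }
  return { allowed: true, count: newIpCount };
}
```

When `allowed` is false, the route would return early (typically with a 429) before any model call is made.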

I also added global daily token caps. Instead of only limiting requests, I track input and output tokens separately:

const InputTokenKey = `DailyInputToken:Global:${today}`;
const OutputTokenKey = `DailyOutputToken:Global:${today}`;

This gives the app a hard daily ceiling.

After the model responds, I update the counters using the actual token usage returned by Anthropic:

await Promise.all([
  incrementDailyTokenUsage(InputTokenKey, inputToken),
  incrementDailyTokenUsage(OutputTokenKey, outputToken)
]);

I used Promise.all because these two Redis writes do not depend on each other.
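Here is a rough sketch of both halves of the token cap: the check before the model call and the update after it. The `TokenStore` interface and its in-memory implementation stand in for Redis, and the function names and cap values are assumptions. The `usage` shape mirrors the `input_tokens` and `output_tokens` fields on an Anthropic Messages API response.

```typescript
// Stand-in for the Redis commands involved (a numeric GET and INCRBY).
interface TokenStore {
  get(key: string): Promise<number>;
  incrBy(key: string, amount: number): Promise<number>;
}

class MemoryTokenStore implements TokenStore {
  private totals = new Map<string, number>();

  async get(key: string): Promise<number> {
    return this.totals.get(key) ?? 0;
  }

  async incrBy(key: string, amount: number): Promise<number> {
    const next = (this.totals.get(key) ?? 0) + amount;
    this.totals.set(key, next);
    return next;
  }
}

interface TokenCaps { input: number; output: number }
interface Usage { input_tokens: number; output_tokens: number }

function tokenKeys(today: string): { input: string; output: string } {
  return {
    input: `DailyInputToken:Global:${today}`,
    output: `DailyOutputToken:Global:${today}`,
  };
}

// Before the model call: refuse once either global counter has hit its cap.
async function hasTokenBudget(store: TokenStore, today: string, caps: TokenCaps): Promise<boolean> {
  const keys = tokenKeys(today);
  const [usedInput, usedOutput] = await Promise.all([
    store.get(keys.input),
    store.get(keys.output),
  ]);
  return usedInput < caps.input && usedOutput < caps.output;
}

// After the model call: add the reported usage. The two writes are
// independent, so they run in parallel.
async function recordTokenUsage(store: TokenStore, today: string, usage: Usage): Promise<void> {
  const keys = tokenKeys(today);
  await Promise.all([
    store.incrBy(keys.input, usage.input_tokens),
    store.incrBy(keys.output, usage.output_tokens),
  ]);
}
```

In this sketch, usage is only known after a response arrives, so the totals can overshoot the cap by whatever requests were already in flight when it was reached.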

None of this makes the UI look more impressive. The user still sees a simple chat box. But this is the backend work that decides whether the project can safely stay online.

API routes vs Server Actions, and why I did not stream

I also had to decide how the frontend should talk to the backend.

Server Actions are great for many things in Next.js: forms, mutations, CRUD operations, and reducing client-side fetch boilerplate.

But for this project, I used a plain API route because I wanted clear control over request validation, response status codes, rate limiting, token accounting, and error handling.
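As an illustration of what "clear control over response status codes" can look like, the sketch below maps each way the route can end to an explicit status. The outcome names and the specific codes (429 for the per-IP limit, 503 for the global budget, 502 for an upstream failure) are assumptions, not the project's actual choices.

```typescript
// Every way the route can finish, as one discriminated union.
type RouteOutcome =
  | { kind: "ok"; reply: string }
  | { kind: "invalid_request"; detail: string }
  | { kind: "ip_limit_reached" }
  | { kind: "daily_budget_exhausted" }
  | { kind: "model_error" };

interface RouteResponse {
  status: number;
  body: { reply?: string; error?: string };
}

// Maps each outcome to a status code the client can branch on.
function toRouteResponse(outcome: RouteOutcome): RouteResponse {
  switch (outcome.kind) {
    case "ok":
      return { status: 200, body: { reply: outcome.reply } };
    case "invalid_request":
      return { status: 400, body: { error: outcome.detail } };
    case "ip_limit_reached":
      return { status: 429, body: { error: "Daily request limit reached." } };
    case "daily_budget_exhausted":
      return { status: 503, body: { error: "The tutor is resting until tomorrow." } };
    case "model_error":
      return { status: 502, body: { error: "The model did not respond." } };
  }
}
```

Keeping the mapping in one function means the frontend can show a different toast per status without parsing error strings.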

I also deliberately chose not to stream the response.

That might sound strange because streaming is common in AI apps. But engineering is not about adding a feature because everyone else is doing it. It is about asking whether the user can actually feel the difference.

In this project, responses usually come back quickly enough that a loading bubble communicates the state clearly.

So instead of adding streaming complexity early, I kept the flow simple:

  1. User sends a message

  2. UI shows a thinking indicator

  3. API route validates and checks limits

  4. Model responds

  5. UI appends the tutor message

The code stays easier to reason about, and token accounting stays cleaner because the full response is handled in one place.
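On the client, the five steps above reduce to a small state machine. The reducer below is a hypothetical sketch of that flow, not the project's component code; the event names and state shape are assumptions.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

interface ChatState {
  messages: ChatMessage[];
  isThinking: boolean; // drives the thinking indicator and the disabled send button
  error: string | null; // surfaced as a toast
}

type ChatEvent =
  | { type: "sent"; content: string }
  | { type: "replied"; content: string }
  | { type: "failed"; error: string };

const initialChatState: ChatState = { messages: [], isThinking: false, error: null };

function chatReducer(state: ChatState, event: ChatEvent): ChatState {
  switch (event.type) {
    case "sent":
      // Ignore sends while a reply is pending.
      if (state.isThinking) return state;
      return {
        messages: [...state.messages, { role: "user", content: event.content }],
        isThinking: true,
        error: null,
      };
    case "replied":
      // The whole response arrives at once, so it is appended in one step.
      return {
        messages: [...state.messages, { role: "assistant", content: event.content }],
        isThinking: false,
        error: null,
      };
    case "failed":
      return { ...state, isThinking: false, error: event.error };
  }
}
```

With streaming, the "replied" case would have to become a series of partial updates, which is the extra complexity the simple flow avoids.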

If response time becomes a real UX issue later, streaming can be added. But it did not need to be the first version.

That was the lesson:

Do not cargo-cult complexity. Add it when the user can feel its absence.

The frontend still matters

The backend makes the app safe. The frontend makes it feel usable.

I added small UX decisions that are easy to ignore but matter in practice.

The chat automatically scrolls to the newest message. The input clears immediately after sending. The app disables sending while the tutor is thinking. Errors show as toast messages instead of silently failing.

On desktop, the input refocuses after sending so the user can keep typing naturally. On mobile, I avoid forcing focus back into the input because that can reopen the keyboard and make the experience annoying.
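One way to make that desktop versus mobile decision is the `(pointer: coarse)` media query, which matches when the primary input is imprecise, such as a finger on a touch screen. This is an assumed approach; the project may detect mobile differently. The matcher is injected so the logic can run outside a browser.

```typescript
// Minimal shape of window.matchMedia, injected so the logic is testable.
type MediaMatcher = (query: string) => { matches: boolean };

// Refocus only when the primary pointer is fine (mouse or trackpad).
// A coarse pointer usually means an on-screen keyboard would pop back up.
function shouldRefocusAfterSend(matchMedia: MediaMatcher): boolean {
  return !matchMedia("(pointer: coarse)").matches;
}

// In a browser component this might be used as:
//   if (shouldRefocusAfterSend((q) => window.matchMedia(q))) {
//     inputRef.current?.focus();
//   }
```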

That detail is small, but it is product thinking.

A polished AI app is not only about the model response. It is also about everything around the response.

What's in the box

So this was not just:

I built a chatbot.

It was closer to:

I built a constrained LLM product with a researched system prompt, validated request handling, Redis-backed usage limits, global token caps, real token accounting, graceful error handling, and a mobile-aware chat UI.

The final product is simple on purpose.

One chat interface. One API route. One model.

But behind that simplicity are the decisions that make the app safer, cheaper, and more reliable.

If you are a junior developer building an AI portfolio project, my advice is this:

Do not just build a chatbot.

Build a product with constraints. Pick a behaviour. Make the prompt enforce that behaviour. Protect the route. Track cost. Validate input. Handle failure. Then explain those decisions clearly.

That is where the real engineering shows up.

Demo: https://socratic-tutor-eight.vercel.app/

Code: https://github.com/itstalhasattar/socratic-tutor

The system prompt is in consts/prompt.ts.

It is worth reading because that is where most of the product behaviour lives.