How I Built a Business-Ready GenAI Customer Support Agent
From message classification to real-time invoice checks, escalation handling, and MLOps — here's how I turned a prototype into a production-grade support system using Gemini.
From Prototype to Product-Ready Agent
I built a GenAI customer support agent that actually helps — not just by answering questions, but by classifying messages, retrieving relevant documents, calling live functions, and adapting its tone.
In the first version, it could classify support emails and generate polite replies.
Not bad — but let’s be honest: it was basically a smart auto-responder.
Then I started thinking:
What if it could understand the request?
What if it could check a live invoice database before answering?
What if it could evaluate its own replies and tell me how helpful they were?
So I rebuilt it. Same core, smarter brain.
This time, it can:
✅ Retrieve knowledge from real documents
✅ Call live functions based on message content
✅ Reply with tone control, memory, and structured output
This is the final version I submitted to the Kaggle x Google GenAI Capstone — and here's how I made it feel more like a product than a prototype.
📺 Prefer watching?
If you’d rather see it in action, I recorded a walkthrough of the full project for the Kaggle submission.
👉 Watch the video on YouTube
🔧 What’s New in This Version
I added capabilities that bring it closer to something you could actually use inside a company:
✅ Prioritization – The agent detects urgency using both keyword rules and Gemini’s tone classification. It flags messages as high, normal, or low — a must-have for triage systems.
✅ Escalation Logic – If the agent isn’t confident or can’t resolve the issue, it sets needs_human_review = true and can tell the user a human rep will follow up.
✅ Memory – It tracks the customer’s previous message, so replies feel more contextual — like a real conversation.
✅ MLOps: Logging & Evaluation – Each interaction is logged, evaluated by Gemini, and tracked with metadata like tone, token usage, and category.
✅ Resilience & Fallback Behavior – The agent is wrapped in a retry mechanism that handles API rate limits and temporary Gemini failures.
When I hit Gemini quota limits during testing, it automatically retried the call — and if the issue persisted, it generated a polite fallback message and flagged the case for human review.
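The retry-and-fallback behaviour can be sketched roughly like this (function and field names are my own illustration, not the project's exact code):

```python
import random
import time

def call_with_retry(fn, max_retries=3, base_delay=2.0):
    """Call a model-backed function, retrying transient failures with
    exponential backoff, then fall back to a canned reply."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt < max_retries - 1:
                # Exponential backoff with jitter before the next attempt
                time.sleep(base_delay * (2 ** attempt + random.random()))
    # All retries exhausted: polite fallback, flagged for human review
    return {
        "reply": ("Thanks for reaching out! We're looking into your request "
                  "and a support specialist will follow up shortly."),
        "needs_human_review": True,
    }
```

Wrapping every Gemini call this way means a quota error degrades gracefully instead of crashing the pipeline.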
How the Core Pipeline Works
The smart features above build on a modular GenAI pipeline I layered up step by step.
1. 🗂️ Classification
Every incoming message is classified into a category like refund_request, payment_issue, or order_status.
I started with a TF-IDF + Logistic Regression model and then switched to a zero-shot classifier using Gemini — fast, flexible, and no training required.
Example snippet:
def classify_email(email_text):
    prompt = f"""
    Classify the following email into one of the following categories:
    refund_request, payment_issue, order_status, general_inquiry.

    Email:
    {email_text}

    Return only the category name.
    """
    return gemini_model.generate_content(prompt).text.strip()
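Since a model can return extra whitespace, odd casing, or an unexpected label, a small validation wrapper (my own hypothetical helper, using the four categories above) keeps downstream code safe:

```python
# The four categories from the classification prompt
VALID_CATEGORIES = {"refund_request", "payment_issue",
                    "order_status", "general_inquiry"}

def normalize_category(raw_label):
    """Map the model's raw text output onto a known category,
    falling back to general_inquiry for anything unexpected."""
    label = raw_label.strip().lower()
    return label if label in VALID_CATEGORIES else "general_inquiry"
```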
2. 📚 Embedding & Document Retrieval (RAG)
Support articles are embedded using Gemini's text-embedding-004 model. When a customer email comes in, the agent embeds it too, compares it against the document vectors, and pulls the top 3 most relevant documents.
Grounded replies = trusted answers.
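The retrieval step boils down to cosine similarity over precomputed embeddings. Here is a minimal sketch, assuming the document vectors were already produced by text-embedding-004 (the function names are my own):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_documents(query_vec, doc_embeddings, docs, k=3):
    """Rank docs by cosine similarity to the query embedding; return top k."""
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(query_vec, doc_embeddings[i]),
                    reverse=True)
    return [docs[i] for i in ranked[:k]]
```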
3. 🔌 Function Calling with Live Data
If the message mentions an invoice, refund, or delivery, the agent triggers a tool that queries a mock SQL database using typed Python functions.
This bridges the gap between conversation and real business operations.
Example snippet:
@tool
def get_invoice_info(invoice_id: str):
    """Retrieve invoice info from a mock database."""
    for row in invoice_db:
        if row["invoice_id"] == invoice_id:
            return row
    return {"error": "Invoice not found"}
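To show how a message can trigger that tool, here is a simplified routing sketch. The invoice-id format (`INV-1234`) and the mock database contents are assumptions for illustration, not the project's actual data:

```python
import re

# Hypothetical mock database, matching the get_invoice_info tool's shape
invoice_db = [
    {"invoice_id": "INV-1001", "amount": 49.90, "status": "paid"},
]

def get_invoice_info(invoice_id):
    """Retrieve invoice info from the mock database."""
    for row in invoice_db:
        if row["invoice_id"] == invoice_id:
            return row
    return {"error": "Invoice not found"}

def maybe_call_tool(email_text):
    """Route to the invoice tool when the message contains an invoice id.
    The INV-<digits> pattern is an assumption for this sketch."""
    match = re.search(r"\bINV-\d+\b", email_text)
    if match:
        return get_invoice_info(match.group(0))
    return None
```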
4. ✉️ Structured Reply Generation
The agent generates replies like a human — tone-controlled, complete with metadata, and escalation flags when needed.
Example snippet:
final_prompt = f"""
Compose a customer support reply using the following:
- Customer Message: {email_text}
- Extracted Info: {facts}
- Retrieved Answer: {answer}
- Memory Summary: {short_term_context}
Make the reply sound polite and helpful.
If escalation is required, say so explicitly.
"""
🔁 The Final Unified Pipeline
To tie everything together, I created a single function, support_agent_pipeline_gemini(), that runs the full support logic end to end.
This includes:
🧠 Combining subject + body
🏷️ Classifying the message
⚡ Detecting urgency
🔍 Retrieving documents via Gemini embeddings
🧵 Using memory for continuity
🔧 Calling tools when needed (e.g. invoice lookup)
🗣️ Tone control + few-shot prompting
📤 Returning structured JSON reply
✅ Evaluating + flagging uncertain responses
🗃️ Logging every step to CSV
It’s robust, testable, and ready to plug into a product.
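The steps above can be sketched as a skeleton pipeline. Each stage is injected as a callable so the sketch stays runnable without the Gemini API; the escalation condition shown is one plausible rule, not the project's exact logic:

```python
def support_agent_pipeline(subject, body, classify, detect_urgency,
                           retrieve_docs, generate_reply, log):
    """Skeleton of the end-to-end support pipeline."""
    message = f"{subject}\n\n{body}"                  # combine subject + body
    category = classify(message)                      # classify the message
    urgency = detect_urgency(message)                 # detect urgency
    docs = retrieve_docs(message)                     # RAG retrieval
    reply = generate_reply(message, category, docs)   # compose the reply
    record = {
        "category": category,
        "urgency": urgency,
        "reply": reply,
        # Example escalation rule: urgent message with no grounding docs
        "needs_human_review": urgency == "high" and not docs,
    }
    log(record)                                       # log every interaction
    return record
```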
🧩 What I Learned (and What’s Next)
This project showed me how far you can go with GenAI — even without fine-tuning a model.
With the right prompts, some logic, and a modular structure, you can build something that:
Retrieves the right info
Calls real functions (like checking an invoice)
Justifies its answers with traceable context
Knows when to escalate or back off
Logs and evaluates its own performance
I also realized that handling rate limits and quota errors is not optional — it’s a real-world necessity if you’re building production-grade GenAI systems.
It’s not just a chatbot — it’s a teammate.
🔮 What’s Next?
Add a feedback loop where users rate replies (RLHF-style)
Integrate into a real backend with live API calls
Expand memory to support long multi-turn threads
Add multilingual support
🔗 Try It Yourself
💬 Questions? Feedback? Drop me a message on LinkedIn or reply here.