The Right Model for the Job

By Hope Dorman · Jul 02, 2026 · 16 min read

In this episode of CX Now, host Lauren Gold, Chief Customer Officer of Kustomer, sits down with co-founder and CTO Jeremy Suriel. The conversation comes at a pivotal moment for AI in customer service: the question is no longer whether to use AI, but how to architect it for the long term. The choices companies make now, such as which models to use, how to integrate them, and how much control to hand customers, will determine which platforms keep pace with the industry and which fall behind.

Drawing on her vantage point leading the customer side of the business, Gold explores with Suriel how those technical decisions translate into real outcomes: faster resolutions, smarter automation, and experiences that feel genuinely intelligent rather than scripted. Together, they pull back the curtain on how Kustomer approaches building AI today.

This interview has been lightly edited for clarity.

Lauren: Top of mind for me, from the beginning, is foundation model selection. When you're building AI into a customer service platform, there's always a temptation to pick one model and go deep on it. So how do you actually think about which AI model to use for a specific task?

Jeremy: Great question. So look, it starts with the job. Different tasks have fundamentally different requirements. Some need high throughput and low latency: if you're rendering a timeline, you want it to show up right away. Some need reasoning depth, really deep reasoning over data. Cost comes into play, and so do context windows. So lots of different factors come into play.

We're not really picking a single model and trying to force everything through it. We're running a tiered architecture: each model is chosen for what it's best at, and that approach extends across the platform. Language detection, for example, needs to be lightweight and fast, so we pick certain models for that. The same goes for sentiment analysis, translations, and our own topic extraction pipeline, which processes millions of conversations. For that pipeline, we recently moved to a cheaper AWS model after our internal evals showed really competitive quality at a fraction of the cost. We've actually seen AWS models make real advances on cost and latency without sacrificing quality, and we use them for things like classification and pre-processing.
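To make the tiered idea concrete, here is a minimal Python sketch of task-level routing, assuming a simple lookup table. The task names, the `ModelChoice` type, and the model labels are hypothetical placeholders, not Kustomer's actual configuration or real model IDs; the point is only that each task maps to its own model and can be re-pointed independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    provider: str  # e.g. "aws", "anthropic", "openai"
    model: str     # hypothetical label, not a real model ID


# Hypothetical task-to-model table: lightweight, high-volume tasks go to
# cheaper, faster models; data-heavy reasoning goes to a stronger model.
ROUTES: dict[str, ModelChoice] = {
    "language_detection": ModelChoice("aws", "small-fast"),
    "sentiment": ModelChoice("aws", "small-fast"),
    "topic_extraction": ModelChoice("aws", "mid-tier"),
    "data_analysis": ModelChoice("anthropic", "deep-reasoning"),
}

# Used for any task that has no explicit entry in ROUTES.
DEFAULT = ModelChoice("anthropic", "deep-reasoning")


def route(task: str) -> ModelChoice:
    """Return the model configured for a task, or the default."""
    return ROUTES.get(task, DEFAULT)


def swap(task: str, choice: ModelChoice) -> None:
    """Re-point a single task at a new model, e.g. after it wins an eval."""
    ROUTES[task] = choice
```

Re-pointing one entry, as in the topic-extraction move described above, leaves every other task untouched.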

Everybody knows about Anthropic's models: they're strong at deep reasoning, so we put them in some of the data-heavy areas of the platform where you need that. When a better option emerges for a specific task, we evaluate it. New models come out all the time, and we'll look at one and say, hey, this might be good for this area of the product. We run it through our evaluations, and many times we can unlock gains, whether that's lower latency or higher quality. There have been times when everything got better at once: higher quality, lower latency, lower cost, just from switching to a newly released model.

It really comes down to the task. I'll also point out that our platform has AI features across the product, plus an agentic platform where customers build agents for things like conversational automation. Customers can run each of those agents on an AWS, Anthropic, or OpenAI model and decide which one's best for what they're trying to do, so they can optimize for their use case. That's sort of how we think about it.

Lauren: That makes sense. Has your approach here evolved as the landscape has changed? I think you touched on it: stay nimble, run the evals, and consider new models as they come out.

Jeremy: Yeah, absolutely. In the earlier days, we certainly thought, okay, we could just choose one model and it could do everything — let's just choose an OpenAI model and run everything through it. Today, each provider brings something unique to the table. So it certainly makes a lot of sense to do this sort of ensemble approach — pick the right model for the job.

Lauren: That makes sense. Beyond the model, there's the orchestration layer, as we call it at Kustomer at least. A lot of AI platforms are built around a single model provider. So what's the architectural argument for building something more flexible, and what does that complexity cost you on the engineering side?

Jeremy: Look, for me, the model landscape is moving so fast that betting on a single horse is risky. If you build your platform tightly coupled to a single provider, or you don't let your customers make a choice within the agentic platform, you're one pricing change, deprecation, or capability gap away from having real issues.

So we built an abstraction layer that lets us swap in models at the task level, the job level. And yes, that has real engineering costs: you need to normalize inputs and outputs across providers, handle different token economics, manage different failure modes, and so on. But I believe the cost of the alternative is much higher. Having to rebuild a whole integration against the clock is not a good place to be. And these providers ship major changes every three to six months or so. So my belief is the complexity is worth it, and it compounds in your favor.
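One way to picture such an abstraction layer is a single provider-neutral interface with one adapter per vendor. The sketch below assumes a hypothetical `Completion` type and two stub adapters with invented response shapes and prices; real adapters would wrap each vendor's SDK, which this example deliberately does not call.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Completion:
    """Provider-neutral result that the rest of the platform consumes."""
    text: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


class ModelProvider(Protocol):
    """The only interface task code depends on; adapters hide vendor details."""
    def complete(self, prompt: str) -> Completion: ...


class StubProviderA:
    """Stand-in adapter with per-token pricing. The raw response shape and
    prices are made up; a real adapter would call a vendor SDK here."""
    PRICE_IN, PRICE_OUT = 0.25e-6, 1.25e-6  # hypothetical USD per token

    def complete(self, prompt: str) -> Completion:
        n = len(prompt.split())  # crude token count, good enough for a sketch
        raw = {"output": prompt.upper(), "usage": {"in": n, "out": n}}
        usage = raw["usage"]
        return Completion(
            text=raw["output"],
            input_tokens=usage["in"],
            output_tokens=usage["out"],
            cost_usd=usage["in"] * self.PRICE_IN + usage["out"] * self.PRICE_OUT,
        )


class StubProviderB:
    """A second stand-in whose raw response uses different field names and
    a flat per-request price instead of per-token pricing."""
    PRICE_PER_REQUEST = 0.0001  # hypothetical USD

    def complete(self, prompt: str) -> Completion:
        raw = {
            "result": {"content": prompt.lower()},
            "meter": {"prompt_units": len(prompt.split()), "completion_units": 1},
        }
        return Completion(
            text=raw["result"]["content"],
            input_tokens=raw["meter"]["prompt_units"],
            output_tokens=raw["meter"]["completion_units"],
            cost_usd=self.PRICE_PER_REQUEST,
        )


def run_task(provider: ModelProvider, prompt: str) -> Completion:
    """Task code never touches vendor-specific fields, so swapping the
    provider is a change at the call site only."""
    return provider.complete(prompt)
```

Because both adapters report cost in the same `cost_usd` field, per-token and per-request pricing can be compared directly.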

When a new model launches, our customers get access to it without rebuilding anything. We deploy some of these models across regions, taking advantage of global inference and cross-region failover, and we keep improving our degradation paths. So we're doing some really cool things there, and our customers get the value right away. Yes, there's a cost to doing that, but I think it's worth it.
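Cross-region failover with a degradation path can be sketched as an ordered list of endpoints tried in turn, with a degraded fallback when all of them fail. The endpoint labels, the `EndpointUnavailable` exception, and the injected `call` and `degraded` functions are hypothetical; a production version would presumably also need timeouts, retry budgets, and health checks.

```python
from typing import Callable, List, Optional, Tuple

# (provider, region) pairs in preference order; the labels are hypothetical.
Endpoint = Tuple[str, str]


class EndpointUnavailable(Exception):
    """Raised by the call function when a provider/region cannot serve."""


def complete_with_failover(
    prompt: str,
    endpoints: List[Endpoint],
    call: Callable[[Endpoint, str], str],
    degraded: Callable[[str], str],
) -> Tuple[str, Optional[Endpoint]]:
    """Try each endpoint in order and return the first success together with
    the endpoint that served it. If every endpoint fails, use the degraded
    path (for example a canned reply or a handoff to a human agent) and
    report None as the endpoint."""
    for endpoint in endpoints:
        try:
            return call(endpoint, prompt), endpoint
        except EndpointUnavailable:
            continue  # fail over to the next region or provider
    return degraded(prompt), None


# Usage: the first region is down, so the request fails over to the second.
def fake_call(endpoint: Endpoint, prompt: str) -> str:
    if endpoint[1] == "us-east":
        raise EndpointUnavailable(endpoint)
    return f"answered in {endpoint[1]}"


reply, served_by = complete_with_failover(
    "Where is my order?",
    [("provider-a", "us-east"), ("provider-a", "eu-west")],
    fake_call,
    degraded=lambda p: "Routing you to a human agent.",
)
```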

Lauren: Absolutely. The cost of not trying is also a scary thought. You're talking a lot about testing new things, which makes me think about how you stay current. As you mentioned, things are moving so fast, and the frontier models are improving dramatically every few months. So how are you managing that as a product and engineering team? How do you decide which models are worth testing? How do you validate them against production behavior? And how do you even decide to add a new model for testing at this point?

Jeremy: So look, we have a systematic evaluation process. When a new model drops, we run it against our production evals for a particular set of features. You literally say, I want to compare these four models. You might be thinking, this one's pretty good at a lower cost; this one might have slightly better output. So you run them through the evals 10, 20, however many times it takes to get the confidence level you need, and you measure accuracy, latency, cost, and consistency. That's been really helpful: we can do that on day zero when a model drops, and we run the same process over time as we build new features.
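A simplified version of the evaluation loop described here might look like the following: run each candidate model repeatedly over the same test cases and record accuracy, latency, cost, and consistency. It assumes each model is a plain callable returning an answer and a cost, and it defines consistency as the average share of runs that agree with each case's most common answer. Both are illustrative choices, not the team's actual eval framework.

```python
import statistics
import time
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# A "model" is any callable returning (answer, cost_usd); in practice it
# would wrap a provider call. Cases are (input, expected_answer) pairs.
Model = Callable[[str], tuple[str, float]]


@dataclass
class EvalReport:
    accuracy: float        # share of all runs matching the expected answer
    mean_latency_s: float  # mean wall-clock time per call
    total_cost_usd: float  # summed cost across all runs
    consistency: float     # mean share of runs agreeing with each case's modal answer


def evaluate(model: Model, cases: list[tuple[str, str]], runs: int = 10) -> EvalReport:
    correct, latencies, cost, agreement = 0, [], 0.0, []
    for prompt, expected in cases:
        answers = []
        for _ in range(runs):  # repeat to estimate run-to-run variance
            start = time.perf_counter()
            answer, run_cost = model(prompt)
            latencies.append(time.perf_counter() - start)
            cost += run_cost
            answers.append(answer)
            correct += answer == expected
        modal_count = Counter(answers).most_common(1)[0][1]
        agreement.append(modal_count / runs)
    return EvalReport(
        accuracy=correct / (len(cases) * runs),
        mean_latency_s=statistics.mean(latencies),
        total_cost_usd=cost,
        consistency=statistics.mean(agreement),
    )


def compare(models: dict[str, Model], cases: list[tuple[str, str]],
            runs: int = 10) -> dict[str, EvalReport]:
    """Evaluate several candidate models on identical cases."""
    return {name: evaluate(m, cases, runs) for name, m in models.items()}
```

Keeping the cases fixed across candidates is what makes the resulting reports comparable side by side.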

A model might score great on general reasoning but hallucinate in certain domains. So we test against a lot of customer scenarios, such as returns processing, escalation decisions, and knowledge retrieval accuracy, to make sure we're measuring against the right data points.

In terms of rollout, it really depends on the feature. For features headed to general availability, it's gradual: we start with internal testing, in some cases run an opt-in early access program, and then roll out to broader availability across the platform. Beyond our internal eval suite, clients can also control the orchestration layer themselves. They can assign their agents to a new model, run evaluations within the product, and roll out to a subset of their users, or even a single user if they want. Customers who value stability can stay on their existing agent configurations until they're ready. So everyone's happy.
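Rolling a new model out to a subset of users, or to a single user, is commonly done with deterministic bucketing. This sketch assumes a hypothetical per-agent `AgentRollout` record holding a stable model, an optional candidate model, a user allowlist, and a percentage; it hashes each user ID so a given user always sees the same model across requests.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentRollout:
    """Hypothetical per-agent config: a stable model plus an optional
    candidate exposed to allowlisted users and a percentage of the rest."""
    stable_model: str
    candidate_model: Optional[str] = None
    percent: int = 0  # 0-100 share of non-allowlisted users on the candidate
    allow_users: set[str] = field(default_factory=set)

    def model_for(self, user_id: str) -> str:
        if self.candidate_model is None:
            return self.stable_model      # customer stays on existing config
        if user_id in self.allow_users:
            return self.candidate_model   # e.g. a single pilot user
        # Hash the user ID into a stable bucket 0-99 so each user lands on
        # the same side of the split every time.
        digest = hashlib.sha256(user_id.encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return self.candidate_model if bucket < self.percent else self.stable_model
```

Raising `percent` step by step widens the rollout without reshuffling users who were already on the candidate.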

Lauren: I'm glad you went in that direction. I was thinking about our customers and how they engage with these models too. If you give customers meaningful control over how AI is configured in their environment, how do you keep that from being paralyzing or too daunting? Everyone's on their own AI transformation journey. Where do you step in and make decisions for them, and where do you hand them the wheel?

Jeremy: The way I like to think about it, very simply, is smart defaults with escape hatches. Most customers want to keep things simple: they want their AI to just work well. So we try to set really intelligent defaults that cover the majority of our customers, and we're spending a lot of time improving those defaults and handling more complex use cases within that simpler experience. We're seeing the results as we continue to expand those capabilities.

We also have AI assistants in the product that can configure lower-level details for clients, even writing code for tools that execute deterministically within an automation. You don't need to know how to code; the assistant handles all of that for you. So there are intelligent defaults that can absorb a lot of complexity, and there are assistants to help you do whatever you need to do. They can even analyze both agentic and deterministic automations across the platform and make recommendations, like, you have a duplicate here, or you might want to do this instead of that.

Larger customers running more complex operations at scale typically want more control. They might have specific latency requirements or compliance constraints, and for those customers we believe in exposing more advanced configuration and control. So we offer both simple and advanced experiences, and we avoid forcing anyone into a complicated experience without assistance.
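"Smart defaults with escape hatches" maps naturally onto layered configuration: platform defaults cover most customers, and a customer overrides only the settings they care about. The setting names and values below are hypothetical, and rejecting unknown keys is one possible design choice among several.

```python
from typing import Any, Dict, Optional

# Hypothetical platform defaults intended to cover most customers.
DEFAULTS: Dict[str, Any] = {
    "model": "platform-default",
    "max_latency_ms": None,  # no explicit latency target by default
    "region": "global",
}


def effective_config(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Start from the defaults and apply only the settings a customer
    explicitly overrides. Unknown keys are rejected so a typo cannot
    silently fall back to a default."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}  # builds a new dict; DEFAULTS is unchanged
```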

Lauren: That's amazing. There are lots of different ways for our customers to engage, depending on how much control they want to take on versus having recommendations made for them. As you know, I've spent a lot of time in market. I love being out talking with our customers and other CX leaders in this space, listening and learning alongside them. No surprise, there's a real debate right now about whether CX platforms should train their own models or rely on foundation models from the big labs: the age-old build-versus-buy question. Where do you come down on that, and where do teams go wrong when they think about build versus buy?

Jeremy: So I think what you're mostly seeing isn't training models from scratch; it's fine-tuning open source models. Vendors might call it proprietary AI, and that's something people should be aware of. There's a lot of fine-tuning happening. My belief is that the frontier labs, Anthropic, OpenAI, and Google, are spending billions of dollars annually on model R&D with thousands of researchers. Every few months, the next generation of frontier models leapfrogs whatever you might have fine-tuned six months ago. So you're on a treadmill: you fine-tune, the frontier moves, your proprietary model falls behind, and you have to do it again. Meanwhile, folks like us are just using those frontier models, and we're already ahead without investing that time and those resources.

This isn't theoretical, by the way; we've seen it play out across domains. Over the past few years, general-purpose frontier models have routinely outperformed specialized fine-tuned systems in medicine, law, coding, and finance, often by wide margins and at zero additional training cost. The pattern's consistent. By the time you've completed your fine-tuning pipeline, a new frontier model is probably either out or about to be released, and it's going to leapfrog your work.

So look, I think fine-tuning has its place for specific, narrow tasks where you need precise behavior at a lower cost and you have the data to support it. But as a core strategy, you're investing engineering resources in maintaining a model that will always be chasing the frontier. Those resources could go into better workflows, better integrations, and better data pipelines, things the labs can't replicate because they don't understand your domain. For me, the moat isn't the model. The model is a commodity. What you build around it is the product, and that's where we want to put most of our time.

Lauren: Well said: the moat is not the model. That really helps clarify it. Data privacy is a hot topic around all of this, for sure. Customer interaction data is sensitive. When you're routing that data through third-party AI providers to power your product, how do you think about your privacy obligations? I know this is something we take really seriously. How do we maintain customer trust throughout?

Jeremy: Yeah, this is non-negotiable, and customers rightly demand clarity on it. Here's how we think about it. First, we have contractual zero data retention agreements with every model provider we work with. Customer interaction data goes in, responses come out, and nothing is stored or used for training. Period.

Second, data residency matters. Enterprise customers sometimes have very specific requirements about where their data can be processed, and our multi-provider approach, with support for global or regional inference, actually helps here. We can route specific customers to specific providers in specific regions based on their compliance requirements.

Third, transparency. Customers should know which provider is processing their data within the agentic platform — there's no black box. If a customer says we can't use a certain provider for compliance reasons, we're able to handle that. So this multi-model approach makes privacy easier for us — we're not locked into a single provider's data practices. We have options.
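Residency requirements and provider exclusions can both be expressed as a filter over the available deployments. In this sketch, the `Deployment` and `CustomerPolicy` types and the region and provider labels are assumptions for illustration only. Returning the chosen deployment, rather than just a response, also makes it straightforward to show customers which provider handled their data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    provider: str  # hypothetical label, e.g. "anthropic"
    region: str    # hypothetical label, e.g. "eu"


@dataclass
class CustomerPolicy:
    """Hypothetical per-customer compliance settings."""
    allowed_regions: frozenset[str]
    blocked_providers: frozenset[str] = frozenset()


def eligible(deployments: list[Deployment], policy: CustomerPolicy) -> list[Deployment]:
    """Keep deployments in an allowed region from a non-blocked provider,
    preserving the platform's preference order."""
    return [
        d for d in deployments
        if d.region in policy.allowed_regions
        and d.provider not in policy.blocked_providers
    ]


def pick(deployments: list[Deployment], policy: CustomerPolicy) -> Deployment:
    """Return the most preferred compliant deployment, or fail loudly."""
    options = eligible(deployments, policy)
    if not options:
        raise LookupError("no deployment satisfies this customer's policy")
    return options[0]
```

Failing with an error when nothing qualifies, instead of quietly falling back to a non-compliant region, is the conservative choice for this kind of policy.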

Lauren: Like you said, that's absolutely non-negotiable, and I hear that from our customers as well. Your biggest AI partners (OpenAI, Google, Anthropic) are all companies with the resources and ambition to build directly into enterprise software. How do you think about that dynamic? Does it keep you up at night, or is it just the reality of building in this space right now?

Jeremy: I wouldn't say it keeps me up at night. I think it's definitely the reality of building in this space right now. Things are moving extremely fast. As a leader, you have to think about what things might look like six, twelve, eighteen months from now, and what you might need to do to survive or ultimately thrive. I definitely think a lot about that stuff. That's the kind of stuff that might keep you up at night.

For example, a question I hear from time to time, from prospects or on venture and startup founder podcasts, is whether individual organizations will build their own tailored version of the customer support solution, agentic platform, or other SaaS products they use every day. With tools like Claude Code, you could absolutely open it up, create some skills and subagents, and build a demo that looks impressive, and that part's getting easier every day. But going from a demo to an enterprise-grade, highly available production application is a completely different thing.

Many times, folks aren't factoring in things like multi-region deployments, fault tolerance, low latency, and real-time channel integrations across voice, email, and social. They haven't factored in the domain expertise that comes from shadowing hundreds of agents in a contact center, or the productivity features human agents need. There are still a lot of human agents out there, and they have real performance requirements. That's the biggest thing that keeps me up: people thinking they can use these tools to immediately replace the products they rely on.

As I mentioned, I think the model and the reasoning are becoming commoditized. The platform underneath is what customers really need to trust and rely on every single day. That's not something you can spin up in a weekend vibe coding with Claude Code; it's something you need to trust a provider for. That's where most of my thinking is.

Lauren: Okay, last question. This has been so informative, thank you so much; I learn multiple things from you every chance I get. Let's talk about the in-house AI team. Given how much capability is available off the shelf from the frontier labs, what do you think is the actual job of an in-house AI team? How is that job evolving inside a B2B SaaS company right now? How do you recruit for it, and how do you keep great talent engaged? Tell me more.

Jeremy: So look, to kick that off: in my opinion, the job is certainly not to compete with the frontier labs. I think it comes down to three things. There's the integration side: taking the raw capability of a frontier model and deeply embedding it into your product, your data model, and your workflows. That's hard, domain-specific work that others really can't do for you.

Then there's orchestration — building the eval frameworks, the routing logic, the fallback systems, the monitoring that makes this multi-model deployment reliable in production. I think that's really important.

And then domain expertise. They need to understand CX deeply enough to know what a model must do well for certain tasks, and where it can get away with being mediocre. They need to work closely with product leaders to understand the art of the possible, and they need to bridge the gap between research and production: designing, developing, and deploying practical AI applications that solve real-world business problems. That's really where we need them spending their time.

How do you keep them engaged? You let them work on the hardest problems — the hardest applied problems. Not "how do we train a better model" — that's not where we'd focus. It's "can we make the best models in the world work reliably at scale for what we're trying to do? Can we unlock real value for our customers and the business?" That's genuinely interesting engineering work. Those are the people who are going to thrive here.

Lauren: Well, amazing, Jeremy. Thank you so much for your time. We're certainly grateful to have you leading the charge for us here at Kustomer, and I'm sure our viewers learned quite a bit today. Thank you so much.
