- Uncover business-critical issues hidden in support conversations
- Replace disconnected metrics with insights that drive action
- Equip teams across product, ops, and support with data that matters
- Connect the dots between customer experience and company outcomes
- 30% increase in CSAT
- 120% decrease in agent ramp time
- 300% increase in monthly coaching sessions
Company Overview
Oura is the health tech company behind the iconic smart ring that helps millions track sleep, recovery, and readiness. With a global customer base and a fast-growing support team powered by both humans and bots, Oura needed a better way to measure experience—and to trust it.
The Challenge
When Oura launched Decagon, an AI chatbot that quickly began handling 60% of support contacts, the results were clear: performance was strong, savings were real, and the C-suite took notice.
But that win created a problem. Chatbot quality was being reported by the vendor, not independently verified. And using human QA teams to randomly audit bot conversations didn’t make sense. Worse, the team was still tethered to outdated metrics like CSAT and audit scorecards—neither of which reflected the full customer experience.
“The old QA program looked painful. It was this monotonous process of validating performance just to meet contract terms. It didn’t give us real insight—and no one trusted the scores anyway.”
The Turning Point
As Oura expanded its use of automation, the team realized it needed a consistent way to evaluate every type of contact, whether it came from a bot or a human. But trying to replicate a traditional QA scorecard with an LLM didn’t work.
“Our QAs can navigate deeply through complex systems and use intuition. That’s not something an LLM can do well,” Skyler explained. “We needed to stop trying to force AI into human roles—and vice versa.”
Instead, the team flipped their approach: rather than mimicking how humans audit, they defined what actually matters to customers and built a new metric from the ground up.
The Solution
A Custom Experience Score
Using MaestroQA’s platform, Oura built a set of custom LLM classifiers designed to measure quality from the customer’s point of view. After testing over 14 classifiers (and iterating on several of them multiple times), the team distilled the model down to four critical dimensions that determine whether a conversation is good or bad.
The result: a custom Experience Score that applies to every conversation, across both bots and humans.
“We went from just randomly checking tickets to scoring all conversations with a model we trust. We’re now evaluating 100% of interactions—not just the 8% that get a CSAT response.” - Skyler
Each classifier returns one of three outputs: yes, no, or not enough info. If all four are marked “yes,” the conversation counts as a success. Anything less? It doesn’t.
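A minimal sketch of that all-or-nothing rule, in Python. The four classifier names below are hypothetical stand-ins; the piece doesn’t name Oura’s actual dimensions, and this isn’t MaestroQA’s API—just an illustration of the aggregation logic.

```python
from typing import Literal

# Each classifier emits one of three labels, per the description above.
ClassifierOutput = Literal["yes", "no", "not_enough_info"]

def experience_score(results: dict[str, ClassifierOutput]) -> bool:
    """A conversation counts as a success only if every classifier says 'yes'."""
    return all(label == "yes" for label in results.values())

# Hypothetical dimension names for illustration only.
conversation = {
    "issue_resolved": "yes",
    "accurate_information": "yes",
    "appropriate_tone": "yes",
    "reasonable_effort": "no",  # a single "no" (or "not_enough_info") fails the conversation
}

print(experience_score(conversation))  # False
```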
Impact
CSAT Replacement
CSAT wasn’t providing enough signal, and response rates were dropping. The Experience Score now gives the team a reliable, real-time metric that’s already being used to phase CSAT out.
C-Suite Visibility
Unlike CSAT, the Experience Score is built on clearly defined, well-understood logic—making it defensible to executives and adaptable across teams.
The Experience Score is already being tied to actual business outcomes—like referral usage, subscription continuation, and repurchase behavior.
“We know exactly how each classifier is built, why the wording matters, and what each one is measuring. That’s not true with out-of-the-box metrics.” - Skyler
Smarter Strategy
Freed from audit work, Oura’s QA team is now focused on high-leverage, systemic issue detection. Instead of random grading, QAs use classifier data to detect “smoke,” then dive deep where there’s fire—making meaningful changes to workflows, chat logic, and support policies.
Why They Didn’t Choose an Out-of-the-Box Score
Oura considered prebuilt models but found them too rigid and opaque.
“Even with a product as simple as ours—a ring—it was still too reductive. We don’t know what’s behind those metrics. We needed nuance.” - Skyler
With MaestroQA, Oura had the flexibility to build and iterate on their own logic, train their own classifiers, and structure their metric around what quality means for them.
What’s Next
Oura is continuing to roll out their Experience Score across all interactions, with plans to fully retire CSAT as the company’s primary customer experience metric. The team is also developing new LLM classifiers to track product-related friction points—enabling faster escalations to engineering and operations. As the new metrics gain traction internally, Oura will begin updating BPO contracts and KPIs to reflect this more accurate view of quality and the voice of the customer.