Oura unlocks VoC insights traditional CSAT missed

How Oura built a custom Experience Score powered by LLMs with MaestroQA—one that works across both human and bot interactions.

Industry: Health Tech
Company Size: 900+ employees

30% increase in CSAT
120% decrease in agent ramp time
300% increase in monthly coaching sessions

Company Overview

Oura is the health tech company behind the iconic smart ring that helps millions track sleep, recovery, and readiness. With a global customer base and a fast-growing support team powered by both humans and bots, Oura needed a better way to measure experience—and to trust it.

The Challenge

When Oura launched Decagon, an AI chatbot that quickly began handling 60% of support contacts, the results were clear: performance was strong, savings were real, and the C-suite took notice.

But that win created a problem. Chatbot quality was being reported by the vendor, not independently verified. And using human QA teams to randomly audit bot conversations didn’t make sense. Worse, the team was still tethered to outdated metrics like CSAT and audit scorecards—neither of which reflected the full customer experience.

“The old QA program looked painful. It was this monotonous process of validating performance just to meet contract terms. It didn’t give us real insight—and no one trusted the scores anyway.”

The Turning Point

As Oura expanded its use of automation, the team realized they needed a consistent way to evaluate every type of contact, whether it came from a bot or a human. But trying to replicate a traditional QA scorecard with an LLM didn’t work.

“Our QAs can navigate deeply through complex systems and use intuition. That’s not something an LLM can do well,” Skyler explained. “We needed to stop trying to force AI into human roles—and vice versa.”

Instead, the team flipped their approach: rather than mimicking how humans audit, they defined what actually matters to customers and built a new metric from the ground up.

The Solution

A Custom Experience Score

Using MaestroQA’s platform, Oura built a set of custom LLM classifiers designed to measure quality from the customer’s point of view. After testing more than 14 classifiers, and iterating on several of them multiple times, the team distilled the model down to four critical dimensions that determine whether a conversation is good or bad.

The result: a custom Experience Score that applies to every conversation, across both bots and humans.

“We went from just randomly checking tickets to scoring all conversations with a model we trust. We’re now evaluating 100% of interactions—not just the 8% that get a CSAT response.” - Skyler

Each classifier returns one of three outputs: yes, no, or not enough info. If all four are marked “yes,” the conversation counts as a success. Anything less? It doesn’t.
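
To make that scoring rule concrete, here is a minimal sketch of how the aggregation could be implemented. The four dimension names, the classify stub, and the population-level score are hypothetical stand-ins; the case study specifies only the three-way output and the all-four-must-be-“yes” rule.

```python
from enum import Enum

class Verdict(str, Enum):
    YES = "yes"
    NO = "no"
    NOT_ENOUGH_INFO = "not enough info"

# Hypothetical dimension names; the case study does not disclose
# which four dimensions Oura's classifiers actually measure.
DIMENSIONS = (
    "issue_resolved",
    "accurate_information",
    "appropriate_tone",
    "reasonable_effort",
)

def classify(conversation: str, dimension: str) -> Verdict:
    # Stand-in for one custom LLM classifier: a real implementation
    # would prompt a model with the conversation plus the dimension's
    # rubric and parse the answer into a Verdict.
    raise NotImplementedError

def is_success(verdicts: dict[str, Verdict]) -> bool:
    # The aggregation rule from the case study: a conversation counts
    # as a success only if every classifier answers "yes"; any "no"
    # or "not enough info" means it does not.
    return all(verdicts[d] is Verdict.YES for d in DIMENSIONS)

def experience_score(population: list[dict[str, Verdict]]) -> float:
    # Score a set of conversations (bot or human alike) as the share
    # of successes, enabling 100% coverage rather than a sampled CSAT.
    return sum(is_success(v) for v in population) / len(population)
```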

Impact

CSAT Replacement

CSAT wasn’t providing enough signal, and response rates were dropping. The Experience Score now gives the team a reliable, real-time metric that’s already being used to phase CSAT out.

C-Suite Visibility

Unlike CSAT, the Experience Score is built on clearly defined, well-understood logic—making it defensible to executives and adaptable across teams.

The Experience Score is already being tied to actual business outcomes—like referral usage, subscription continuation, and repurchase behavior.

“We know exactly how each classifier is built, why the wording matters, and what each one is measuring. That’s not true with out-of-the-box metrics.” - Skyler

Smarter Strategy

Freed from audit work, Oura’s QA team is now focused on high-leverage, systemic issue detection. Instead of random grading, QAs use classifier data to detect “smoke,” then dive deep where there’s fire—making meaningful changes to workflows, chat logic, and support policies.
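
As one illustration of what “smoke” detection could look like in practice (reusing the Verdict type and DIMENSIONS tuple from the sketch above), the snippet below flags dimensions whose recent failure rate runs well above a baseline window. The 1.5x threshold and the windowing are assumptions for illustration, not Oura’s actual process.

```python
def failure_rate(population: list[dict[str, Verdict]], dimension: str) -> float:
    # Share of conversations where this dimension was not a clear "yes".
    misses = sum(v[dimension] is not Verdict.YES for v in population)
    return misses / len(population)

def detect_smoke(
    recent: list[dict[str, Verdict]],
    baseline: list[dict[str, Verdict]],
    ratio: float = 1.5,
) -> list[str]:
    # Flag dimensions whose recent failure rate runs well above the
    # baseline window; these are the "smoke" signals QAs investigate
    # by reading the underlying conversations to find the fire.
    flagged = []
    for dimension in DIMENSIONS:
        base = failure_rate(baseline, dimension)
        if base > 0 and failure_rate(recent, dimension) / base >= ratio:
            flagged.append(dimension)
    return flagged
```

A flagged dimension then points QAs at a specific slice of transcripts worth reading in depth, rather than a random sample.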

Why They Didn’t Choose an Out-of-the-Box Score

Oura considered prebuilt models but found them too rigid and opaque.

“Even with a product as simple as ours—a ring—it was still too reductive. We don’t know what’s behind those metrics. We needed nuance.” - Skyler

With MaestroQA, Oura had the flexibility to build and iterate on their own logic, train their own classifiers, and structure their metric around what quality means for them.

What’s Next

Oura is continuing to roll out its Experience Score across all interactions, with plans to fully retire CSAT as the company’s primary customer experience metric. The team is also developing new LLM classifiers to track product-related friction points, enabling faster escalations to engineering and operations. As the new metrics gain traction internally, Oura will begin updating BPO contracts and KPIs to reflect this more accurate view of quality and the voice of the customer.

See how MaestroQA can help you unlock the power of conversation data

Get in touch to learn how MaestroQA helps you:

  • Uncover business-critical issues hidden in support conversations
  • Replace disconnected metrics with insights that drive action
  • Equip teams across product, ops, and support with data that matters
  • Connect the dots between customer experience and company outcomes
