Founders Series

The Battle for Higher Quality Customer Service: ChatGPT vs QA Analyst

Explore whether ChatGPT can match the objectivity and insight of human QA analysts in the quest to automate quality assurance in customer service.

Can QA Be Automated with ChatGPT-Type Technology?

Manual QA programs were invented to solve the Metrics Trust Gap. With the rise of AI-powered tools like ChatGPT, there’s growing interest in automating QA processes. But can machine learning truly replicate the objectivity and sensibility of human QA analysts? To answer this, we conducted an experiment comparing ChatGPT’s ability to grade customer service tickets with that of a human QA analyst.

Experiment Design

ChatGPT, an AI-powered tool, may be able to automate parts of the QA process, but can it match the objectivity and sensibility of human QA analysts?

In our experiment, we aimed to investigate this question by tasking ChatGPT with grading 200 tickets based on a single question: "Did the agent demonstrate active listening?" We then compared ChatGPT's results with those of a human QA analyst.

Why Active Listening?

Active listening was chosen as the focus for the following reasons:

1. It is fair and applicable across companies.
2. It doesn't require knowledge of internal systems or company-specific data, which GPT cannot access and therefore cannot grade against.
3. It is still a complex and interesting criterion to grade.

Phase 1: Data Collection and Cleaning

Careful data preparation is essential for accurate test results: the underlying data has to be properly labeled, formatted, and anonymized.

Data collection involved five key steps:

1. Assembled a small group of test customers
2. Selected one Yes/No question from each Scorecard that had at least 200 scores
3. Focused on chat and email conversations
4. Anonymized the data and removed any sensitive information
5. Cleaned the data to exclude "internal" notes, chatbots, and other non-customer-agent messages
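To make the cleaning steps concrete, here is a minimal Python sketch of steps 3 to 5: the channel filter, anonymization, and message filtering. The `Ticket` and `Message` shapes, the `author_type` values, and the two redaction patterns are hypothetical. The article does not describe MaestroQA's actual data format or anonymization tooling, and real anonymization needs much broader coverage than two regular expressions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


# Hypothetical record shapes; the article does not describe its real schema.
@dataclass
class Message:
    author_type: str  # assumed values: "customer", "agent", "bot", "internal"
    text: str


@dataclass
class Ticket:
    channel: str  # assumed values: "chat", "email", "phone", ...
    messages: list[Message]


# Illustrative redaction for two kinds of sensitive data only. Real
# anonymization must also cover names, addresses, account IDs, etc.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def clean_ticket(ticket: Ticket) -> Ticket | None:
    """Return an anonymized customer/agent-only copy, or None to discard."""
    # Step 3: keep only chat and email conversations.
    if ticket.channel not in {"chat", "email"}:
        return None
    # Steps 4 and 5: drop internal notes, bot turns, and any other
    # non-customer/agent messages, and redact what remains.
    kept = [
        Message(m.author_type, redact(m.text))
        for m in ticket.messages
        if m.author_type in {"customer", "agent"}
    ]
    return Ticket(ticket.channel, kept) if kept else None
```

Returning `None` rather than an empty ticket makes it easy to filter discarded conversations out of a dataset before grading.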

Phase 2: Example Prompts

GPT, as a text prediction tool, relies on high-quality prompts to generate accurate answers.

Our prompt consisted of four parts:

1. Generic context
2. Question context
3. Specific examples of good and bad active listening
4. The conversation to grade
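The four-part prompt structure could be assembled along these lines. This is only a sketch. The wording of each part, the example conversations, and the helper names `build_prompt` and `parse_answer` are hypothetical, because the article does not publish the prompt it used. The parser reflects one common design choice: ask for a one-word Yes/No answer and treat anything else as unparseable rather than guessing.

```python
from __future__ import annotations

# Hypothetical wording for each prompt part; not the article's actual prompt.
GENERIC_CONTEXT = "You are a QA analyst reviewing a customer service conversation."
QUESTION_CONTEXT = (
    "Question: Did the agent demonstrate active listening? "
    "Answer with exactly one word: Yes or No."
)
# Part 3: illustrative good (Yes) and bad (No) active-listening examples.
EXAMPLES = [
    ("Customer: My order arrived broken.\n"
     "Agent: I'm sorry your order arrived damaged. I'll send a replacement today.",
     "Yes"),
    ("Customer: My order arrived broken.\n"
     "Agent: Please see our FAQ.",
     "No"),
]


def build_prompt(conversation: str) -> str:
    """Join generic context, question context, examples, and the ticket."""
    parts = [GENERIC_CONTEXT, QUESTION_CONTEXT]
    for i, (example, label) in enumerate(EXAMPLES, start=1):
        parts.append(f"Example {i}:\n{example}\nAnswer: {label}")
    # Part 4: the conversation to grade, ending where the model should answer.
    parts.append(f"Conversation to grade:\n{conversation}\nAnswer:")
    return "\n\n".join(parts)


def parse_answer(reply: str) -> str | None:
    """Map a free-text model reply to "Yes"/"No"; None if it is neither."""
    words = reply.strip().split()
    if not words:
        return None
    first = words[0].strip(".,!:").capitalize()
    return first if first in {"Yes", "No"} else None
```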

Objective of Experiment

Determine whether the agent displayed active listening skills, which allow an agent to offer personalized recommendations and identify opportunities for customer success.

Challenges in Prompt Engineering

During the experiment, we ran into several challenges, such as limits on prompt size and the risk of over-specifying the prompt, either of which could lead the AI to make errors or hallucinate.
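One way to work within prompt size limits is to reserve part of the token budget for the fixed prompt parts and trim the conversation to fit the rest. The sketch below uses a rough four-characters-per-token estimate, which is an approximation and not the model's real tokenizer. It also assumes a hypothetical `fit_conversation` helper that always keeps the first message, since that message usually states the customer's problem, and then fills the remaining budget with the most recent messages.

```python
from __future__ import annotations


def estimate_tokens(text: str) -> int:
    """Rough estimate (~4 characters per token); not a real tokenizer."""
    return (len(text) + 3) // 4


def fit_conversation(fixed_prompt: str, messages: list[str], max_tokens: int) -> list[str]:
    """Keep the first message plus as many recent messages as fit the budget."""
    budget = max_tokens - estimate_tokens(fixed_prompt)
    if not messages or estimate_tokens(messages[0]) > budget:
        return []  # nothing useful fits; the ticket should be skipped or flagged
    head = messages[0]
    budget -= estimate_tokens(head)
    tail: list[str] = []
    # Walk backwards from the newest message until the budget runs out.
    for msg in reversed(messages[1:]):
        cost = estimate_tokens(msg)
        if cost > budget:
            break
        tail.append(msg)
        budget -= cost
    return [head] + list(reversed(tail))
```

Trimming has a cost. Any messages dropped from the middle are invisible to the grader, and that matters for a criterion like active listening, which depends on the whole exchange.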

Phase 3: Experiment Results

The study compared GPT's grades on 200 tickets with those of a human grader.

After initial testing, we ran 10 iterations to refine the prompts.

The final results showed 58% alignment between GPT and the human grader.

Unfortunately, this level of alignment did not meet the bar pilot customers needed before integrating GPT into their QA programs.
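The article does not say exactly how the 58% alignment figure was computed. If it is simple percent agreement on the Yes/No answers, the calculation looks like the sketch below. The sketch also includes Cohen's kappa, a standard chance-corrected agreement measure that was not part of the original experiment. The labels in the usage lines are illustrative toy data, not results from the study.

```python
from collections import Counter


def percent_agreement(a: list, b: list) -> float:
    """Fraction of tickets where both graders gave the same answer."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list, b: list) -> float:
    """Agreement corrected for what the graders' label rates give by chance."""
    p_o = percent_agreement(a, b)
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    # Expected chance agreement from each grader's own label frequencies.
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if p_e == 1:
        return 1.0  # both graders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


human = ["Yes", "Yes", "No", "No", "Yes"]  # toy labels, not study data
gpt = ["Yes", "No", "No", "Yes", "Yes"]
print(percent_agreement(human, gpt))  # 0.6
print(round(cohens_kappa(human, gpt), 3))  # 0.167
```

Kappa is useful for a binary question. If most tickets are labeled "Yes", two graders will often agree by chance alone, so raw agreement can overstate how closely they actually align.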

Phase 4: Exploring Future Possibilities

Potential Follow-Up Ideas

These results don't mean GPT can't work for QA. There is still a lot to explore:

1. More prompt testing is needed for GPT, although more complex prompts can introduce errors.
2. High-quality automated QA likely requires customization from third-party vendors; how effective ChatGPT would be in that setup is uncertain.
3. Johnny Appleseed may need more resources for AutoQA; the effectiveness of a GPT-based approach is unknown.
4. The best results may require combining narrow models with expert systems.
5. Data cleaning is crucial for both accuracy and security.

Conclusion: The Ongoing Battle Between AI and Human QA

While ChatGPT shows potential for automating QA processes, in our experiment it fell short of replacing human analysts. As AI continues to evolve, businesses will need to explore new strategies and tools to meet the demands of high-quality customer service.

If you're interested in learning more about ChatGPT and its potential for automating QA processes, request a demo today. Stay updated on the latest trends in customer service technology by signing up for our CEO Series.

Webinar

GPT vs. QA Analyst

A walkthrough of the MaestroQA Labs experiment with the CEO and CTO

In this chat, Vasu Prathipati and Harrison Hunter, the CEO and CTO of MaestroQA, respectively, will present an experiment they conducted to compare the Auto QA capabilities of ChatGPT with those of a QA analyst.