The Battle for Higher Quality Customer Service: Chat GPT vs QA Analyst
Can machine learning match the objectivity and sensibility of human QA in customer service?
Can QA be automated with ChatGPT-type tech?
Manual QA programs were invented to solve the Metrics Trust Gap.
Are people in your company asking this question?
What percentage of QA do you think can be automated with ChatGPT-type technology?
ChatGPT, an AI-powered tool, can automate parts of the QA process, but can it match the objectivity and sensibility of a human QA analyst?
Our experiment set out to investigate this question. We tasked ChatGPT with grading 200 tickets on a single question: "Did the agent demonstrate active listening?" We then compared ChatGPT's grades with those of a human QA analyst.
Active listening was chosen as the primary focus for the following reasons:
- It is fair and applicable across different companies.
- It doesn't require knowledge of internal systems or company-specific data, which GPT cannot access and therefore cannot grade on.
- It remains a complex and intriguing criterion.
Data collection and cleaning
Preparing and cleaning the data is essential for accurate test results: the quality of the underlying data cannot be compromised, so it must be properly labeled, formatted, and anonymized.
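As one illustration of the anonymization step, a ticket can be scrubbed of PII-like spans before it is ever sent to GPT. This is a minimal sketch, not the actual pipeline; the regex patterns and the `ORD-` order-ID format are assumptions made up for the example.

```python
import re

# Illustrative PII patterns only -- a real pipeline would use a
# vetted redaction library, not these hand-rolled regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "ORDER_ID": re.compile(r"\bORD-\d{4,}\b"),  # hypothetical ID format
}

def anonymize(text: str) -> str:
    """Replace PII-like spans with typed placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

ticket = "Hi, I'm jane.doe@example.com, order ORD-12345, call +1 555-123-4567."
print(anonymize(ticket))
```

Typed placeholders (rather than blanking text out) preserve the shape of the conversation, so the grader can still follow what was said without seeing customer data.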
Our data collection involved 5 key parts:
GPT, as a text prediction tool, relies on high-quality prompts to generate accurate answers.
Our prompt consisted of four parts:
We experimented with different prompts to improve the predictions. However, prompt size was limited, and we had to be careful about compliance and the quality of the examples we included.
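The article doesn't list the four parts of the prompt, so the structure below is an assumption; one plausible composition is task instructions, a rubric, a graded example, and the ticket transcript, assembled into a single text prompt. A minimal sketch:

```python
# Hedged sketch of assembling a four-part grading prompt.
# The actual parts used in the experiment are not published; the
# instructions/rubric/example/transcript split here is an assumption.

def build_prompt(rubric: str, example: str, transcript: str) -> str:
    parts = [
        ("You are a QA analyst grading a customer-service ticket. "
         "Answer Yes or No: did the agent demonstrate active listening?"),
        f"Rubric: {rubric}",
        f"Graded example:\n{example}",
        f"Ticket transcript:\n{transcript}",
    ]
    return "\n\n".join(parts)

prompt = build_prompt(
    rubric="The agent restates the customer's issue and responds to it.",
    example="Customer: My order is late. Agent: Sorry it's late, let me check. -> Yes",
    transcript="Customer: I was double-charged. Agent: Please try again later.",
)
print(prompt)
```

Keeping each part a separate string makes it easy to iterate on one part at a time, which matters when prompt size is constrained.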
Our main objective was to determine whether the agent displayed active listening, the skill that lets an agent offer personalized recommendations and identify opportunities for customer success.
The study assessed GPT's performance in grading the 200 tickets, comparing its answers to a human grader's. After initial testing, we ran 10 iterations to refine the prompts.
The final results revealed a 58% alignment between GPT and the human grader.
Unfortunately, this level of alignment fell short of what pilot customers needed in order to integrate it into their QA programs.
Potential Follow-Up Ideas
These results don't mean GPT can't work; there is still a lot to explore:
If you're interested in learning more about ChatGPT and its potential for automating QA processes, request a demo today. You can also sign up for our CEO Series to stay updated on the latest trends in customer service technology.