Gen AI for Opera: Democratizing AI Model Creation
No-code custom AI model development
Vision
Empower users to build AI models independently, reducing internal team dependency and accelerating innovation.
Problem to be solved:
Enable customers to build language models without technical expertise.
Business need:
Expand self-service capabilities for customers and reduce dependency on internal teams.
My role:
End-to-end design leadership from concept to launch.
Outcome:
Beta launch attracted two new customers.
Learning:
Strategic hands-on guidance proved essential for successful adoption.
What is Opera?
Opera is Cresta's no-code platform to enable contact center leaders to automate coaching and monitor compliance without AI expertise.
Users can create custom rules in "if this, then that" fashion for simple scenarios, complementing Cresta's AI models that handle complex interactions.
For example: if a customer inquires about an upgrade, agents should receive a reminder about the temporary promotion the company is running for the month of July. As a Cresta Opera user, I can go to Opera and create this rule.
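A rule like this is essentially a condition-action pair. Here's a minimal sketch of the idea in Python; the function name and keyword list are invented for illustration and are not Opera's actual API:

```python
# Hypothetical sketch of an "if this, then that" rule -- not Opera's real API.

def upgrade_promo_rule(customer_message):
    """If the customer asks about an upgrade, remind the agent of the July promotion."""
    trigger_keywords = ["upgrade", "upgrading", "new plan"]  # invented keyword list
    if any(kw in customer_message.lower() for kw in trigger_keywords):
        return "Reminder: mention the temporary July upgrade promotion."
    return None  # no trigger, no hint

hint = upgrade_promo_rule("Hi, I'd like to upgrade my phone plan")
```

The keyword matching here also hints at the accuracy problem described below: a few literal words can't capture every way a customer might phrase the same intent.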
Problem 1: Low accuracy
At the time, Opera mainly used a keyword builder to define triggers. Keywords were easy to use at first but proved limited in practice.
It was difficult to describe the triggering moments holistically with just a few words.
Problem 2: Long wait
When users needed higher quality AI models, they were dependent on Cresta's AI Delivery team. However, the backlog of custom developments and changes created significant delays, leaving users waiting extended periods for improvements.
But how does the LLM help anyway?
Using Generative AI, users can make customizations without a long wait or accuracy problems!
So what is the general flow?
Here are the three phases users go through to define a trigger using generative AI. This doesn't mean the entire flow has to be limited to a mere three steps (although it was in the first iteration 🙃)
First iteration
In step 1, the user defines the moment they'd like to capture. In step 2, the user marks whether phrases are true or false examples of that moment. Lastly, in step 3, the user reviews the model's predictions.
Two main pieces of feedback on the first iteration:
Too much technical jargon
Simplicity causing uncertainty
1. Too much technical jargon
Some of the words we used so naturally internally were very foreign to end users. The tool didn't feel friendly, and users felt "unqualified" to use it.
2. Simplicity causing uncertainty
Condensing the entire process into just three steps felt reductive to both end users and the ML team. Users doubted whether such simplification could maintain quality standards, while ML engineers and researchers wanted to cultivate deeper understanding rather than oversimplify.
Final Design
Solution 1: No need to speak the "AI lingo"
In the final design, we tried to keep the language and tone as natural as possible. No more "positive or negative examples".
Solution 2: Longer but more robust model building flow
One big learning from the first-iteration feedback was that our users preferred spending more time fine-tuning the model. It likely became clearer to users, too, that this was something that had to take more than the blink of an eye.
This revelation brought a lot of changes to how users go through the clarify phase of modeling. One change was showing tiered rounds of examples instead of one flat list.
In the first round, we show the examples that the AI model is more confident about. These are the more obvious, easy ones. The goal in this round is to get users' confirmation.
Then, in the second and third rounds, we show more nuanced examples that the model is less confident about. Users' marks on these teach the model how to interpret tricky scenarios.
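The tiered rounds above boil down to sorting candidate examples by the model's confidence and bucketing them. A minimal sketch, assuming confidence scores in [0, 1]; the thresholds here are invented for illustration, not the product's actual values:

```python
# Hypothetical sketch: split candidate examples into grading rounds by model confidence.

def tier_examples(examples):
    """examples: list of (phrase, confidence) pairs, confidence in [0, 1].
    Round 1 holds high-confidence (obvious) examples for quick confirmation;
    rounds 2 and 3 hold progressively less confident, more nuanced ones."""
    ordered = sorted(examples, key=lambda e: e[1], reverse=True)
    rounds = {1: [], 2: [], 3: []}
    for phrase, conf in ordered:
        if conf >= 0.9:        # invented threshold: "obvious" examples
            rounds[1].append(phrase)
        elif conf >= 0.7:      # invented threshold: somewhat ambiguous
            rounds[2].append(phrase)
        else:                  # least confident: the tricky scenarios
            rounds[3].append(phrase)
    return rounds
```

Grading the easy round first builds user trust before the model asks for help on the hard cases.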
This is an unlikely scenario, but if users make a lot of corrections at the last step, it's a strong sign the model will not perform well when deployed. In that case, we encourage users to return and grade more examples before deploying.
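The safeguard above amounts to a simple gate on the user's correction rate during review. A hypothetical sketch; the function name and 20% threshold are assumptions for illustration:

```python
# Hypothetical sketch: gate deployment on the correction rate in the final review step.

def ready_to_deploy(num_reviewed, num_corrected, max_correction_rate=0.2):
    """Return True if the model's review results look deployable.
    If users overturn too many predictions, send them back to grade
    more examples rather than deploy a model likely to underperform."""
    if num_reviewed == 0:
        return False  # nothing reviewed yet -- can't judge quality
    return (num_corrected / num_reviewed) <= max_correction_rate

ready_to_deploy(50, 5)   # few corrections: deployable
ready_to_deploy(50, 20)  # many corrections: grade more examples first
```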