Companies that rely on analyzing high volumes of data face a core dilemma: how to deliver real-time insights without burning through budget or engineering resources. Convirza, a leader in call analytics, recently faced this exact challenge and found an answer with Predibase’s multi-LoRA serving infrastructure.
Join us on November 21st at 10:00 am PT for an exclusive look into how Convirza transitioned from Longformer models to fine-tuned SLMs, improving speed and accuracy while cutting costs.
Here’s a snapshot of how they did it:
- Scalable Multi-LoRA Deployment: With Predibase’s LoRA eXchange, Convirza consolidated 60 LoRA adapters onto a single base-model deployment that scales from zero to 10 A100 GPUs, automatically absorbing spikes in traffic.
- Sub-Two-Second Responses for End Customers: Convirza’s average response time is now under two seconds, even at peak traffic, so agents and end users see minimal delay in receiving critical insights.
- Optimized Cost Structure: Instead of a dedicated GPU for each of the 60 indicators (each costing $500–$1,500 per month), Convirza now runs multiple indicators on a single scalable GPU deployment. Monthly expenses plummeted without sacrificing accuracy.
- Faster Iteration with SLMs: Swapping Longformer models for fine-tuned SLMs cut training cycles from 9–24+ hours to under 3 hours per adapter on average.
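To put the cost figures above in perspective, here is a quick back-of-the-envelope calculation using only the per-indicator prices quoted in this post; it shows what the old one-GPU-per-indicator baseline would cost before consolidation (the consolidated deployment's actual price is not stated here, so it is not estimated):

```python
# Back-of-the-envelope baseline: one dedicated GPU deployment per indicator,
# using the $500-$1,500/month per-indicator range quoted above.
NUM_INDICATORS = 60
COST_PER_GPU_LOW = 500     # USD per month, low end of the quoted range
COST_PER_GPU_HIGH = 1_500  # USD per month, high end of the quoted range

low_total = NUM_INDICATORS * COST_PER_GPU_LOW    # lower bound of monthly spend
high_total = NUM_INDICATORS * COST_PER_GPU_HIGH  # upper bound of monthly spend

print(f"Dedicated-GPU baseline: ${low_total:,}-${high_total:,} per month")
```

That is a $30,000–$90,000 monthly baseline that multi-LoRA serving replaces with a single autoscaling deployment.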
Convirza’s team went from costly analytics with long training times and limited accuracy to scalable insights with measurable quality improvements in a matter of weeks. Join us to hear firsthand from Giuseppe Romagnuolo, Convirza’s VP of AI, about how they leveraged Predibase to optimize their infrastructure and sharpen the analytics that power their business.