
Shopify has built a proxy layer for its large language models (LLMs) that gives every engineer access to multiple AI providers. The proxy includes automatic failover, so service continues even if a specific AI model is discontinued, modified, or becomes unavailable. For instance, when the Claude Fable 5 model was retired, Shopify’s engineering team experienced no disruption; the proxy automatically redirected their workflows to alternative models such as Claude Opus or GPT 5.5.
“Fable was an exceptional model, and naturally, we utilized it,” stated Farhan Thawar, Shopify’s Head of Engineering, in a recent interview on the VentureBeat Beyond the Pilot podcast. “When a model is phased out or undergoes a significant update, our proxy allows us to distribute usage across various providers seamlessly.”
Thawar explained that Shopify buys tokens in bulk and routes all user access to AI models through this central proxy. The proxy provides usage reporting and high availability: if a provider goes down or its performance degrades, users are “automatically and seamlessly” switched to a different model.
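The routing behavior described here can be illustrated with a minimal Python sketch. It is not Shopify’s implementation: the names FailoverProxy, ProviderError, retired_model, and fallback_model are hypothetical, and the backends are stand-in functions rather than real provider clients. It assumes an ordered list of backends, tries each in turn, and counts requests per model as a stand-in for reporting.

```python
from dataclasses import dataclass, field
from typing import Callable


class ProviderError(Exception):
    """Raised by a backend that is down, retired, or misbehaving (hypothetical)."""


@dataclass
class FailoverProxy:
    # Ordered list of (model_name, call_fn); earlier entries are preferred.
    backends: list[tuple[str, Callable[[str], str]]]
    # Per-model request counts, a stand-in for the proxy's usage reporting.
    usage: dict[str, int] = field(default_factory=dict)

    def complete(self, prompt: str) -> tuple[str, str]:
        """Return (model_name, response) from the first backend that succeeds."""
        errors = []
        for name, call in self.backends:
            try:
                response = call(prompt)
            except ProviderError as exc:
                # Record the failure and fall through to the next backend.
                errors.append(f"{name}: {exc}")
                continue
            self.usage[name] = self.usage.get(name, 0) + 1
            return name, response
        raise ProviderError("all backends failed: " + "; ".join(errors))


def retired_model(prompt: str) -> str:
    # Simulates a model that has been retired by its provider.
    raise ProviderError("model retired")


def fallback_model(prompt: str) -> str:
    # Simulates a healthy alternative model.
    return f"echo: {prompt}"
```

With the retired backend listed first, a call such as FailoverProxy([("primary", retired_model), ("fallback", fallback_model)]).complete("hello") is served by the fallback without the caller changing anything, which is the property the article attributes to the proxy.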
He emphasized that enterprises can learn from this approach, particularly about how AI model disruptions could affect their operations. At a minimum, a backup strategy is crucial. A system that allows fluid transitions between models is essential to avoid over-reliance on any single provider.
Another key strategy Shopify employs is model distillation. This technique involves training a smaller “student” model to learn from a larger “teacher” model, often specializing it for narrower, more specific tasks. These smaller language models (SLMs) can offer significant advantages over general-purpose, off-the-shelf models in certain scenarios. For example, Shopify’s flagship AI assistant, Sidekick, leverages specialized SLMs to automate repetitive tasks for merchants, thereby reducing operational friction.
Thawar noted that these smaller, distilled models can cut cost and latency substantially, with gains of up to 30x over more general models in some cases. “It’s not solely about cost and latency, although those are major factors; it’s also critically about accuracy,” he added.
Shopify’s internal platform streamlines the distillation process. Engineers give the system a teacher model, training data, evaluation metrics, and a target model (for example, distilling Opus 4.8 down to Qwen 3.5). The pipeline typically runs for about a day and produces an evaluation report on the fine-tuned model’s speed, cost, and accuracy for the specific subtask. If the results meet the desired trade-offs, engineers can deploy the distilled model without a lengthy approval process. Shopify’s internal platform, Tangle, visualizes the pipeline’s execution.
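As a rough illustration of the inputs and the deploy decision described above, here is a minimal Python sketch. Everything in it is an assumption for illustration: DistillationJob, EvalReport, meets_tradeoffs, the field names, and the 0.02 accuracy tolerance are hypothetical and do not reflect Shopify’s or Tangle’s actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistillationJob:
    teacher: str        # large model whose behavior is being copied
    target: str         # smaller student model to fine-tune
    training_data: str  # hypothetical dataset identifier
    eval_suite: str     # hypothetical evaluation-suite identifier


@dataclass(frozen=True)
class EvalReport:
    accuracy: float            # fraction of eval cases passed, 0.0 to 1.0
    latency_ms: float          # average response latency
    cost_per_1k_tokens: float  # serving cost, in any consistent currency


def meets_tradeoffs(student: EvalReport, teacher: EvalReport,
                    max_accuracy_drop: float = 0.02) -> bool:
    """Accept the student only if it is cheaper and faster than the
    teacher while staying within the allowed accuracy drop."""
    return (
        student.accuracy >= teacher.accuracy - max_accuracy_drop
        and student.latency_ms < teacher.latency_ms
        and student.cost_per_1k_tokens < teacher.cost_per_1k_tokens
    )


# Example job mirroring the article's teacher/target pairing; the data and
# eval identifiers are made up.
job = DistillationJob(teacher="Opus 4.8", target="Qwen 3.5",
                      training_data="subtask-train", eval_suite="subtask-evals")
```

A gate like meets_tradeoffs is one way to make “deploy without a lengthy approval process” safe: the evaluation report, not a review meeting, decides whether the student ships.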
Thawar envisions a future where the distillation pipeline becomes even more autonomous. “My dream is to eventually not give the distillation pipeline a target model at all,” he shared. “Instead, users could provide the teacher model with data and evals, along with the directive: ‘Based on your learnings over time, I want you to look at a different class of model, different sizes, different types, and you tell me what the right distillation target is.’”
He anticipates potential surprises: “Maybe we’ll discover an exceptionally small model capable of running directly on a mobile device. Other times, the system might conclude that no further distillation can improve upon the current state-of-the-art model for that specific task.”
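One simple way to think about the target-free pipeline Thawar describes is as a search over candidate student models. The Python sketch below is a simplification under stated assumptions, not his design: it assumes each candidate has already been distilled and scored, and the names Candidate and pick_target, along with the 0.01 tolerance, are hypothetical. It returns the smallest candidate that stays close to the baseline accuracy, or None when no candidate qualifies, matching the two outcomes he anticipates.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    params_billions: float  # model size, used as a proxy for cost and latency
    accuracy: float         # score on the task's evals, 0.0 to 1.0


def pick_target(candidates: Sequence[Candidate],
                baseline_accuracy: float,
                tolerance: float = 0.01) -> Optional[Candidate]:
    """Return the smallest candidate whose accuracy is within `tolerance`
    of the baseline, or None if distillation does not pay off."""
    viable = [c for c in candidates
              if c.accuracy >= baseline_accuracy - tolerance]
    if not viable:
        return None
    return min(viable, key=lambda c: c.params_billions)
```

Returning None is a deliberate part of the interface: it lets the pipeline report that the current model remains the best choice for the task.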
Shifting Focus from “AI Reflexivity” to “AI Leverage”
Shopify lets its users choose their preferred AI tools, including Claude Code, Codex, Cursor, and GitHub Copilot for VS Code. “We provide access to various frameworks, enabling our teams to experiment and identify what best fits their specific workflows,” Thawar said.
To track resource use, the company has built a usage dashboard. It lets Thawar’s team analyze not only token spend but also patterns such as which users consume the most expensive tokens, how much time goes to complex reasoning tasks, and which models and disciplines are in use across teams.
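The kind of breakdown such a dashboard offers can be sketched as a small aggregation in Python. This assumes the proxy logs one record per request; the UsageRecord fields and the spend_by helper are hypothetical, chosen only to show spend grouped by user, team, or model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    user: str
    team: str
    model: str
    tokens: int
    cost_usd: float


def spend_by(records, key):
    """Total cost grouped by a record attribute ("user", "team", or
    "model"), sorted from highest to lowest spend."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cost_usd
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Calling spend_by(records, "user") surfaces the heaviest spenders, while spend_by(records, "model") shows which models account for most of the bill.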
To guard against “token-maxxing” (excessive token consumption), Shopify has built in “circuit breakers.” If a model runs for an extended period, such as 10 hours, and accumulates significant token usage, the user receives a notification asking them to confirm that the usage should continue. Thawar noted that responses vary: some users knowingly continue their sessions despite the high usage, while others are surprised to learn a process was running in the background and choose to terminate it.
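The confirmation check can be sketched in a few lines of Python. This is an illustration under assumptions, not Shopify’s code: the SessionBreaker class and its method names are hypothetical, the 10-hour limit comes from the article’s example, and the token threshold is invented because the article gives no figure.

```python
from dataclasses import dataclass


@dataclass
class SessionBreaker:
    max_hours: float = 10.0      # from the article's example
    max_tokens: int = 5_000_000  # hypothetical threshold, not from the article
    confirmed: bool = False      # set once the user approves the session

    def should_prompt(self, hours_running: float, tokens_used: int) -> bool:
        """True when a session is both long-running and token-heavy and
        the user has not yet confirmed it."""
        if self.confirmed:
            return False
        return (hours_running >= self.max_hours
                and tokens_used >= self.max_tokens)

    def confirm(self) -> None:
        """Record that the user chose to keep the session running."""
        self.confirmed = True
```

The breaker only asks; it does not kill the session. That matches the two reactions Thawar describes: a user who meant to run the job confirms and continues, and one who forgot about it can terminate it.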
Thawar articulated Shopify’s overarching objective: to transition from a reactive approach, termed “AI reflexivity,” to a more strategic mindset of “AI leverage.” This shift encourages individuals to critically assess and identify the areas within their workflows where AI can deliver the most substantial benefits.
The full podcast episode delves into several key areas, including:
- Shopify’s core philosophy of prioritizing infrastructure development before feature implementation, as Thawar states: “We’ve always built more infra. We will continue to always build more infra.”
- The role of Shopify’s internal AI agent, River, in creating a unified “substrate of information” across the organization.
- An insightful anecdote about Thawar’s OpenClaw agent autonomously identifying his travel schedule from his calendar, offering a glimpse into the future trajectory of AI agents.
Business Style Takeaway: Shopify’s proactive approach to AI model management, utilizing a proxy for seamless failover and employing model distillation for efficiency, demonstrates a critical strategy for mitigating risks and optimizing costs in the rapidly evolving AI landscape. This focus on flexible infrastructure and specialized models offers a blueprint for businesses aiming to maximize their AI investments while ensuring operational resilience.
Details can be found on the website: venturebeat.com
