Pinterest Slashes AI Costs 90% by Replacing an Open-Source Model's Vision Layer

Pinterest reports a 90% cost reduction and a 30% accuracy improvement in its visual recommendation engine after customizing open-source large language models (LLMs). The platform, which has 620 million monthly users, found off-the-shelf frontier models too costly for image recommendations. In response, Pinterest’s CTO, Matt Madrigal, led an initiative to replace the vision layer of the Qwen3-VL model with proprietary embeddings.

Madrigal emphasized Pinterest’s commitment to developing AI capabilities in-house, particularly by fine-tuning open-source models. He argued that when a company holds a unique dataset, the quality of the data used for fine-tuning can matter more than sheer model size.

Pinterest’s Customization of Qwen for Enhanced Visual Discovery

Pinterest has a history of leveraging open-source AI, having previously adapted models like Google’s BERT and OpenAI’s CLIP for its visual search and discovery features. Their adaptation of CLIP, named Pin CLIP, integrated proprietary visual embeddings and image metadata to further refine its performance.

The platform’s conversational shopping assistant, Navigator 1, was developed using Qwen3-VL. Madrigal’s team undertook substantial modifications, effectively removing the original vision encoder and replacing it with custom, proprietary multimodal embeddings. This approach allows Pinterest to better capture and process metadata associated with pins and images. The precomputation of this metadata, combined with regular retraining on new data, enables highly personalized user experiences.

“Open-source models, especially with open Apache licenses where you can truly tweak a lot of open weights and customize for unique use cases — that’s where we’ve found open source to be so powerful for us,” Madrigal stated.

Using its own embeddings gives Pinterest deeper contextual understanding of metadata, pins, and images, and it speeds up inference. Without the precomputed embeddings, each image would have to be encoded individually at runtime, which Pinterest reports would increase latency 20-fold. Madrigal said that for features critical to user engagement, and that must scale to more than 600 million users, the company either builds in-house or heavily customizes open-source solutions.
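The latency argument above comes down to moving the expensive encoding step out of the request path. The following is a minimal sketch of that idea, not Pinterest's implementation: `toy_encoder`, `PrecomputedEmbeddings`, and the pin IDs are all hypothetical names invented for illustration, and the "encoder" is a trivial stand-in for a real vision model.

```python
from typing import Callable, Dict, List

Vector = List[float]


def toy_encoder(image_bytes: bytes) -> Vector:
    """Stand-in for an expensive vision encoder (hypothetical)."""
    total = sum(image_bytes) or 1
    return [b / total for b in image_bytes[:4]]


class PrecomputedEmbeddings:
    """Hypothetical table of embeddings computed ahead of time, keyed by pin ID."""

    def __init__(self, encoder: Callable[[bytes], Vector]) -> None:
        self._encoder = encoder
        self._table: Dict[str, Vector] = {}
        self.encoder_calls = 0  # counts how often the costly step runs

    def build(self, pins: Dict[str, bytes]) -> None:
        """Offline step: encode every pin once and store the result."""
        for pin_id, image_bytes in pins.items():
            self._table[pin_id] = self._encoder(image_bytes)
            self.encoder_calls += 1

    def get(self, pin_id: str) -> Vector:
        """Request-time step: a dictionary lookup, with no encoder call."""
        return self._table[pin_id]


pins = {"pin_a": b"\x01\x02\x03\x04", "pin_b": b"\x04\x03\x02\x01"}
store = PrecomputedEmbeddings(toy_encoder)
store.build(pins)
calls_after_build = store.encoder_calls

# Serving the same pin many times does not touch the encoder again.
served = [store.get("pin_a") for _ in range(1000)]
```

In this toy setup the encoder runs once per pin during `build`, regardless of how many requests follow. Rerunning `build` on fresh data would correspond loosely to the regular retraining the article mentions.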

Developing a ‘Taste Graph’ for Dynamic User Preferences

To guide users from initial inspiration to a purchase decision, Pinterest has developed what Madrigal describes as a “taste graph.” The system dynamically models individual user preferences, moving beyond simple click data to represent genuine user affinities. It functions as a constantly evolving representation of the tastes of hundreds of millions of users.

Madrigal contrasted Pinterest’s discovery-focused approach with search engines like Google, where users typically have a defined intent. Pinterest aims to foster “lateral exploration,” transforming passive discovery into active intent, which can lead to ad engagement or purchases. The underlying architecture merges graph structures with advanced representational learning techniques. User embeddings, which capture shifting tastes, are continuously updated based on user activity, new content, and other signals.
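The article does not say how user embeddings are updated. One common and simple way to keep a vector tracking recent activity is an exponential moving average, sketched below purely as an assumption. The function name `update_user_embedding`, the `rate` parameter, and the two-dimensional vectors are hypothetical.

```python
from typing import List


def update_user_embedding(
    user_vec: List[float], item_vec: List[float], rate: float = 0.2
) -> List[float]:
    """Blend the embedding of a newly engaged item into the user's vector.

    A higher `rate` makes the user vector follow recent activity faster.
    This is an illustrative assumption, not Pinterest's actual update rule.
    """
    if len(user_vec) != len(item_vec):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return [(1.0 - rate) * u + rate * i for u, i in zip(user_vec, item_vec)]


# A user whose taste starts on the first axis engages three times
# with content on the second axis; the vector drifts toward it.
taste = [1.0, 0.0]
for _ in range(3):
    taste = update_user_embedding(taste, [0.0, 1.0], rate=0.5)
```

Each engagement moves the vector part of the way toward the new item, so older interests fade gradually rather than being overwritten.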

“It’s not a social graph,” Madrigal clarified. “It’s much more of a preference graph: What’s going to inspire you? What are you trying to do next?”

For example, the taste graph can differentiate between users interested in mid-century modern design versus those who prefer a Nantucket aesthetic, delivering tailored product recommendations. This system effectively bridges the gap between upper-funnel inspiration and lower-funnel purchasing intent.
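To illustrate how embeddings could separate these two aesthetics, the sketch below ranks products by cosine similarity to a user vector. It is a toy under stated assumptions: the two-dimensional vectors, the product names, and the functions `cosine` and `rank_products` are invented for illustration and do not describe Pinterest's system.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_products(user_vec: List[float], products: Dict[str, List[float]]) -> List[str]:
    """Order product names from most to least similar to the user vector."""
    return sorted(products, key=lambda name: cosine(user_vec, products[name]), reverse=True)


# Hypothetical axes: [mid-century modern, Nantucket coastal]
products = {
    "walnut sideboard": [0.9, 0.1],
    "rope-handled lantern": [0.1, 0.9],
    "linen throw pillow": [0.5, 0.5],
}
mid_century_user = [1.0, 0.0]
coastal_user = [0.0, 1.0]

mid_century_ranking = rank_products(mid_century_user, products)
coastal_ranking = rank_products(coastal_user, products)
```

The same catalog yields opposite orderings for the two users, which is the behavior the example in the text describes. A production system would use learned, high-dimensional embeddings and an approximate nearest-neighbor index rather than a full sort.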

Key insights from the podcast include:

Pinterest utilizes sandboxes to foster secure and contained creative exploration.
A continuous feedback loop is crucial for preventing inaccuracies in visual AI systems.
Consistent benchmarking is essential for monitoring user engagement, performance, latency, and other critical metrics.

The full podcast episode is available on Spotify, Apple Podcasts, or wherever you listen to your favorite shows.

Business Style Takeaway: Tailoring open-source AI models with proprietary data and custom embeddings can unlock significant cost savings and performance gains compared with relying solely on large, general-purpose models. The approach is particularly effective for platforms with unique datasets and specific user engagement goals, because it lets them build more specialized AI systems.

Based on materials from: venturebeat.com
