
The prevailing mode of interaction with artificial intelligence, characterized by a user submitting input and then waiting for a response, may soon become a relic of the past. This “turn-based” approach, common across text, image, audio, and video modalities, faces a significant challenge as AI aims to facilitate more natural and fluid human interaction.
The limitations of this sequential processing become apparent when considering AI’s potential to shoulder tasks requiring seamless conversational ability. True natural interaction necessitates an AI that can process incoming information while simultaneously preparing its next response, mirroring human cognitive flexibility.
This vision is central to the mission of Thinking Machines, a well-funded startup founded by former OpenAI executives including Chief Technology Officer Mira Murati and co-founder John Schulman. The company has introduced “interaction models,” a novel category of natively multimodal systems designed to prioritize interactivity within their core architecture, rather than treating it as an add-on software layer.
The firm’s latest research preview showcases these models, which have demonstrated considerable gains on third-party benchmarks and achieved reduced latency, hinting at a future where AI collaboration is significantly more dynamic.
While not yet available to the public or enterprises, Thinking Machines plans a limited research preview in the coming months to gather feedback, with broader access anticipated later this year.
‘Full Duplex’ Processing: Real-Time Simultaneous Input and Output
A fundamental shift underpinning this advancement is the AI’s perception of time and real-time engagement. Current leading AI models process information sequentially: they await complete user input before initiating processing and remain in a static state while generating a response. This effectively pauses their perception of ongoing events.
Researchers at Thinking Machines highlight this as a constraint that forces users to adapt their communication style to the AI’s limitations, often composing queries like emails or batching their thoughts in advance. This bottleneck hinders genuine collaboration.
To overcome this, Thinking Machines has moved beyond the conventional alternating token sequence. Their approach employs a multi-stream, micro-turn design that processes input and output concurrently in 200-millisecond intervals.
This “full-duplex” architecture empowers the model to perceive, communicate, and visualize in real time. It can offer conversational cues while a user is speaking or proactively interject based on visual cues—such as identifying a coding error or a new participant entering a video call.
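To make the contrast with turn-based processing concrete, below is a minimal sketch of how a full-duplex micro-turn loop might interleave listening and speaking on a 200-millisecond cadence. The queue, chunk format, and echo-style output are illustrative stand-ins, not details of Thinking Machines’ actual implementation.

```python
import asyncio

MICRO_TURN_S = 0.2  # 200 ms micro-turns, per the reported design


async def input_stream(queue: asyncio.Queue) -> None:
    # Stand-in for a continuous microphone or camera feed.
    for i in range(10):
        await queue.put(f"input-chunk-{i}")
        await asyncio.sleep(MICRO_TURN_S / 2)  # input keeps arriving mid-turn


async def micro_turn_loop(queue: asyncio.Queue) -> None:
    # Each micro-turn drains whatever input has arrived, then emits output,
    # so listening and speaking overlap instead of alternating in full turns.
    for turn in range(10):
        heard = []
        while not queue.empty():
            heard.append(queue.get_nowait())
        # A real model would update its state here and might stay silent;
        # printing stands in for streaming an output chunk.
        print(f"micro-turn {turn}: heard {heard or 'nothing'}, emitting output")
        await asyncio.sleep(MICRO_TURN_S)


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(input_stream(queue), micro_turn_loop(queue))


asyncio.run(main())
```

The key property is that the input task never waits for the output task: both streams advance every 200 milliseconds.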
Technically, the model leverages encoder-free early fusion. Instead of relying on large, independent encoders like Whisper for audio processing, the system integrates raw audio signals (via dMel representation) and image patches (40×40 pixels) through a lightweight embedding layer. All components are co-trained from scratch within a transformer framework.
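As a rough illustration of encoder-free early fusion, the PyTorch sketch below projects mel-style audio frames and flattened 40×40 image patches into the same embedding space as text tokens through simple linear layers, rather than routing audio through a large pretrained encoder. The model dimension, mel bin count, and vocabulary size are assumptions for the example, not published specifications.

```python
import torch
import torch.nn as nn


class EarlyFusionEmbedder(nn.Module):
    """Hypothetical sketch: lightweight per-modality projections into one
    shared token space, co-trained with the transformer from scratch."""

    def __init__(self, d_model: int = 1024, n_mels: int = 80, patch: int = 40):
        super().__init__()
        self.text_embed = nn.Embedding(50_000, d_model)
        # Audio: one token per mel frame (standing in for a dMel-style
        # discretized representation).
        self.audio_proj = nn.Linear(n_mels, d_model)
        # Vision: one token per flattened 40x40 RGB patch.
        self.image_proj = nn.Linear(patch * patch * 3, d_model)

    def forward(self, text_ids, mel_frames, image_patches):
        # Fuse all modalities into a single token sequence; the transformer
        # attends over this stream directly.
        return torch.cat(
            [
                self.text_embed(text_ids),       # (T_text, d_model)
                self.audio_proj(mel_frames),     # (T_audio, d_model)
                self.image_proj(image_patches),  # (T_image, d_model)
            ],
            dim=0,
        )


fuser = EarlyFusionEmbedder()
tokens = fuser(
    torch.randint(0, 50_000, (12,)),  # 12 text tokens
    torch.randn(25, 80),              # 25 audio frames
    torch.randn(4, 40 * 40 * 3),      # 4 image patches
)
print(tokens.shape)  # torch.Size([41, 1024])
```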
A Dual-Model Architecture for Enhanced Performance
The research preview features TML-Interaction-Small, a 276-billion-parameter Mixture-of-Experts (MoE) model with 12 billion active parameters. Recognizing that real-time interaction demands near-instantaneous responses that can sometimes conflict with deep analytical processing, the company has implemented a two-tiered system (a sketch of the hand-off follows the list):

- The Interaction Model: This component maintains continuous dialogue with the user, managing conversation flow, real-time presence detection, and immediate follow-up actions.
- The Background Model: Operating asynchronously, this agent handles more intensive tasks such as sustained reasoning, web searches, or complex tool integrations. It streams results back to the interaction model to be seamlessly incorporated into the ongoing exchange.
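A minimal sketch of that hand-off, using generic asyncio tasks and queues rather than any real Thinking Machines API, might look like the following; the function names and timings are invented for illustration.

```python
import asyncio


async def background_model(query: str, results: asyncio.Queue) -> None:
    # Stand-in for the asynchronous background agent: sustained reasoning,
    # web searches, or tool calls whose output is streamed back.
    await asyncio.sleep(2.0)  # pretend this is an expensive task
    await results.put(f"deep answer for: {query!r}")


async def interaction_model(results: asyncio.Queue) -> None:
    # Stand-in for the interaction model: stays responsive on every
    # micro-turn, weaving in background results the moment they arrive.
    for turn in range(15):
        try:
            extra = results.get_nowait()
            print(f"turn {turn}: incorporating -> {extra}")
        except asyncio.QueueEmpty:
            print(f"turn {turn}: keeping the conversation going")
        await asyncio.sleep(0.2)


async def main() -> None:
    results: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        background_model("compare these two datasets", results),
        interaction_model(results),
    )


asyncio.run(main())
```

The conversation never blocks on the slow task; its result is simply folded into a later turn once it arrives.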
This architecture enables the AI to perform tasks like live translation or generate visualizations (e.g., UI charts) while remaining attentive to user input. A demonstration video showcased the model providing human-like reaction times to various cues while concurrently producing a bar chart visualization.
Leading Benchmarks Highlight Superior Real-Time Performance
To validate their approach, Thinking Machines utilized FD-bench, a benchmark specifically designed to evaluate interaction quality rather than just raw computational power. The results indicate that TML-Interaction-Small significantly surpasses current real-time AI systems:
- Responsiveness: Achieved a turn-taking latency of just 0.40 seconds, outperforming Gemini-3.1-flash-live (0.57s) and GPT-realtime-2.0 (1.18s).
- Interaction Quality: On FD-bench V1.5, the model scored 77.8, well clear of its closest competitors (GPT-realtime-2.0 minimal scored 46.8).
- Visual Proactivity: In specialized tests such as RepCount-A (measuring physical repetitions in video) and ProactiveVideoQA, the Thinking Machines model actively engaged with visual content, while other leading models either remained unresponsive or provided inaccurate answers.
| Metric | TML-Interaction-Small | GPT-realtime-2.0 (minimal) | Gemini-3.1-flash-live (minimal) |
| --- | --- | --- | --- |
| Turn-taking latency (s) | 0.40 | 1.18 | 0.57 |
| Interaction Quality (Avg) | 77.8 | 46.8 | 54.3 |
| IFEval (VoiceBench) | 82.1 | 81.7 | 67.6 |
| HarmBench (Refusal %) | 99.0 | 99.5 | 99.0 |
Transformative Potential for Enterprises, Pending Wider Availability
Upon its release to the enterprise sector, Thinking Machines’ interaction models are poised to fundamentally alter how businesses integrate AI into their operational frameworks. The native interactivity of TML-Interaction-Small unlocks enterprise capabilities that are currently difficult or impossible to achieve with conventional multimodal models:
Current enterprise AI solutions require a completed turn before analysis can begin. In settings like manufacturing or laboratory environments, a native interaction model could continuously monitor video feeds and proactively flag safety violations or protocol deviations the moment they occur, without waiting for a human to ask.
The model’s demonstrated success in visual benchmarks, such as RepCount-A and ProactiveVideoQA, suggests its potential utility as a real-time auditor for critical physical operations.
A significant friction point in contemporary voice-based customer service is the typical 1-2 second processing delay inherent in standard APIs. Thinking Machines’ model, with its 0.40-second latency, approaches the speed of natural human conversation. This capability allows for advanced features such as “backchannel” communication (e.g., “I see,” “mm-hmm”) without interrupting the user, and real-time translation that feels integrated rather than disjointed.
Traditional large language models (LLMs) lack an intrinsic sense of time, relying on explicit textual prompts for temporal information. Interaction models, however, are natively time-aware, enabling them to manage time-sensitive processes effectively. Applications such as setting interval-based reminders (“Remind me to check the temperature every 4 minutes”) or establishing condition-based alerts (“Alert me if this process exceeds the duration of the previous one”) become feasible, crucial for sectors like industrial maintenance and pharmaceutical research where precise timing is paramount.
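For illustration only, the reminder example above looks roughly like the snippet below when bolted on with an ordinary external timer; the point of a natively time-aware model is that it would track the interval itself, mid-conversation, without any such wrapper.

```python
import time


def interval_reminder(message: str, every_s: float, repeats: int) -> None:
    # External-timer stand-in: a natively time-aware interaction model
    # would track elapsed time internally while continuing to converse.
    for i in range(repeats):
        time.sleep(every_s)
        print(f"[{time.strftime('%H:%M:%S')}] reminder {i + 1}: {message}")


# "Remind me to check the temperature every 4 minutes"
interval_reminder("check the temperature", every_s=4 * 60, repeats=3)
```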
Background on Thinking Machines
This release marks the second significant development from Thinking Machines, following the October 2025 launch of Tinker. Tinker is a managed API service designed for fine-tuning language models, offering researchers and developers granular control over their data and training methodologies while outsourcing the infrastructure complexities of distributed training to Thinking Machines. The platform supports a wide range of models, including large open-weight models and mixture-of-experts architectures, and has been adopted by research groups at institutions including Princeton, Stanford, and Berkeley.
When it launched in early 2025, Thinking Machines positioned itself as an AI research and product company focused on making advanced AI systems “more widely understood, customizable, and generally capable.”
In July 2025, the company announced a substantial funding round, reportedly raising approximately $2 billion at a $12 billion valuation, led by Andreessen Horowitz with significant participation from industry leaders such as Nvidia, Accel, ServiceNow, Cisco, AMD, and Jane Street. This was notably described by WIRED as the largest seed funding round in history.
The Wall Street Journal reported in August 2025 that Meta CEO Mark Zuckerberg had approached Mira Murati regarding an acquisition of Thinking Machines Lab. Following her refusal, Meta reportedly pursued over a dozen of the startup’s approximately 50 employees.
In March and April 2026, Thinking Machines also garnered attention for its ambitious compute infrastructure plans. The company announced a partnership with Nvidia to deploy at least one gigawatt of next-generation Vera Rubin systems. Concurrently, it expanded its collaboration with Google Cloud to leverage Google’s AI Hypercomputer infrastructure, utilizing Nvidia GB300 systems for model research, reinforcement learning workloads, frontier model training, and the Tinker platform.
By April 2026, Business Insider reported that Meta had hired seven founding members from Thinking Machines, including Mark Jen and Yinghai Lu, along with researcher Tianyi Zhang. Joshua Gross, who was instrumental in developing Thinking Machines’ flagship fine-tuning product Tinker, also joined Meta’s Superintelligence Labs. Despite these departures, the company was reported to have grown to approximately 130 employees.
Conversely, Thinking Machines has also attracted significant talent, including Meta veteran Soumith Chintala, the creator of PyTorch, who joined as CTO, and other high-profile technical hires such as Neal Wu. TechCrunch separately reported in April 2026 that Weiyao Wang, an eight-year Meta veteran specializing in multimodal perception systems, had joined Thinking Machines.
Thinking Machines has previously committed to incorporating “significant open-source components” in its releases to foster the research community. It remains to be seen whether these new interaction models will follow the same open-source ethos and release terms.
However, the company’s fundamental belief is clear: by making interactivity a native aspect of the model architecture, Thinking Machines asserts that scaling these models will simultaneously enhance their intelligence and their effectiveness as collaborators.
Business Takeaway: The development of “interaction models” by Thinking Machines signifies a critical shift from static, turn-based AI interactions to dynamic, real-time collaboration, promising to unlock new levels of automation and responsiveness in enterprise applications. Businesses that embrace this evolution stand to gain a competitive advantage through enhanced operational efficiency, proactive problem-solving, and more natural human-AI interfaces.
Learn more at: venturebeat.com
