German IT firm TNG Technology Consulting has released a new open-source AI model that is reportedly twice as fast as DeepSeek's R1-0528 variant from May, on which it is based. Released this week on the Hugging Face platform, DeepSeek-TNG R1T2 Chimera achieves its efficiency through a novel ‘Assembly-of-Experts’ technique.
This method merges components from three different parent models, including the original DeepSeek R1 and V3 models. The result is a model that retains high-level reasoning capabilities while generating answers with 60% fewer tokens, drastically cutting inference costs and response times for developers.
The AI developer community has responded with enthusiasm. On X, Hugging Face senior leader Vaibhav Srivastav wrote, “DAMN! DeepSeek R1T2 – 200% faster than R1-0528 & 20% faster than R1,” highlighting its performance gains. The model is available under a permissive MIT License, allowing for broad commercial use and modification.
Assembly-of-Experts: A Novel Approach to Model Creation
TNG’s “Assembly-of-Experts” (AoE) method represents a significant departure from conventional model creation. Instead of fine-tuning or retraining, AoE builds a new model by selectively merging the weight tensors from multiple pre-trained parents, a process detailed in a recent research paper from June.
The implementation focuses on merging the routed expert tensors—the parts of a model most responsible for specialized knowledge—while retaining the more efficient shared layers from faster parents. This “Tri-Mind” Chimera combines the reasoning of R1-0528, the structured thought of R1, and the conciseness of V3-0324.
This approach is distinct from the Mixture-of-Experts (MoE) architecture used in its parent models. While MoE is a runtime architecture that activates a fraction of a model’s “experts” for any given task, AoE is a construction technique that bakes the combined expertise into a single, more efficient final model.
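To make the construction-time nature of AoE concrete, here is a minimal sketch of weight-space merging. It is an illustration only, not TNG's actual implementation: the tensor names, the scalar "tensors," and the interpolation coefficients are all hypothetical, and it stands in for the paper's approach of merging routed expert tensors across parents while copying shared layers from a single parent.

```python
def assemble_experts(parents, coeffs, shared_parent):
    """Toy Assembly-of-Experts merge.

    Routed expert tensors (names containing 'experts') become a weighted
    interpolation across all parents; every other (shared) tensor is taken
    directly from one designated parent. Real models hold large weight
    matrices here; scalars keep the sketch self-contained.
    """
    merged = {}
    for name, value in parents[shared_parent].items():
        if "experts" in name:
            merged[name] = sum(coeffs[p] * parents[p][name] for p in parents)
        else:
            merged[name] = value
    return merged

# Hypothetical parent "state dicts" with one expert and one shared tensor each
parents = {
    "R1-0528": {"layer0.experts.w": 1.0, "layer0.shared.w": 10.0},
    "R1":      {"layer0.experts.w": 2.0, "layer0.shared.w": 20.0},
    "V3-0324": {"layer0.experts.w": 4.0, "layer0.shared.w": 40.0},
}
coeffs = {"R1-0528": 0.5, "R1": 0.25, "V3-0324": 0.25}  # made-up weights

merged = assemble_experts(parents, coeffs, shared_parent="V3-0324")
# Expert tensor: 0.5*1.0 + 0.25*2.0 + 0.25*4.0 = 2.0
# Shared tensor: copied unchanged from V3-0324 -> 40.0
```

The key point the sketch captures is that the merge happens once, offline: the output is an ordinary single model, with no extra routing machinery at inference time.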
Benchmarks: Balancing Raw Intelligence with Extreme Efficiency
The practical benefit of this technique is a powerful balance of intelligence and speed. According to benchmarks published by TNG, R1T2 Chimera achieves between 90% and 92% of the reasoning performance of its most powerful parent, R1-0528, on demanding tests like AIME and GPQA.
These benchmarks are designed to test sophisticated, multi-step reasoning that goes far beyond simple knowledge recall. However, the model’s key advantage is conciseness. It generates correct answers using approximately 40% of the tokens required by R1-0528, a 60% reduction in output length.
This directly translates to faster response times and lower compute costs, making it over twice as fast in practical terms. This efficiency was a hallmark of its V3 parent: after running the improved March 2025 variant of V3 on his laptop, developer Awni Hannun called it “the most powerful model I’ve ever run on my laptop.” R1T2 Chimera successfully grafts this efficiency onto a stronger reasoning core.
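The arithmetic behind the "over twice as fast" claim is worth spelling out. Assuming equal per-token generation speed and using hypothetical round numbers (the token count and price below are illustrative, not TNG's figures):

```python
# Hypothetical output length for a reasoning task on R1-0528
r1_0528_tokens = 10_000
# R1T2 reportedly needs roughly 40% of the parent's output tokens
r1t2_tokens = int(r1_0528_tokens * 0.40)

# 40% of the tokens is a 60% reduction in output length
reduction = 1 - r1t2_tokens / r1_0528_tokens

# At the same tokens/second, shorter answers finish proportionally sooner
speedup = r1_0528_tokens / r1t2_tokens  # 2.5x under these assumptions

# The same ratio drives cost: fewer output tokens billed per answer
price_per_million = 2.00  # hypothetical $ per 1M output tokens
savings = (r1_0528_tokens - r1t2_tokens) / 1_000_000 * price_per_million
```

So a 60% token reduction alone yields a 2.5x end-to-end speedup and a matching 60% cut in per-answer output cost, before any other optimization.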
An Innovation Amid Geopolitical and Corporate Headwinds
The release of this highly efficient model comes at a turbulent time for its original creator, DeepSeek AI. The Chinese firm’s momentum has stalled, with its anticipated R2 model now indefinitely delayed, owing both to internal dissatisfaction with its performance and to the impact of US export controls on vital AI chips.
Simultaneously, DeepSeek faces mounting regulatory pressure in the West. In Germany, Berlin’s data protection authority has asked Apple and Google to remove the DeepSeek app from their stores, labeling it “unlawful content” over the risk of illegal data transfers to China.
This follows a damning April report from the US House Select Committee on the CCP. Committee Chairman John Moolenaar stated, “this report makes it clear: DeepSeek isn’t just another AI app — it’s a weapon in the Chinese Communist Party’s arsenal…,” alleging the app is a tool for espionage and data harvesting. These external pressures create a complex backdrop for any technology derived from DeepSeek’s work.
Enterprise Deployment: Availability, Licensing, and Limitations
For enterprise technical leaders, R1T2 Chimera presents a compelling option. Its MIT license offers maximum flexibility for private hosting, customization, and deployment in commercial applications without licensing fees. The significant reduction in inference cost makes it ideal for high-throughput or real-time environments.
The cost savings are particularly relevant for applications like real-time customer support chatbots, large-scale document summarization, or internal knowledge base queries, where both speed and budget are critical. It places the model in a desirable position on the performance-versus-cost curve.
However, TNG notes some current limitations. The model is not yet recommended for use cases requiring function calling or tool use, meaning it cannot reliably interact with external APIs. This limits its use in complex, automated workflows, though future updates may address this gap.
Furthermore, the company advises European users to assess their compliance with the EU AI Act, which has extraterritorial reach. Despite these caveats, the release of R1T2 Chimera by TNG marks a notable step in modular AI development, offering a glimpse into a future where models are assembled, not just trained.