https://www.together.ai/blog
Together AI has achieved ISO 27001:2022 certification, validating its Information Security Management System for secure AI workloads. This certification enhances governance around sensitive data, access control, and incident response, ensuring robust protection for customers using its global platform.
Together AI's achievement of ISO 27001:2022 certification signifies a commitment to robust information security for AI workloads, which is crucial for builders and PMs focusing on enterprise solutions. For investors, this certification enhances the company's credibility and marketability, indicating a lower risk profile in handling sensitive data.
MiniMax's M3 model introduces a 1M-token context and multimodal capabilities, optimized for efficient inference with a 9x speedup in prefill and 15x in decoding, supported by Together AI's cloud infrastructure.
The introduction of MiniMax's M3 model with 1M-token context and multimodal capabilities allows builders and PMs to create more complex and contextually aware applications, significantly enhancing user experience. For investors, the 9x speedup in prefill and 15x in decoding represents a critical advancement in AI efficiency, indicating potential for higher returns in scalable AI solutions.
Together AI developed the fastest speech-to-text stack, transcribing 20 hours of audio in under 10 seconds using NVIDIA's Parakeet-TDT 0.6B v3 and OpenAI's Whisper Large v3. Key optimizations included TensorRT for encoder efficiency and GPU-based decoding, resulting in 2-3x faster performance.
Together AI's development of the world's fastest speech-to-text stack, capable of transcribing 20 hours of audio in under 10 seconds, signals a significant leap in real-time transcription capabilities. This advancement can enhance applications in accessibility, customer service, and content creation, making it a critical consideration for builders and PMs focused on integrating efficient speech recognition into their products.
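The headline figure can also be read as a real-time factor. The short Python sketch below only restates the arithmetic from the numbers quoted above (20 hours of audio, 10 seconds of wall-clock time); the function name is illustrative, and because the source says "under 10 seconds" the result is a lower bound.

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


audio_seconds = 20 * 60 * 60  # 20 hours of audio, from the summary above
wall_seconds = 10             # "under 10 seconds", so this is an upper bound

rtf = real_time_factor(audio_seconds, wall_seconds)
print(f"At least {rtf:.0f}x faster than real time")
```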
Together AI's new benchmark for coding agents reveals that its Together Inference Engine achieves 31% higher TPS than TensorRT-LLM, maintaining under 1s TTFT at 625 TPM per GPU. This performance is crucial for handling high concurrency and long context requests in production environments.
Together AI's new benchmark for coding agents, in which the Together Inference Engine delivers 31% higher TPS than TensorRT-LLM, matters to builders and PMs because it indicates better throughput for high-concurrency applications. For investors, this development suggests a competitive edge in the market, potentially leading to increased adoption and revenue opportunities in AI-driven solutions.
Violin is an open-source video translation tool by Together AI, utilizing Whisper V3 for ASR, DeepSeek V4 Pro for translation, and Cartesia’s Sonic 3 for TTS, enabling high-quality multilingual video accessibility. It features an interactive chat assistant for user queries and is designed for content creators and developers alike.
Violin, Together AI's open-source video translation tool, leverages advanced AI technologies to enhance multilingual accessibility for video content. This development signals a growing demand for tools that enable creators to reach broader audiences, presenting opportunities for builders and PMs to integrate similar capabilities into their products, while investors can identify potential market growth in the video translation sector.
Together AI's Voice Finder tool allows developers to quickly search over 600 voices across multiple TTS models, including MiniMax and Deepgram. By using prompts or audio samples, users can find suitable voices based on 15+ metadata attributes, streamlining the process of selecting the right voice for applications like fintech support or meditation guides.
Together AI's Voice Finder tool enables developers to efficiently select from over 600 TTS voices using metadata attributes, significantly reducing the time spent on voice selection for applications like fintech and meditation. This development is crucial for builders and PMs looking to enhance user experience through personalized voice interactions, while investors can see potential for increased adoption of voice technology in various sectors.
DeepSeek-V4 turns million-token context into a serving-systems challenge, as explored by Together AI on NVIDIA HGX B200. Key factors include compressed KV layouts, prefix caching, and kernel maturity for efficient long-context inference workloads.
The development of DeepSeek-V4, which addresses million-token context as a serving-systems challenge, is significant for builders and PMs as it highlights the need for optimized infrastructure in AI applications. Investors should note that innovations like compressed KV layouts and prefix caching can lead to more efficient long-context inference, potentially enhancing the performance and scalability of AI systems.
Deploy any Hugging Face model effortlessly with Goose and Together's Dedicated Container Inference. This solution allows users to run models in a production-grade GPU environment with just one prompt, eliminating setup complexities and enabling immediate deployment on release day.
The launch of Goose and Together's Dedicated Container Inference allows builders and PMs to deploy Hugging Face models with minimal setup, streamlining the path from development to production. This efficiency can significantly reduce time-to-market for AI applications, making it a crucial development for investors looking for scalable solutions in the AI space.
As AI transitions from research to production, the focus for AI-native teams is shifting towards efficient and reliable model deployment at scale. This involves overcoming challenges related to resource management and performance optimization to ensure models operate effectively in real-world applications.
The shift towards efficient inference at scale is critical for builders and PMs as it directly impacts the feasibility of deploying AI models in real-world applications, requiring advancements in resource management and performance optimization. For investors, this development signals potential growth opportunities in companies that can successfully navigate these challenges and deliver scalable AI solutions.
Together AI partners with Adaption to integrate Together Fine-Tuning into Adaptive Data, enhancing model training efficiency. This collaboration enables users to optimize datasets and achieve an average 82% increase in data quality, facilitating the deployment of fine-tuned models on Together AI's infrastructure.
The partnership between Together AI and Adaption to integrate Together Fine-Tuning into Adaptive Data significantly enhances model training efficiency, achieving an average 82% increase in data quality. This development is crucial for builders and PMs as it streamlines the deployment of fine-tuned models, ultimately reducing time-to-market and improving product performance.
Together AI swiftly mitigated the Copy Fail vulnerability (CVE-2026-31431) by disabling the algif_aead interface across its infrastructure, preventing potential privilege escalation and cross-tenant risks in AI workloads. This proactive measure ensured minimal operational impact while maintaining security in shared kernel environments.
The swift mitigation of the Copy Fail vulnerability (CVE-2026-31431) by Together AI highlights the importance of proactive security measures in AI infrastructure. Builders and PMs should prioritize security in shared environments to prevent privilege escalation risks, while investors should recognize that robust security practices can enhance the overall reliability and trustworthiness of AI systems.
DeepSeek-V4 Pro is now available on Together AI, offering a 1.6T-parameter MoE model with 512K context for serverless inference. It supports three reasoning modes and reduces costs for repeated long-context queries, with input pricing at $2.10 per million tokens.
The release of DeepSeek-V4 Pro on Together AI, featuring 1.6T parameters and 512K context for serverless inference, significantly lowers costs for applications requiring long-context queries. This development allows builders and PMs to create more efficient AI solutions while investors can recognize potential cost savings and scalability in AI deployment.
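As a rough illustration of what the quoted input price means for a long prompt, the Python sketch below applies the $2.10 per million input tokens figure to a full-context request. It assumes "512K" means 512,000 tokens, and it models input tokens only: output pricing and any discount for repeated long-context queries are not given here, so they are left out.

```python
INPUT_PRICE_PER_MILLION_USD = 2.10  # input price quoted in the summary above


def input_cost_usd(input_tokens: int,
                   price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Cost in USD of the input tokens for one request."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * price_per_million


FULL_CONTEXT_TOKENS = 512_000  # assumption: "512K" read as 512,000 tokens

full_context_cost = input_cost_usd(FULL_CONTEXT_TOKENS)
print(f"One full-context prompt: ${full_context_cost:.2f} in input tokens")
```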
Together AI has launched the NVIDIA Nemotron 3 Nano Omni, a model that combines video, audio, images, and language reasoning. This model, utilizing a hybrid Mamba-Transformer architecture, allows developers to build agentic applications with high efficiency and low latency, streamlining deployment from prototype to production without infrastructure management.
The launch of the NVIDIA Nemotron 3 Nano Omni by Together AI enables developers to create multimodal AI applications with improved efficiency and reduced latency. This development allows builders and PMs to streamline their deployment processes, while investors can recognize the potential for scalable solutions in the rapidly evolving AI landscape.
Together AI's Distribution-aware Speculative Decoding (DAS) framework accelerates RL rollouts by over 50% without altering model outputs, addressing the rollout bottleneck in reinforcement learning. This improvement is crucial for large models like DeepSeek-R1, where the rollout phase causes significant delays and consumes 70% of total training time.
Together AI's Distribution-aware Speculative Decoding (DAS) framework speeds up reinforcement learning rollouts by over 50%, directly addressing a major bottleneck in training efficiency. This development is crucial for builders and PMs working on large models, as it reduces training time and costs, while investors should note its potential to enhance product competitiveness and accelerate deployment timelines.
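The two figures quoted for DAS (rollouts taking 70% of training time, and rollouts accelerated by over 50%) can be combined with Amdahl's law to estimate the end-to-end effect. The Python sketch below is back-of-the-envelope arithmetic rather than a reported result. Since "over 50%" could mean either 1.5x rollout throughput or a 50% reduction in rollout time, both readings are computed, and both are assumptions.

```python
def overall_speedup(stage_fraction: float, stage_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when one stage gets faster.

    stage_fraction: share of total time spent in the accelerated stage (0..1).
    stage_speedup:  factor by which that stage alone is accelerated (> 0).
    """
    if not 0.0 <= stage_fraction <= 1.0:
        raise ValueError("stage_fraction must be between 0 and 1")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    return 1.0 / ((1.0 - stage_fraction) + stage_fraction / stage_speedup)


ROLLOUT_FRACTION = 0.70  # rollout share of training time, from the summary

# Reading 1 (assumption): rollouts run at 1.5x their previous throughput.
throughput_reading = overall_speedup(ROLLOUT_FRACTION, 1.5)

# Reading 2 (assumption): rollout time is cut by 50%, a 2x stage speedup.
time_reading = overall_speedup(ROLLOUT_FRACTION, 2.0)

print(f"1.5x rollouts -> {throughput_reading:.2f}x end-to-end")
print(f"2.0x rollouts -> {time_reading:.2f}x end-to-end")
```

Under either reading, the 30% of training time that is not rollout limits the overall gain, so the end-to-end speedup is smaller than the rollout speedup itself.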
Multi-tenant GPU clusters enable AI-native teams to share resources efficiently while maintaining isolation, preventing idle capacity and ensuring predictable access. This architecture supports pooled economics without chaos, allowing teams to operate as if they have dedicated clusters.
The development of multi-tenant GPU clusters allows AI-native teams to optimize resource usage while ensuring isolation and predictable access. This architecture can significantly reduce operational costs and improve efficiency, making it a crucial consideration for builders, PMs, and investors focused on scalable AI solutions.
Parcae, a new stable looped architecture by Together AI, achieves up to 6.3% lower validation perplexity than previous models while using only 770M parameters, matching the performance of a 1.3B parameter transformer. This innovation allows for scaling model quality without increasing memory footprint, addressing the challenges of training looped models effectively.
The development of Parcae's stable looped architecture by Together AI is significant as it achieves competitive performance with fewer parameters, allowing builders and PMs to create more efficient models that require less computational resources. For investors, this innovation signals a potential reduction in operational costs and increased scalability in AI applications, enhancing the attractiveness of AI investments.
EinsteinArena enables AI agents to collaboratively tackle complex mathematical problems, achieving a new lower bound of 604 for the kissing number in 11 dimensions, surpassing the previous record of 593 set by AlphaEvolve. This platform fosters real-time collaboration and optimization among agents, demonstrating the power of collective intelligence in scientific discovery.
The development of EinsteinArena, which achieved a new lower bound for the kissing number in 11 dimensions, showcases the potential of collaborative AI agents in solving complex problems. For builders and PMs, this indicates a shift towards leveraging collective intelligence in product development, while investors should note the emerging opportunities in AI-driven scientific research and optimization platforms.
AI Native Clouds are purpose-built for AI-native companies like Cursor and Decagon, enabling rapid iteration and scaling of models. They integrate cutting-edge research continuously, ensuring high performance and low latency, essential for maintaining competitive advantage in a fast-evolving landscape.
The emergence of AI Native Clouds, designed specifically for AI-native companies, signifies a shift towards infrastructure that supports rapid model iteration and scaling. For builders and PMs, this means access to optimized resources that enhance performance and reduce latency, while investors should recognize the potential for competitive advantage in a market increasingly reliant on advanced AI capabilities.
Recent research from Together AI demonstrates that LLMs can significantly optimize database query execution, achieving up to 4.78x speedup and reducing resource usage dramatically. By implementing DBPlanBench, the Apache DataFusion engine can leverage semantic reasoning to improve join ordering, resulting in faster queries and lower memory consumption.
The development of DBPlanBench by Together AI, which uses LLMs to optimize database query execution, can significantly enhance the performance of data-driven applications. For builders and PMs, this means faster query responses and reduced infrastructure costs, while investors should note the potential for increased efficiency in data processing as a competitive advantage in the market.
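To see why join ordering has such a large effect on speed and memory, consider a toy cost model that sums the sizes of the intermediate results a plan produces. The Python sketch below is not DBPlanBench and involves no LLM: it exhaustively scores every left-deep join order for three hypothetical tables. The table names, row counts, and selectivities are invented for illustration.

```python
from itertools import permutations

# Hypothetical table sizes (rows), for illustration only.
ROWS = {"orders": 1_000_000, "customers": 50_000, "regions": 10}

# Hypothetical selectivities: fraction of the cross product that survives
# the join predicate. 1.0 means no predicate links the pair (cross product).
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "regions"}): 1 / 10,
    frozenset({"orders", "regions"}): 1.0,
}


def plan_cost(order):
    """Sum of intermediate result sizes for a left-deep plan in this order."""
    joined = [order[0]]
    rows = float(ROWS[order[0]])
    cost = 0.0
    for table in order[1:]:
        # Apply every predicate linking the new table to tables already joined.
        selectivity = 1.0
        for previous in joined:
            selectivity *= SELECTIVITY[frozenset({previous, table})]
        rows = rows * ROWS[table] * selectivity
        cost += rows
        joined.append(table)
    return cost


costs = {order: plan_cost(order) for order in permutations(ROWS)}
best_order = min(costs, key=costs.get)
worst_order = max(costs, key=costs.get)

print("best :", best_order, f"{costs[best_order]:,.0f} intermediate rows")
print("worst:", worst_order, f"{costs[worst_order]:,.0f} intermediate rows")
```

In this toy setup, joining the two small tables first keeps intermediate results small, while starting with the pair that has no predicate forces a large cross product. Exhaustive search stops being practical as the number of tables grows, which is why better ordering heuristics are valuable.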
Together AI has launched the Wan 2.7 video model suite, featuring four models for text-to-video, image-to-video, reference-to-video, and video editing. Starting at $0.10 per second, it offers enhanced creative control with audio inputs and frame-level conditioning, streamlining workflows for developers.
The launch of the Wan 2.7 video model suite by Together AI introduces advanced capabilities for text-to-video and video editing at a competitive price, which can significantly enhance content creation workflows for developers and product managers. This development signals a growing trend in AI-driven media production, presenting investment opportunities in tools that facilitate creative processes.
Deepgram's speech-to-text (STT) and text-to-speech (TTS) models are now natively available on Together AI, enhancing real-time voice agents with improved turn detection and transcription accuracy. Models like Flux and Nova-3 ensure responsiveness and clarity in challenging environments such as contact centers and healthcare, while Aura-2 maintains consistency in enterprise applications.
Deepgram's speech-to-text and text-to-speech models are now integrated into Together AI, which enhances the functionality of voice agents in critical sectors like healthcare and contact centers. This development signals a shift towards more accurate and responsive AI communication tools, making them more viable for enterprise applications and improving user experience.
The Together AI kernels team, led by Dan Fu and Tri Dao, achieved 2-3x speedups in GPU performance with FlashAttention, revolutionizing AI-native cloud infrastructure. Their ThunderKittens library enabled rapid adaptation to NVIDIA's Blackwell GPUs, producing FP4 and FP8 GEMM kernels with up to 2x speed improvements over cuBLAS.
The Together AI kernels team's work, including 2-3x GPU speedups from FlashAttention and ThunderKittens-based FP4 and FP8 GEMM kernels that run up to 2x faster than cuBLAS on NVIDIA's Blackwell GPUs, is significant for builders and PMs as it enhances the efficiency of AI workloads, potentially reducing costs and improving performance. For investors, this indicates a competitive edge in AI infrastructure that could drive higher returns.
Together AI's Aurora is an open-source RL-based framework that enhances speculative decoding by learning from live inference, achieving a 1.25x speedup over static models like Qwen3 and Llama3, while reducing infrastructure costs and adapting to user demands.
Together AI's Aurora framework applies reinforcement learning to speculative decoding, learning from live inference to deliver a 1.25x speedup over static models while lowering infrastructure costs. This allows builders and PMs to create more efficient applications that can adapt dynamically to user needs, making it an attractive investment opportunity for those looking to capitalize on AI's evolving capabilities.
The research from Together AI reveals that smaller models using a 'Divide & Conquer' framework can outperform GPT-4o in long context tasks, demonstrating significant performance gains while being cheaper and faster. This approach addresses model confusion and aggregation noise, making it effective for tasks like QA and summarization, though it has limitations for high-synergy tasks.
The research from Together AI highlights that smaller models using a 'Divide & Conquer' framework can outperform larger models like GPT-4o in long-context tasks, offering a cost-effective and efficient solution for builders and PMs. This signals a shift towards optimizing resource allocation in AI development, making it crucial for investors to consider smaller, specialized models for various applications.
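The general shape of a divide-and-conquer approach to long context can be sketched as: split the input into chunks, run a worker on each chunk, then have a manager aggregate the partial results. The Python sketch below is a minimal illustration under that assumption, not the framework from the research. The function names are hypothetical, and the worker and manager are toy stand-ins for model calls so that the example runs on its own.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_words: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_words words."""
    if chunk_words <= 0:
        raise ValueError("chunk_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words), chunk_words)]


def divide_and_conquer(text: str,
                       worker: Callable[[str], List[str]],
                       manager: Callable[[List[List[str]]], List[str]],
                       chunk_words: int = 50) -> List[str]:
    """Run the worker on each chunk, then let the manager merge the results."""
    chunks = split_into_chunks(text, chunk_words)
    partial_results = [worker(chunk) for chunk in chunks]
    return manager(partial_results)


def toy_worker(chunk: str) -> List[str]:
    """Stand-in for a small model: extract numeric tokens from one chunk."""
    return [word for word in chunk.split() if word.isdigit()]


def toy_manager(partial_results: List[List[str]]) -> List[str]:
    """Stand-in for the aggregation step: concatenate per-chunk findings."""
    return [item for partial in partial_results for item in partial]


# A long "document" with one relevant fact buried in the middle.
document = " ".join(["filler"] * 120 + ["passcode", "4217"] + ["filler"] * 120)
answer = divide_and_conquer(document, toy_worker, toy_manager, chunk_words=50)
print(answer)
```

Each worker sees only its own chunk, so any answer that depends on combining information from different chunks has to be reconstructed by the manager. That matches the limitation noted above for high-synergy tasks.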