
Meet mKernel: A Multi-GPU, Multi-Node Fused Kernel Library for GPU-Driven Communication
Quick Take
UC Berkeley's UCCL team has released mKernel, a library that fuses intra-node NVLink communication, inter-node RDMA communication, and dense compute into a single persistent CUDA kernel. The goal is GPU-driven communication across multiple GPUs and nodes, with data movement and computation handled inside one long-running kernel rather than split across separate kernel launches.
Key Points
- mKernel fuses intra-node NVLink and inter-node RDMA communication with dense compute.
- The library supports multi-GPU and multi-node configurations.
- Communication and dense compute run together in a single persistent CUDA kernel, with the GPU driving the communication.
- Developed by UC Berkeley's UCCL team.
Article Excerpt
From source RSS / original summary: UC Berkeley's UCCL team releases mKernel, fusing intra-node NVLink, inter-node RDMA, and dense compute into a single persistent CUDA kernel. The post Meet mKernel: A Multi-GPU, Multi-Node Fused Kernel Library for GPU-Driven Communication appeared first on MarkTechPost.
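To make "single persistent kernel" concrete, here is a CPU analogy in Python. It is a sketch under stated assumptions and says nothing about mKernel's actual code or API. In the persistent-kernel pattern, the thread blocks of one long-lived kernel commonly claim work tiles from a shared counter (on a GPU, typically via `atomicAdd`) and keep looping until the work runs out, instead of the host launching a new kernel per step. In this sketch, Python threads stand in for thread blocks, a list copy stands in for an NVLink/RDMA transfer, and a sum stands in for dense compute. The function `fused_sum` and its parameters are hypothetical names chosen for illustration.

```python
import itertools
import threading


def fused_sum(remote, tile_size, num_workers):
    """CPU analogy of a persistent fused kernel (hypothetical, not mKernel's API).

    Workers are started once and loop until the tile queue is empty. For each
    tile they claim, they first "transfer" the data, then "compute" on it.
    """
    tiles = [(b, min(b + tile_size, len(remote)))
             for b in range(0, len(remote), tile_size)]
    local = [0] * len(remote)      # destination buffer for "transfers"
    partial = [0] * len(tiles)     # one partial result per tile
    counter = itertools.count()    # shared work counter (GPU analogue: atomicAdd)
    lock = threading.Lock()        # makes next(counter) safe across threads

    def worker():
        while True:
            with lock:
                t = next(counter)  # claim the next tile index
            if t >= len(tiles):
                return             # queue drained: the persistent worker exits
            begin, end = tiles[t]
            # Stand-in for an NVLink/RDMA copy into the local buffer.
            local[begin:end] = remote[begin:end]
            # Stand-in for dense compute on the data that just arrived.
            partial[t] = sum(local[begin:end])

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sum(partial)


print(fused_sum(list(range(1, 1001)), tile_size=64, num_workers=4))  # 500500
```

Each worker performs the transfer and the compute for a tile back to back, so the analogy gets ordering for free. A real GPU implementation would additionally need explicit synchronization, such as waiting on a completion flag before reading data delivered by a remote transfer.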



