Microsoft has announced the general availability of Azure confidential virtual machines (VMs) in the NCC H100 v5 SKU, featuring NVIDIA H100 Tensor Core GPUs. These VMs combine hardware-based data protection from 4th-generation AMD EPYC processors with the performance of the H100 GPU.
The GA release follows the preview of the VMs last year. By enabling confidential computing on GPUs, Azure gives customers more options and flexibility to run their workloads securely and efficiently in the cloud. These virtual machines are suited for tasks such as inferencing, fine-tuning, and training small to medium-sized models. These include Whisper, Stable Diffusion and its variants (SDXL, SSD), and language models such as Zephyr, Falcon, GPT-2, MPT, Llama2, Wizard, and Xwin.
The NCC H100 v5 VM SKUs offer a hardware-based Trusted Execution Environment (TEE) that improves the security of guest VMs. This environment protects VM memory and state from access by the hypervisor and other host management code, thereby safeguarding against unauthorized operator access. Customers can initiate attestation requests within these VMs to verify that they are running on a properly configured TEE. This verification is essential before releasing keys and launching sensitive applications.
(Source: Tech Community Blog Post)
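The attestation-before-key-release step described above can be illustrated with a minimal Python sketch. It is not Azure's attestation API: the claim names (`tee_type`, `debug_enabled`), the `AttestationPolicy` type, and both functions are hypothetical, and the sketch assumes the attestation report has already been cryptographically verified and decoded into a plain dictionary of claims.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AttestationPolicy:
    """Hypothetical policy: what the key owner requires of the TEE."""
    expected_tee_type: str
    require_debug_disabled: bool = True


def claims_satisfy_policy(claims: Mapping[str, object],
                          policy: AttestationPolicy) -> bool:
    """Return True only if the (already verified) claims match the policy."""
    # The VM must report the TEE type the key owner expects.
    if claims.get("tee_type") != policy.expected_tee_type:
        return False
    # A missing debug claim is treated as "debug enabled" (fail closed).
    if policy.require_debug_disabled and claims.get("debug_enabled", True):
        return False
    return True


def release_key(claims: Mapping[str, object],
                policy: AttestationPolicy,
                key: bytes) -> Optional[bytes]:
    """Hand out the key only when attestation claims satisfy the policy."""
    return key if claims_satisfy_policy(claims, policy) else None


if __name__ == "__main__":
    policy = AttestationPolicy(expected_tee_type="example-tee")
    good = {"tee_type": "example-tee", "debug_enabled": False}
    bad = {"tee_type": "example-tee"}  # no debug claim, so it is rejected
    print(release_key(good, policy, b"secret") is not None)
    print(release_key(bad, policy, b"secret") is None)
```

The sketch fails closed: any claim that is missing or unexpected blocks key release, so a sensitive application never starts on an environment that has not proven its configuration.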
On a LinkedIn post by Vikas Bhatia, head of product for Azure confidential computing, Drasko Draskovic, founder and CEO of Abstract Machines, commented:
Congrats for this, but attestation is still the weakest point of TEEs in CSP VMs. Current attestation mechanisms from Azure and GCP – if I am not mistaken – demand trust with the cloud provider, which in many ways beats the purpose of Confidential Computing. Currently – looks that baremetal approach is the only viable option, but this again in many ways removes the need for TEEs (except for providing the service of multi-party computation).
Several companies have used the Azure NCC H100 v5 GPU virtual machine for workloads such as confidential audio-to-text inference using Whisper models, video analysis for incident prevention, data privacy with confidential computing, and Stable Diffusion projects with sensitive design data in the automotive sector.
Besides Microsoft, the two other big hyperscalers, AWS and Google, also offer NVIDIA H100 Tensor Core GPUs. For instance, AWS offers H100 GPUs through its EC2 P5 instances, which are optimized for high-performance computing and AI applications.
In a recent whitepaper about the architecture behind NVIDIA’s H100 Tensor Core GPU (based on the Hopper architecture), the company's authors write:
H100 is NVIDIA’s 9th-generation data center GPU designed to deliver an order-of-magnitude performance leap for large-scale AI and HPC over our prior-generation NVIDIA A100 Tensor Core GPU. H100 carries over the major design focus of A100 to improve strong scaling for AI and HPC workloads, with substantial improvements in architectural efficiency.
Lastly, Azure NCC H100 v5 virtual machines are currently available only in the East US 2 and West Europe regions.