PNY NVIDIA H100 Tensor Core GPU Accelerator 80GB HBM2e NVH100TCGPU-KIT
Model #: NVH100TCGPU-KIT Item #: VGAPNYNH100R


$26,999.00

World's Most Advanced Chip

Built with 80 billion transistors using a cutting-edge TSMC 4N process custom-tailored for NVIDIA's accelerated computing needs, H100 is the world's most advanced chip. It features major advances to accelerate AI, HPC, memory bandwidth, interconnect, and communication at data center scale.

Enhanced Asynchronous Execution Features

New asynchronous execution features include a Tensor Memory Accelerator (TMA) unit that transfers large blocks of data very efficiently between global memory and shared memory. TMA also supports asynchronous copies between Thread Blocks in a Cluster. There is also a new asynchronous transaction barrier for atomic data movement and synchronization.
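The asynchronous-copy pattern is visible from CUDA today. Below is a hedged sketch, assuming CUDA 11+ and the libcu++ cuda::barrier / cuda::memcpy_async APIs; the kernel and names are illustrative, not NVIDIA's shipped code. On Hopper, bulk copies of this shape can be serviced by the TMA unit:

```cuda
#include <cooperative_groups.h>
#include <cuda/barrier>

namespace cg = cooperative_groups;

// Stage one tile of input into shared memory with asynchronous copies,
// then consume it. Launch as: async_tile_kernel<<<blocks, 256>>>(in, out, n);
__global__ void async_tile_kernel(const float* in, float* out, int n) {
    __shared__ float tile[256];
    // Block-scoped barrier that tracks completion of the async copies.
    __shared__ cuda::barrier<cuda::thread_scope_block> bar;

    auto block = cg::this_thread_block();
    if (block.thread_rank() == 0)
        init(&bar, block.size());    // expect one arrival per thread
    block.sync();

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        // Copy global -> shared without blocking the issuing thread.
        cuda::memcpy_async(&tile[threadIdx.x], &in[idx], sizeof(float), bar);

    bar.arrive_and_wait();           // wait until all staged data has landed

    if (idx < n)
        out[idx] = tile[threadIdx.x] * 2.0f;  // consume the shared tile
}
```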

NVIDIA Hopper Architecture

The NVIDIA H100 Tensor Core GPU, powered by the NVIDIA Hopper GPU architecture, delivers the next massive leap in accelerated computing performance for NVIDIA's data center platforms. H100 securely accelerates diverse workloads, from small enterprise applications to exascale HPC to trillion-parameter AI models. Implemented on TSMC's 4N process customized for NVIDIA, with 80 billion transistors and numerous architectural advances, H100 is the world's most advanced chip ever built.

Second-Generation Multi-Instance GPU (MIG) Technology

With Multi-Instance GPU (MIG), first introduced in the Ampere architecture, a GPU can be partitioned into several smaller, fully isolated instances, each with its own memory, cache, and compute cores. The Hopper architecture further enhances MIG by supporting multi-tenant, multi-user configurations in virtualized environments across up to seven secure GPU instances, isolating each instance with confidential computing at the hardware and hypervisor level. Dedicated video decoders for each MIG instance deliver secure, high-throughput intelligent video analytics (IVA) on shared infrastructure, and Hopper's concurrent MIG profiling lets administrators monitor right-sized GPU acceleration and optimize resource allocation across users. Researchers with smaller workloads can use MIG to securely isolate a portion of a GPU, rather than renting a full CSP instance, while being assured that their data is secure at rest, in transit, and at compute.
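As a rough illustration of how software sees these partitions, here is a hedged host-side sketch using NVML (link with -lnvidia-ml); the device index and output format are assumptions for illustration:

```cuda
#include <cstdio>
#include <nvml.h>

// Check whether MIG mode is enabled on GPU 0 and enumerate the MIG
// instances carved out of it.
int main() {
    nvmlInit();
    nvmlDevice_t gpu;
    nvmlDeviceGetHandleByIndex(0, &gpu);

    unsigned int current = 0, pending = 0;
    if (nvmlDeviceGetMigMode(gpu, &current, &pending) == NVML_SUCCESS &&
        current == NVML_DEVICE_MIG_ENABLE) {
        unsigned int count = 0;
        nvmlDeviceGetMaxMigDeviceCount(gpu, &count);  // up to 7 on H100
        for (unsigned int i = 0; i < count; ++i) {
            nvmlDevice_t mig;
            if (nvmlDeviceGetMigDeviceHandleByIndex(gpu, i, &mig) != NVML_SUCCESS)
                continue;  // this slot is not populated
            char name[NVML_DEVICE_NAME_BUFFER_SIZE];
            nvmlDeviceGetName(mig, name, sizeof(name));
            printf("MIG instance %u: %s\n", i, name);
        }
    } else {
        printf("MIG is disabled on this GPU\n");
    }
    nvmlShutdown();
    return 0;
}
```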

Fourth-Generation Tensor Cores

The new fourth-generation Tensor Cores are up to 6x faster chip-to-chip than A100, a gain that combines the per-SM speedup, H100's additional SM count, and its higher clocks. On a per-SM basis, the Tensor Cores deliver 2x the MMA (Matrix Multiply-Accumulate) computational rate of the A100 SM on equivalent data types, and 4x that rate when using the new FP8 data type instead of the previous generation's 16-bit floating-point options. The Sparsity feature exploits fine-grained structured sparsity in deep learning networks, doubling the performance of standard Tensor Core operations.

New Confidential Computing Support

Today's confidential computing solutions are CPU-based, which is too limiting for compute-intensive workloads like AI and HPC. NVIDIA Confidential Computing is a built-in security feature of the NVIDIA Hopper architecture that makes the NVIDIA H100 the world's first accelerator with confidential computing capabilities. Users can protect the confidentiality and integrity of their data and applications in use while accessing the unsurpassed acceleration of H100 GPUs. It creates a hardware-based trusted execution environment (TEE) that secures and isolates the entire workload running on a single H100 GPU, multiple H100 GPUs within a node, or individual MIG instances. GPU-accelerated applications can run unchanged within the TEE and don't have to be partitioned. Users can combine the power of NVIDIA software for AI and HPC with the security of a hardware root of trust offered by NVIDIA Confidential Computing.

Structural Sparsity

AI networks are large, with millions to billions of parameters. Not all of these parameters are needed for accurate predictions; some can be converted to zeros, making the models “sparse” without compromising accuracy. Tensor Cores in H100 can provide up to 2x higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also improve the performance of model training.
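For context, the Sparsity feature targets a 2:4 pattern: at most two nonzero values in every group of four consecutive weights. A minimal host-side sketch, purely illustrative (not a cuSPARSELt call), that checks whether a row obeys that constraint:

```cuda
#include <cstdio>

// Return true if every group of four consecutive values in the row
// contains at most two nonzeros -- the 2:4 structured-sparsity layout
// that H100 Sparse Tensor Cores accelerate.
bool is_2_4_sparse(const float* row, int n) {
    for (int g = 0; g + 4 <= n; g += 4) {
        int nonzeros = 0;
        for (int i = 0; i < 4; ++i)
            if (row[g + i] != 0.0f) ++nonzeros;
        if (nonzeros > 2) return false;  // this group violates 2:4
    }
    return true;
}

int main() {
    float ok[8]  = {1.f, 0.f, 2.f, 0.f, 0.f, 3.f, 0.f, 4.f};
    float bad[8] = {1.f, 2.f, 3.f, 0.f, 0.f, 0.f, 0.f, 0.f};
    printf("%d %d\n", is_2_4_sparse(ok, 8), is_2_4_sparse(bad, 8));  // 1 0
    return 0;
}
```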

HBM2e Memory Subsystem

H100 brings massive amounts of compute to data centers. To keep that compute fed, the NVIDIA H100 PCIe uses HBM2e memory with a class-leading 2 terabytes per second (TB/sec) of memory bandwidth, a 50 percent increase over the previous generation. In addition to 80 gigabytes (GB) of HBM2e memory, H100 includes 50 megabytes (MB) of L2 cache. The combination of faster HBM memory and a larger cache provides the capacity to accelerate the most computationally intensive AI models.
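A hedged way to see that bandwidth from software is a device-to-device copy timed with CUDA events. This micro-benchmark sketch is illustrative, not an official tool, and real sustained numbers depend on transfer size and clocks:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A D2D copy reads and writes every byte, so effective bandwidth is
// 2 * bytes / elapsed time.
int main() {
    const size_t bytes = 1ull << 30;  // 1 GiB
    float *src, *dst;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);
    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);  // warm-up

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Effective bandwidth: %.1f GB/s\n",
           2.0 * bytes / (ms * 1e-3) / 1e9);

    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```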

Transformer Engine Supercharges AI, Up to 30x Higher Performance

Transformer models are the backbone of language models used widely today, from BERT to GPT-3. Initially developed for natural language processing (NLP), transformers are increasingly applied to computer vision, drug discovery, and more. Their size continues to grow exponentially, now reaching trillions of parameters, and the resulting math-bound computation stretches training times into months, which is impractical for business needs. The Transformer Engine uses software and custom Hopper Tensor Core technology designed specifically to accelerate training for models built from the world's most important AI model building block, the transformer. Hopper Tensor Cores can apply mixed 8-bit floating point (FP8) and FP16 precision formats to dramatically accelerate AI calculations for transformers.
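To make the FP8 idea concrete, here is a hedged sketch assuming the cuda_fp8.h header (CUDA 11.8+); the values are illustrative, with 448 being the largest finite E4M3 value. It round-trips a few floats through the E4M3 format to show the reduced precision that per-layer FP8 training has to manage:

```cuda
#include <cstdio>
#include <cuda_fp8.h>

int main() {
    // Round-trip FP32 values through FP8 E4M3 (host-side).
    const float vals[] = {0.1f, 1.0f, 3.14159f, 448.0f};
    for (float v : vals) {
        __nv_fp8_e4m3 q(v);      // quantize to 8-bit floating point
        float back = float(q);   // dequantize back to FP32
        printf("%10.5f -> %10.5f\n", v, back);
    }
    return 0;
}
```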

Fourth-Generation NVIDIA NVLink

Fourth-generation NVLink provides a 3x bandwidth increase on all-reduce operations and a 50 percent general bandwidth increase over the prior-generation NVLink, delivering 900 GB/sec of total multi-GPU IO bandwidth, roughly 7x the ~128 GB/sec bidirectional bandwidth of a PCIe Gen 5 x16 link.

New DPX Instructions

Dynamic programming is an algorithmic technique for solving a complex recursive problem by breaking it down into simpler subproblems; by storing the results of subproblems so they don't have to be recomputed later, it reduces the time and complexity of problems that would otherwise scale exponentially. Dynamic programming is used across a broad range of fields. For example, Floyd-Warshall is a route optimization algorithm that can map the shortest routes for shipping and delivery fleets, and the Smith-Waterman algorithm is used for DNA sequence alignment and protein folding applications. Hopper introduces DPX instructions that accelerate dynamic programming algorithms by up to 40x compared to CPUs and up to 7x compared to NVIDIA Ampere architecture GPUs. This leads to dramatically faster times in disease diagnosis, real-time routing optimization, and even graph analytics.
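As a hedged sketch of the kind of inner loop DPX instructions target, here is one relaxation step of Floyd-Warshall in plain CUDA. It is illustrative only: it does not use the DPX intrinsics themselves, and it assumes finite edge weights so the addition cannot overflow:

```cuda
#include <cuda_runtime.h>

// For a fixed intermediate vertex k, every pair (i, j) applies the
// min-plus recurrence: dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]).
__global__ void fw_relax(int* dist, int n, int k) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && j < n) {
        int via_k = dist[i * n + k] + dist[k * n + j];
        if (via_k < dist[i * n + j])
            dist[i * n + j] = via_k;
    }
}

// Host driver: the k-loop is inherently sequential, but each step is
// fully parallel over all (i, j) pairs.
void floyd_warshall(int* d_dist, int n) {
    dim3 block(16, 16);
    dim3 grid((n + 15) / 16, (n + 15) / 16);
    for (int k = 0; k < n; ++k)
        fw_relax<<<grid, block>>>(d_dist, n, k);
    cudaDeviceSynchronize();
}
```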

PCIe Gen5 for State of the Art CPUs and DPUs

The H100 is NVIDIA's first GPU to support PCIe Gen5, providing 128 GB/s of bidirectional bandwidth on an x16 link. This fast communication enables optimal connectivity with the highest-performing CPUs, as well as with NVIDIA ConnectX-7 SmartNICs and BlueField-3 DPUs, which provide up to 400 Gb/s Ethernet or NDR 400 Gb/s InfiniBand networking acceleration for secure HPC and AI workloads.
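To confirm what link a card actually negotiated, a hedged NVML sketch (link with -lnvidia-ml; device index 0 is an assumption):

```cuda
#include <cstdio>
#include <nvml.h>

// Report the PCIe generation and lane width currently in use by GPU 0.
int main() {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    unsigned int gen = 0, width = 0;
    nvmlDeviceGetCurrPcieLinkGeneration(dev, &gen);
    nvmlDeviceGetCurrPcieLinkWidth(dev, &width);
    printf("PCIe Gen%u x%u\n", gen, width);

    nvmlShutdown();
    return 0;
}
```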

New Thread Block Cluster Feature

The new Thread Block Cluster feature allows programmatic control of locality at a granularity larger than a single Thread Block on a single SM. It extends the CUDA programming model by adding another level to the programming hierarchy, which now includes Threads, Thread Blocks, Thread Block Clusters, and Grids. Clusters enable multiple Thread Blocks running concurrently across multiple SMs to synchronize and collaboratively fetch and exchange data.
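A hedged sketch of the feature, assuming CUDA 12+ and compute capability 9.0 (compile with -arch=sm_90); the kernel and data are illustrative. Two thread blocks form a cluster, and each reads its partner's shared-memory tile through distributed shared memory:

```cuda
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Launch as: exchange_kernel<<<grid, 128>>>(out) with an even number of
// blocks, so every block has a partner within its two-block cluster.
__global__ void __cluster_dims__(2, 1, 1) exchange_kernel(float* out) {
    __shared__ float tile[128];
    cg::cluster_group cluster = cg::this_cluster();

    tile[threadIdx.x] = blockIdx.x * 1000.0f + threadIdx.x;  // publish tile
    cluster.sync();  // every block in the cluster has written its tile

    // Map the partner block's shared memory into this block's address space.
    unsigned int partner = cluster.block_rank() ^ 1;
    float* remote = cluster.map_shared_rank(tile, partner);
    out[blockIdx.x * blockDim.x + threadIdx.x] = remote[threadIdx.x];

    cluster.sync();  // keep both blocks resident until remote reads finish
}
```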
 

Enterprise Ready: AI Software Streamlines Development and Deployment

Enterprise adoption of AI is now mainstream, and organizations require end-to-end, AI-ready infrastructure that will future-proof them for this new era. NVIDIA H100 Tensor Core GPUs for mainstream servers (PCIe) come with NVIDIA AI Enterprise software, making AI accessible to nearly every organization with the highest performance in training, inference, and data science. NVIDIA AI Enterprise together with NVIDIA H100 simplifies building an AI-ready platform, accelerates AI development and deployment with enterprise-grade support, and delivers the performance, security, and scalability to gather insights faster and achieve business value sooner.
 

General Information
Product Type: Graphics Card
Brand Name: PNY
Manufacturer: PNY Technologies
Product Name: NVIDIA H100 Graphics Card
Manufacturer Part Number: NVH100TCGPU-KIT
Manufacturer Website Address: http://www.pny.com

Technical Information
Multi-GPU Technology: NVLink

Processor & Chipset
Chipset Manufacturer: NVIDIA
Chipset Model: H100

Memory
Memory Technology: HBM2e
Standard Memory: 80 GB

Power Description
Power Supply Wattage: 350 W

Physical Characteristics
Slot Space Required: Dual
Cooler Type: Passive

Miscellaneous
Environmentally Friendly: Yes
Environmental Certification: RoHS
Platforms Supported: PC, Linux

Warranty
Limited Warranty: 3 Years