NVIDIA H100 Tensor Core GPU

Extraordinary performance, scalability, and security for every data center.

An Order-of-Magnitude Leap for Accelerated Computing

The NVIDIA H100 Tensor Core GPU delivers exceptional performance, scalability, and security for every workload. H100 uses breakthrough innovations based on the NVIDIA Hopper™ architecture to deliver industry-leading conversational AI, speeding up large language models (LLMs) by 30X. H100 also includes a dedicated Transformer Engine built to handle trillion-parameter language models.

Securely Accelerate Workloads From Enterprise to Exascale

Up to 4X Higher AI Training on GPT-3

Projected performance subject to change.

GPT-3 175B training. A100 cluster: HDR IB network; H100 cluster: NDR IB network | Mixture of Experts (MoE) training, Transformer Switch-XXL variant with 395B parameters on a 1T-token dataset. A100 cluster: HDR IB network; H100 cluster: NDR IB network with NVLink Switch System where indicated.

Transformational AI Training

H100 features fourth-generation Tensor Cores and a Transformer Engine with FP8 precision that provides up to 4X faster training over the prior generation for GPT-3 (175B) models.
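
As a rough illustration of how the Transformer Engine is used in practice, the sketch below runs a linear layer under FP8 autocasting with NVIDIA's open-source Transformer Engine library for PyTorch; the layer size and recipe settings are illustrative assumptions, not tuned values.

```python
# A minimal sketch of FP8 execution with NVIDIA's Transformer Engine library
# for PyTorch (transformer_engine); layer size and recipe are illustrative
# assumptions, not tuned values.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# HYBRID = E4M3 for the forward pass, E5M2 for gradients in the backward pass.
fp8_recipe = DelayedScaling(margin=0, fp8_format=Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(32, 4096, device="cuda")

# Inside the autocast region, matmuls run on Hopper's FP8 Tensor Cores.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.sum().backward()  # scaling factors are tracked and updated automatically
```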

The combination of fourth-generation NVLink, which offers 900 gigabytes per second (GB/s) of GPU-to-GPU interconnect; NDR Quantum-2 InfiniBand networking, which accelerates communication by every GPU across nodes; PCIe Gen5; and NVIDIA Magnum IO™ software delivers efficient scalability from small enterprise systems to massive, unified GPU clusters.
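
As a minimal illustration of the GPU-to-GPU communication this fabric carries, the sketch below performs an all-reduce with PyTorch's NCCL backend, which transparently uses NVLink within a node and InfiniBand across nodes; the tensor size is arbitrary.

```python
# A minimal sketch of GPU-to-GPU communication using PyTorch's NCCL backend,
# which routes traffic over NVLink within a node and InfiniBand across nodes.
# Launch with: torchrun --nproc_per_node=8 allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")       # rank/world size come from torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Every GPU contributes one tensor; all_reduce sums them in place on all GPUs.
    x = torch.full((1024,), float(dist.get_rank()), device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    if dist.get_rank() == 0:
        print(f"sum of ranks: {x[0].item()}")     # 0 + 1 + ... + (world_size - 1)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```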

Deploying H100 GPUs at data center scale delivers outstanding performance and brings the next generation of exascale high-performance computing (HPC) and trillion-parameter AI within the reach of all researchers.

Real-Time Deep Learning Inference

AI solves a wide array of business challenges, using an equally wide array of neural networks. A great AI inference accelerator has to not only deliver the highest performance but also the versatility to accelerate these networks.

H100 extends NVIDIA’s market-leading position in inference with several advancements that accelerate inference by up to 30X and deliver the lowest latency.

Fourth-generation Tensor Cores speed up all precisions, including FP64, TF32, FP32, FP16, INT8, and now FP8, to reduce memory usage and increase performance while still maintaining accuracy for LLMs.

Up to 30X Higher AI Inference Performance on the Largest Models

Megatron chatbot inference (530 billion parameters)

Projected performance subject to change. Inference on Megatron 530B parameter model-based chatbot for input sequence length = 128, output sequence length = 20 | A100 cluster: HDR IB network | H100 cluster: NVLink Switch System, NDR IB.

Up to 7X Higher Performance for HPC Applications

AI-fused HPC Applications

Projected performance subject to change. 3D FFT (4K^3) throughput | A100 cluster: HDR IB network | H100 cluster: NVLink Switch System, NDR IB | Genome Sequencing (Smith-Waterman) | 1 A100 | 1 H100

Exascale High-Performance Computing

The NVIDIA data center platform consistently delivers performance gains beyond Moore’s law.

H100’s new breakthrough AI capabilities further amplify the power of HPC+AI to accelerate time to discovery for scientists and researchers working on solving the world’s most important challenges.

H100 triples the floating-point operations per second (FLOPS) of double-precision Tensor Cores, delivering 60 teraflops of FP64 computing for HPC.

AI-fused HPC applications can also leverage H100’s TF32 precision to achieve one petaflop of throughput for single-precision matrix-multiply operations, with zero code changes.
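
A minimal sketch of what this looks like from a framework such as PyTorch: the two flags below make TF32 explicit, and existing FP32 matrix multiplies then run on Tensor Cores without any other changes. Matrix sizes are illustrative.

```python
# A minimal sketch of the TF32 path in PyTorch: flip two flags and existing
# FP32 matmuls/convolutions run on Tensor Cores in TF32. Sizes are illustrative.
import torch

torch.backends.cuda.matmul.allow_tf32 = True  # FP32 matmuls may use TF32 Tensor Cores
torch.backends.cudnn.allow_tf32 = True        # same for cuDNN convolutions

a = torch.randn(8192, 8192, device="cuda")    # tensors stay ordinary FP32
b = torch.randn(8192, 8192, device="cuda")
c = a @ b                                     # executed as TF32 on Tensor Cores
```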

H100 also features new DPX instructions that deliver 7X higher performance over A100 and 40X speedups over CPUs on dynamic programming algorithms such as Smith-Waterman for DNA sequence alignment and protein alignment for protein structure prediction.
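
For context, the sketch below is a plain CPU implementation of the Smith-Waterman recurrence that DPX instructions accelerate in hardware; the scoring parameters (match = +3, mismatch = -3, gap = -2) are illustrative assumptions.

```python
# A minimal CPU sketch of the Smith-Waterman dynamic-programming recurrence
# that H100's DPX instructions accelerate; scoring values are illustrative.
def smith_waterman(a: str, b: str, match=3, mismatch=-3, gap=-2) -> int:
    # H[i][j] = best local-alignment score ending at a[i-1], b[j-1]
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # DPX accelerates exactly this fused max over candidate scores.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCU"))
```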

DPX instructions comparison: NVIDIA HGX™ H100 4-GPU vs. dual-socket 32-core IceLake.

Accelerated Data Analytics

Data analytics often consumes the majority of time in AI application development. Since large datasets are scattered across multiple servers, scale-out solutions with commodity CPU-only servers get bogged down by a lack of scalable computing performance.

Accelerated servers with H100 deliver the compute power—along with 3 terabytes per second (TB/s) of memory bandwidth per GPU and scalability with NVLink and NVSwitch™—to tackle data analytics with high performance and scale to support massive datasets.

Combined with NVIDIA Quantum-2 InfiniBand, Magnum IO software, GPU-accelerated Spark 3.0, and NVIDIA RAPIDS™, the NVIDIA data center platform is uniquely able to accelerate these huge workloads with higher performance and efficiency.
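
As a small sketch of what GPU-accelerated analytics looks like with RAPIDS, the snippet below loads and aggregates a table with cuDF; the file name and column names ("user_id", "amount") are hypothetical.

```python
# A minimal sketch of GPU-accelerated data analytics with RAPIDS cuDF;
# the file and column names are hypothetical.
import cudf

df = cudf.read_csv("transactions.csv")           # parsed directly on the GPU
summary = (
    df.groupby("user_id")["amount"]
      .agg(["sum", "mean", "count"])             # GPU-parallel aggregation
      .sort_values("sum", ascending=False)
)
print(summary.head())
```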

Enterprise-Ready Utilization

IT managers seek to maximize utilization (both peak and average) of compute resources in the data center. They often employ dynamic reconfiguration of compute to right-size resources for the workloads in use.

H100 with MIG lets infrastructure managers standardize their GPU-accelerated infrastructure while retaining the flexibility to provision GPU resources at finer granularity, securely giving developers the right amount of accelerated compute and optimizing use of all their GPU resources.
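
As a rough sketch of MIG provisioning, the snippet below drives nvidia-smi from Python to split one H100 into seven isolated instances; it assumes root privileges and a MIG-capable driver, and the profile name "1g.12gb" is an assumption matching H100 NVL's 94GB parts (H100 SXM exposes 1g.10gb profiles instead).

```python
# A rough sketch of MIG provisioning by driving nvidia-smi from Python;
# requires root and a MIG-capable driver. The "1g.12gb" profile name is an
# assumption for H100 NVL (H100 SXM uses 1g.10gb).
import subprocess

def run(cmd: str) -> str:
    return subprocess.run(cmd.split(), capture_output=True, text=True, check=True).stdout

run("nvidia-smi -i 0 -mig 1")       # enable MIG mode on GPU 0 (may require a GPU reset)
print(run("nvidia-smi mig -lgip"))  # list available GPU instance profiles

# Create seven 1g instances, each with its own compute instance (-C).
profiles = ",".join(["1g.12gb"] * 7)
run(f"nvidia-smi mig -i 0 -cgi {profiles} -C")
print(run("nvidia-smi -L"))         # MIG devices now enumerate like separate GPUs
```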

Built-In Confidential Computing

Traditional Confidential Computing solutions are CPU-based, which is too limited for compute-intensive workloads such as AI at scale. NVIDIA Confidential Computing is a built-in security feature of the NVIDIA Hopper architecture that made H100 the world’s first accelerator with these capabilities. NVIDIA Blackwell extends this opportunity, exponentially increasing performance while protecting the confidentiality and integrity of data and applications in use, unlocking data insights like never before.

Customers can now use a hardware-based trusted execution environment (TEE) that secures and isolates the entire workload in the most performant way.

Exceptional Performance for Large-Scale AI and HPC

The Hopper Tensor Core GPU will power the NVIDIA Grace Hopper CPU+GPU architecture, purpose-built for terabyte-scale accelerated computing and providing 10X higher performance on large-model AI and HPC.

The NVIDIA Grace CPU leverages the flexibility of the Arm® architecture to create a CPU and server architecture designed from the ground up for accelerated computing.

The Hopper GPU is paired with the Grace CPU using NVIDIA’s ultra-fast chip-to-chip interconnect, delivering 900GB/s of bandwidth, 7X faster than PCIe Gen5.

This innovative design will deliver up to 30X higher aggregate system memory bandwidth to the GPU compared to today's fastest servers and up to 10X higher performance for applications running terabytes of data.

Supercharge Large Language Model Inference With H100 NVL

For LLMs up to 70 billion parameters (Llama 2 70B), the PCIe-based NVIDIA H100 NVL with NVLink bridge utilizes Transformer Engine, NVLink, and 188GB HBM3 memory to provide optimum performance and easy scaling across any data center, bringing LLMs to the mainstream.

Servers equipped with H100 NVL GPUs increase Llama 2 70B performance up to 5X over NVIDIA A100 systems while maintaining low latency in power-constrained data center environments.
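
As a minimal sketch of running a 70B-parameter model on an H100 NVL pair, the snippet below loads Llama 2 70B with Hugging Face Transformers and shards it across the two NVLink-bridged GPUs; the model ID and generation settings are illustrative, and access to the gated model is assumed.

```python
# A minimal sketch of Llama 2 70B inference on an H100 NVL pair with Hugging
# Face Transformers; model ID and settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-chat-hf"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~140GB of FP16 weights fit in 2 x 94GB with KV-cache headroom
    device_map="auto",          # shard layers across the two NVLink-bridged GPUs
)

prompt = "Why pair NVLink-bridged GPUs for large-model inference?"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```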

Enterprise-Ready: AI Software Streamlines Development and Deployment

NVIDIA H100 NVL comes with a five-year NVIDIA AI Enterprise subscription and simplifies the way you build an enterprise AI-ready platform. H100 accelerates AI development and deployment for production-ready generative AI solutions, including computer vision, speech AI, retrieval augmented generation (RAG), and more.

NVIDIA AI Enterprise includes NVIDIA NIM™, a set of easy-to-use microservices designed to speed up enterprise generative AI deployment. Together, deployments have enterprise-grade security, manageability, stability, and support.
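
As a small sketch of what deploying with NIM can look like, the snippet below calls a locally hosted NIM microservice through its OpenAI-compatible chat endpoint; the URL, port, and model name are deployment-specific assumptions.

```python
# A minimal sketch of calling a NIM microservice via its OpenAI-compatible
# endpoint; URL, port, and model name are deployment-specific assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
resp = client.chat.completions.create(
    model="meta/llama-2-70b-chat",  # hypothetical model name served by the NIM
    messages=[{"role": "user", "content": "Summarize retrieval-augmented generation."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```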

This results in performance-optimized AI solutions that deliver faster business value and actionable insights.

Product Specifications

                                 H100 SXM                          H100 NVL
FP64                             34 teraFLOPS                      30 teraFLOPS
FP64 Tensor Core                 67 teraFLOPS                      60 teraFLOPS
FP32                             67 teraFLOPS                      60 teraFLOPS
TF32 Tensor Core*                989 teraFLOPS                     835 teraFLOPS
BFLOAT16 Tensor Core*            1,979 teraFLOPS                   1,671 teraFLOPS
FP16 Tensor Core*                1,979 teraFLOPS                   1,671 teraFLOPS
FP8 Tensor Core*                 3,958 teraFLOPS                   3,341 teraFLOPS
INT8 Tensor Core*                3,958 TOPS                        3,341 TOPS
GPU Memory                       80GB                              94GB
GPU Memory Bandwidth             3.35TB/s                          3.9TB/s
Decoders                         7 NVDEC, 7 JPEG                   7 NVDEC, 7 JPEG
Max Thermal Design Power (TDP)   Up to 700W (configurable)         350-400W (configurable)
Multi-Instance GPUs              Up to 7 MIGs @ 10GB each          Up to 7 MIGs @ 12GB each
Form Factor                      SXM                               PCIe, dual-slot air-cooled
Interconnect                     NVIDIA NVLink™: 900GB/s           NVIDIA NVLink™: 600GB/s
                                 PCIe Gen5: 128GB/s                PCIe Gen5: 128GB/s
Server Options                   NVIDIA HGX H100 Partner and       Partner and NVIDIA-Certified
                                 NVIDIA-Certified Systems with     Systems with 1–8 GPUs
                                 4 or 8 GPUs; NVIDIA DGX H100
                                 with 8 GPUs
NVIDIA AI Enterprise             Add-on                            Included

* With sparsity.

Take a deep dive into the NVIDIA Hopper architecture.