What Is Retrieval-Augmented Generation, aka RAG?

Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.
by Rick Merritt, November 15, 2023

To understand the latest advance in generative AI, imagine a courtroom.

Judges hear and decide cases based on their general understanding of the law. Sometimes a case — like a malpractice suit or a labor dispute — requires special expertise, so judges send court clerks to a law library, looking for precedents and specific cases they can cite.

Like a good judge, large language models (LLMs) can respond to a wide variety of human queries. But to deliver authoritative answers that cite sources, the model needs an assistant to do some research.

The court clerk of AI is a process called retrieval-augmented generation, or RAG for short.

How It Got Named ‘RAG’

Patrick Lewis, lead author of the 2020 paper that coined the term, apologized for the unflattering acronym that now describes a growing family of methods across hundreds of papers and dozens of commercial services he believes represent the future of generative AI.

Picture of Patrick Lewis, lead author of RAG paper
Patrick Lewis

“We definitely would have put more thought into the name had we known our work would become so widespread,” Lewis said in an interview from Singapore, where he was sharing his ideas with a regional conference of database developers.

“We always planned to have a nicer sounding name, but when it came time to write the paper, no one had a better idea,” said Lewis, who now leads a RAG team at AI startup Cohere.

So, What Is Retrieval-Augmented Generation (RAG)?

Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.

In other words, it fills a gap in how LLMs work. Under the hood, LLMs are neural networks, typically measured by how many parameters they contain. An LLM’s parameters essentially represent the general patterns of how humans use words to form sentences.

That deep understanding, sometimes called parameterized knowledge, makes LLMs useful in responding to general prompts at light speed. However, it does not serve users who want a deeper dive into a current or more specific topic.

Combining Internal, External Resources

Lewis and colleagues developed retrieval-augmented generation to link generative AI services to external resources, especially ones rich in the latest technical details.

The paper, with coauthors from the former Facebook AI Research (now Meta AI), University College London and New York University, called RAG “a general-purpose fine-tuning recipe” because it can be used by nearly any LLM to connect with practically any external resource.

Building User Trust

Retrieval-augmented generation gives models sources they can cite, like footnotes in a research paper, so users can check any claims. That builds trust.

What’s more, the technique can help models clear up ambiguity in a user query. It also reduces the possibility a model will make a wrong guess, a phenomenon sometimes called hallucination.

Another great advantage of RAG is it’s relatively easy. A blog by Lewis and three of the paper’s coauthors said developers can implement the process with as few as five lines of code.
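
As a hedged illustration of that simplicity, the sketch below uses the RAG classes that ship with the Hugging Face Transformers library, which includes an implementation of the paper’s RAG models. The checkpoint and retriever settings shown are one possible configuration rather than the exact snippet from the blog post, and the example question is invented.

```python
# A compact retrieval-augmented generation example with the RAG classes in
# Hugging Face Transformers (assumes transformers, datasets and faiss are installed).
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

inputs = tokenizer("Where was the 2020 RAG paper written?", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])  # retrieve passages, then generate
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```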

That makes the method faster and less expensive than retraining a model with additional datasets. And it lets users hot-swap new sources on the fly.

How People Are Using RAG

With retrieval-augmented generation, users can essentially have conversations with data repositories, opening up new kinds of experiences. This means the applications for RAG could be multiple times the number of available datasets.

For example, a generative AI model supplemented with a medical index could be a great assistant for a doctor or nurse. Financial analysts would benefit from an assistant linked to market data.

In fact, almost any business can turn its technical or policy manuals, videos or logs into resources called knowledge bases that can enhance LLMs. These sources can enable use cases such as customer or field support, employee training and developer productivity.

The broad potential is why companies including AWS, IBM, Glean, Google, Microsoft, NVIDIA, Oracle and Pinecone are adopting RAG.

Getting Started With Retrieval-Augmented Generation 

To help users get started, NVIDIA developed an AI workflow for retrieval-augmented generation. It includes a sample chatbot and the elements users need to create their own applications with this new method.

The workflow uses NVIDIA NeMo, a framework for developing and customizing generative AI models, as well as software like NVIDIA Triton Inference Server and NVIDIA TensorRT-LLM for running generative AI models in production.

The software components are all part of NVIDIA AI Enterprise, a software platform that accelerates development and deployment of production-ready AI with the security, support and stability businesses need.

Getting the best performance for RAG workflows requires massive amounts of memory and compute to move and process data. The NVIDIA GH200 Grace Hopper Superchip, with its 288GB of fast HBM3e memory and 8 petaflops of compute, is ideal — it can deliver a 150x speedup over using a CPU.

Once companies get familiar with RAG, they can combine a variety of off-the-shelf or custom LLMs with internal or external knowledge bases to create a wide range of assistants that help their employees and customers.

RAG doesn’t require a data center. LLMs are debuting on Windows PCs, thanks to NVIDIA software that enables all sorts of applications users can access even on their laptops.

Chart shows running RAG on a PC
An example application for RAG on a PC.

PCs equipped with NVIDIA RTX GPUs can now run some AI models locally. By using RAG on a PC, users can link to a private knowledge source – whether that be emails, notes or articles – to improve responses. The user can then feel confident that their data source, prompts and response all remain private and secure.

A recent blog provides an example of RAG accelerated by TensorRT-LLM for Windows to get better results fast.

The History of RAG 

The roots of the technique go back at least to the early 1970s. That’s when researchers in information retrieval prototyped what they called question-answering systems, apps that use natural language processing (NLP) to access text, initially in narrow topics such as baseball.

The concepts behind this kind of text mining have remained fairly constant over the years. But the machine learning engines driving them have grown significantly, increasing their usefulness and popularity.

In the mid-1990s, the Ask Jeeves service, now Ask.com, popularized question answering with its mascot of a well-dressed valet. IBM’s Watson became a TV celebrity in 2011 when it handily beat two human champions on the Jeopardy! game show.

Picture of Ask Jeeves, an early RAG-like web service

Today, LLMs are taking question-answering systems to a whole new level.

Insights From a London Lab

The seminal 2020 paper arrived as Lewis was pursuing a doctorate in NLP at University College London and working for Meta at a new London AI lab. The team was searching for ways to pack more knowledge into an LLM’s parameters and using a benchmark it developed to measure its progress.

Building on earlier methods and inspired by a paper from Google researchers, the group “had this compelling vision of a trained system that had a retrieval index in the middle of it, so it could learn and generate any text output you wanted,” Lewis recalled.

Picture of IBM Watson winning on "Jeopardy" TV show, popularizing a RAG-like AI service
The IBM Watson question-answering system became a celebrity when it won big on the TV game show Jeopardy!

When Lewis plugged into the work in progress a promising retrieval system from another Meta team, the first results were unexpectedly impressive.

“I showed my supervisor and he said, ‘Whoa, take the win. This sort of thing doesn’t happen very often,’ because these workflows can be hard to set up correctly the first time,” he said.

Lewis also credits major contributions from team members Ethan Perez and Douwe Kiela, then of New York University and Facebook AI Research, respectively.

When complete, the work, which ran on a cluster of NVIDIA GPUs, showed how to make generative AI models more authoritative and trustworthy. It’s since been cited by hundreds of papers that amplified and extended the concepts in what continues to be an active area of research.

How Retrieval-Augmented Generation Works

At a high level, here’s how an NVIDIA technical brief describes the RAG process.

When users ask an LLM a question, the AI model sends the query to another model that converts it into a numeric format so machines can read it. The numeric version of the query is sometimes called an embedding or a vector.

NVIDIA diagram of how RAG works with LLMs
Retrieval-augmented generation combines LLMs with embedding models and vector databases.

The embedding model then compares these numeric values to vectors in a machine-readable index of an available knowledge base. When it finds a match or multiple matches, it retrieves the related data, converts it to human-readable words and passes it back to the LLM.

Finally, the LLM combines the retrieved words and its own response to the query into a final answer it presents to the user, potentially citing sources the embedding model found.
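
The following toy sketch is meant only to make that flow concrete. The bag-of-words embedder, the tiny in-memory index and the print statement at the end are stand-ins for the real embedding model, vector database and LLM, and every name and document in it is invented for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Stand-in vector database: documents stored next to their embeddings.
knowledge_base = [
    "RAG fetches facts from external sources to ground an LLM's answer.",
    "Embeddings turn text into numeric vectors that machines can compare.",
]
index = [(doc, embed(doc)) for doc in knowledge_base]

def retrieve(query: str, k: int = 1) -> list[str]:
    query_vec = embed(query)                      # 1. convert the query to a vector
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]         # 2. find the closest matches in the index

query = "How does RAG ground an answer in facts?"
context = retrieve(query)                         # 3. retrieved text would be handed to the LLM
print(f"Query: {query}\nRetrieved context: {context[0]}")
```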

Keeping Sources Current

In the background, the embedding model continuously creates and updates machine-readable indices, sometimes called vector databases, for new and updated knowledge bases as they become available.

Chart of a RAG process described by LangChain

Many developers find LangChain, an open-source library, can be particularly useful in chaining together LLMs, embedding models and knowledge bases. NVIDIA uses LangChain in its reference architecture for retrieval-augmented generation.
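
As a rough sketch of what such a chain can look like, the snippet below wires an OpenAI LLM, an OpenAI embedding model and a FAISS vector store together using import paths from a pre-1.0 LangChain release. The documents, model choices and import locations are assumptions (newer LangChain versions reorganize these modules), and this is not NVIDIA's reference architecture.

```python
# A minimal LLM + embedding model + knowledge base chain using pre-1.0
# LangChain import paths (assumes OPENAI_API_KEY is set and faiss is installed).
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

documents = [
    "NVIDIA uses LangChain in its reference architecture for retrieval-augmented generation.",
    "Retrieval-augmented generation links generative AI services to external knowledge bases.",
]
vector_store = FAISS.from_texts(documents, OpenAIEmbeddings())  # embed and index the knowledge base
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),                            # the generator
    retriever=vector_store.as_retriever(),   # the retriever over the vector index
)
print(qa_chain.run("What does NVIDIA use LangChain for?"))
```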

The LangChain community provides its own description of a RAG process.

Looking forward, the future of generative AI lies in creatively chaining all sorts of LLMs and knowledge bases together to create new kinds of assistants that deliver authoritative results users can verify.

Get hands-on experience using retrieval-augmented generation with an AI chatbot in this NVIDIA LaunchPad lab.

Explore generative AI sessions and experiences at NVIDIA GTC, the global conference on AI and accelerated computing, running March 18-21 in San Jose, Calif., and online.

Igniting the Future: TensorRT-LLM Release Accelerates AI Inference Performance, Adds Support for New Models Running on RTX-Powered Windows 11 PCs

New tools and resources announced at Microsoft Ignite include a TensorRT-LLM wrapper for the OpenAI Chat API and RTX-powered performance improvements to DirectML for Llama 2 and other popular LLMs.
by Jesse Clayton, November 15, 2023

Artificial intelligence on Windows 11 PCs marks a pivotal moment in tech history, revolutionizing experiences for gamers, creators, streamers, office workers, students and even casual PC users.

It offers unprecedented opportunities to enhance productivity for users of the more than 100 million Windows PCs and workstations that are powered by RTX GPUs. And NVIDIA RTX technology is making it even easier for developers to create AI applications to change the way people use computers.

New optimizations, models and resources announced at Microsoft Ignite will help developers deliver new end-user experiences, quicker.

An upcoming update to TensorRT-LLM — open-source software that increases AI inference performance — will add support for new large language models and make demanding AI workloads more accessible on desktops and laptops with RTX GPUs starting at 8GB of VRAM.

TensorRT-LLM for Windows will soon be compatible with OpenAI’s popular Chat API through a new wrapper. This will enable hundreds of developer projects and applications to run locally on a PC with RTX, instead of in the cloud — so users can keep private and proprietary data on Windows 11 PCs.

Developing and maintaining custom generative AI projects takes time and energy. The process can become incredibly complex and time-consuming, especially when trying to collaborate and deploy across multiple environments and platforms.

AI Workbench is a unified, easy-to-use toolkit that allows developers to quickly create, test and customize pretrained generative AI models and LLMs on a PC or workstation. It provides developers a single platform to organize their AI projects and tune models to specific use cases.

This enables seamless collaboration and deployment for developers to create cost-effective, scalable generative AI models quickly. Join the early access list to be among the first to gain access to this growing initiative and to receive future updates.

To support AI developers, NVIDIA and Microsoft will release DirectML enhancements to accelerate one of the most popular foundational AI models, Llama 2. Developers now have more options for cross-vendor deployment, in addition to setting a new standard for performance.

Portable AI

Last month, NVIDIA announced TensorRT-LLM for Windows, a library for accelerating LLM inference.

The next TensorRT-LLM release, v0.6.0 coming later this month, will bring improved inference performance — up to 5x faster — and enable support for additional popular LLMs, including the new Mistral 7B and Nemotron-3 8B. Versions of these LLMs will run on any GeForce RTX 30 Series and 40 Series GPU with 8GB of RAM or more, making fast, accurate, local LLM capabilities accessible even in some of the most portable Windows devices.

TensorRT-LLM V0.6 Windows Perf Chart
Up to 5X performance with the new TensorRT-LLM v0.6.0.

The new release of TensorRT-LLM will be available for install on the /NVIDIA/TensorRT-LLM GitHub repo. New optimized models will be available on ngc.nvidia.com.

Conversing With Confidence 

Developers and enthusiasts worldwide use OpenAI’s Chat API for a wide range of applications — from summarizing web content and drafting documents and emails to analyzing and visualizing data and creating presentations.

One challenge with such cloud-based AIs is that they require users to upload their input data, making them impractical for private or proprietary data or for working with large datasets.

To address this challenge, NVIDIA will soon enable TensorRT-LLM for Windows to offer an API interface similar to OpenAI’s widely popular Chat API through a new wrapper, giving developers the same workflow whether they are designing models and applications to run locally on a PC with RTX or in the cloud. By changing just one or two lines of code, hundreds of AI-powered developer projects and applications can now benefit from fast, local AI. Users can keep their data on their PCs and not worry about uploading datasets to the cloud.
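
As a hedged sketch of what that one-or-two-line change might look like, the snippet below points the pre-1.0 openai Python client at a locally served, OpenAI-compatible endpoint instead of the hosted service; the localhost URL, port and model name are placeholders, not the wrapper's documented interface.

```python
import openai

# Redirect the existing OpenAI client to a local, OpenAI-compatible endpoint
# (the URL and model name below are placeholders for whatever the local
# server exposes; everything else in an existing app can stay the same).
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "not-needed-for-local-inference"

response = openai.ChatCompletion.create(
    model="local-llm",
    messages=[{"role": "user", "content": "Summarize this document in two sentences."}],
)
print(response["choices"][0]["message"]["content"])
```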

Perhaps the best part is that many of these projects and applications are open source, making it easy for developers to leverage and extend their capabilities to fuel the adoption of generative AI on Windows, powered by RTX.

The wrapper will work with any LLM that’s been optimized for TensorRT-LLM (for example, Llama 2, Mistral and Nemotron-3 8B) and is being released as a reference project on GitHub, alongside other developer resources for working with LLMs on RTX.

Model Acceleration

Developers can now leverage cutting-edge AI models and deploy with a cross-vendor API. As part of an ongoing commitment to empower developers, NVIDIA and Microsoft have been working together to accelerate Llama on RTX via the DirectML API.

Building on last month’s announcement of the fastest inference performance for these models, this new option for cross-vendor deployment makes it easier than ever to bring AI capabilities to PCs.

Developers and enthusiasts can experience the latest optimizations by downloading the latest ONNX runtime and following the installation instructions from Microsoft, and installing the latest driver from NVIDIA, which will be available on Nov. 21.

These new optimizations, models and resources will accelerate the development and deployment of AI features and applications to the 100 million RTX PCs worldwide, joining the more than 400 AI-powered apps and games already accelerated by RTX GPUs.

As models become even more accessible and developers bring more generative AI-powered functionality to RTX-powered Windows PCs, RTX GPUs will be critical for enabling users to take advantage of this powerful technology.

Explore generative AI sessions and experiences at NVIDIA GTC, the global conference on AI and accelerated computing, running March 18-21 in San Jose, Calif., and online.

Challenge Accepted: Animator Sir Wade Neistadt Leads Robotic Revolution in Record Time This Week ‘In the NVIDIA Studio’

The Razer Blade 18 laptop powered by GeForce RTX 4090 graphics elevated his creative workflow.
by Gerardo Delgado, November 14, 2023

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

Character animator Sir Wade Neistadt works to make animation and 3D education more accessible for aspiring and professional artists alike through video tutorials and industry training.

The YouTube creator, who goes by Sir Wade, also likes a challenge. When electronics company Razer recently asked him to create something unique and creative using the new Razer Blade 18 laptop with GeForce RTX 4090 graphics, Sir Wade obliged.

“I said yes because I thought it’d be a great opportunity to try something creatively risky and make something I didn’t yet know how to achieve,” the artist said.

I, Robot

One of the hardest parts of getting started on a project is needing to be creative on demand, said Sir Wade. For the Razer piece, the animator started by asking himself two questions: “What am I inspired by?” and “What do I have to work with?”

Sir Wade finds inspiration in games, technology, movies, people-watching and conversations. Fond of tech — and having eyed characters from the ProRigs library for some time — he decided his short animation should feature robots.

When creating a concept for the animation, Sir Wade took an unorthodox approach, skipping the popular step of 2D sketching. Instead, he captured video references by acting out the animations himself.

This gave Sir Wade the opportunity to quickly try a bunch of movements and preview body mechanics for the animation phase. Since ProRigs characters are rigs based on Autodesk Maya, he naturally began his animation work using this 3D software.

“YOU SHALL NOT (RENDER) PASS.”

His initial approach was straightforward: mimicking the main robot character’s movements with the edited reference footage. This worked fairly well, as NVIDIA RTX-accelerated ray tracing and AI denoising with the default Autodesk Arnold renderer resulted in smooth viewport movement and photorealistic visuals.

Then, Sir Wade continued tinkering with the piece, focusing on how the robot’s arm plates crashed into each other and how its feet moved. This was a great challenge, but he kept moving on the project. The featured artist would advise, “Don’t wait for everything to be perfect.”

The video reference footage captured earlier paid off later in Sir Wade’s creative workflow.

Next, Sir Wade exported files into Blender software with the Universal Scene Description (OpenUSD) framework, unlocking an open and extensible ecosystem, including the ability to make edits in NVIDIA Omniverse, a development platform for building and connecting 3D tools and applications. The edits could then be captured in the original native files, eliminating the need for tedious uploading, downloading and file reformatting.

AI-powered RTX-accelerated OptiX ray tracing in the viewport allowed Sir Wade to manipulate the scene with ease.

Sir Wade browsed the Kitbash3D digital platform with the new asset browser Cargo to compile kits, models and materials, and drag them into Blender with ease. It’s important at this stage to get base-level models in the scene, he said, so the environment can be further refined.

Dubbed the “ultimate desktop replacement,” the Razer Blade 18 offers NVIDIA GeForce RTX 4090 graphics.

Sir Wade raved about the Razer Blade 18’s quad-high-definition (QHD+) 18″ screen and 16:10 aspect ratio, which gives him more room to create, as well as its color-calibrated display, which ensures uploads to social media are as accurate as possible and require minimal color correction.

The preinstalled NVIDIA Studio Drivers, free to RTX GPU owners, are extensively tested with the most popular creative software to deliver maximum stability and performance.

“This is by far the best laptop I’ve ever used for this type of work.” — Sir Wade Neistadt

Returning to the action, Sir Wade used an emission shader to form the projectiles aimed at the robot. He also tweaked various textures, such as surface imperfections, to make the robot feel more weathered and battle-worn, before moving on to visual effects (VFX).

The artist used basic primitives as particle emitters in Blender to achieve the look of bursting particles over a limited number of frames. This, combined with the robot and floor surfaces containing surface nodes, creates sparks when the robot moves or gets hit by objects.
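
For readers who want to experiment with the same general idea, here is a minimal Blender Python sketch that turns the active object into a short-burst particle emitter; the counts, frame range and lifetime are arbitrary example values, and this is not Sir Wade's actual setup.

```python
import bpy

# Turn the currently selected object into a particle emitter that bursts
# over a limited number of frames, roughly mirroring the spark effect
# described above (all values are arbitrary examples).
emitter = bpy.context.active_object
emitter.modifiers.new(name="SparkBurst", type='PARTICLE_SYSTEM')

settings = emitter.particle_systems[-1].settings
settings.count = 500          # number of spark particles
settings.frame_start = 1      # emit only over a short window...
settings.frame_end = 20       # ...so the burst reads as a quick flash
settings.lifetime = 15        # particles fade shortly after emission
settings.normal_factor = 2.0  # push particles outward along surface normals
```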

Sir Wade’s GeForce RTX 4090 Laptop GPU with Blender Cycles RTX-accelerated OptiX ray tracing in the viewport provides interactive, photorealistic rendering for modeling and animation.

Particle and collision effects in Blender enable compelling VFX.

To further experiment with VFX, Sir Wade imported the project into the EmberGen simulation tool to test out various preset and physics effects.

VFX in EmberGen.

He added dust and debris VFX, and exported the scene as an OpenVDB file back to Blender to perfect the lighting.

Final lighting elements in Blender.

“I chose an NVIDIA RTX GPU-powered system for its reliable speed, performance and stability, as I had a very limited window to complete this project.” — Sir Wade Neistadt

Finally, Sir Wade completed sound-design effects in Blackmagic Design’s DaVinci Resolve software.

Sir Wade’s video tutorials resonate with diverse audiences because of their fresh approach to solving problems and individualistic flair.

“Creativity for me doesn’t come naturally like for other artists,” Sir Wade explained. “I reverse engineer the process by seeing a tool or a concept, evaluating what’s interesting, then either figuring out a way to use it uniquely or explaining the discovery in a relatable way.”

Sir Wade Neistadt.

Check out Sir Wade’s animation workshops on his website.

Less than two days remain in Sir Wade’s Fall 2023 Animation Challenge. Download the challenge template and Maya character rig files, and submit a custom 3D scene to win an NVIDIA RTX GPU or other prizes by end of day on Wednesday, Nov. 15.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 

New Class of Accelerated, Efficient AI Systems Mark the Next Era of Supercomputing

Researchers worldwide will tackle grand challenges in science and industry with generative AI and HPC on systems packing the latest NVIDIA Hopper GPUs and NVIDIA Grace Hopper Superchips.
by Rick Merritt, November 13, 2023

Editor’s note: The name of the NVIDIA CUDA Quantum platform was changed to NVIDIA CUDA-Q in April 2024. All references to the name have been updated in this blog.

NVIDIA today unveiled at SC23 the next wave of technologies that will lift scientific and industrial research centers worldwide to new levels of performance and energy efficiency.

“NVIDIA hardware and software innovations are creating a new class of AI supercomputers,” said Ian Buck, vice president of the company’s high performance computing and hyperscale data center business, in a special address at the conference.

Some of the systems will pack memory-enhanced NVIDIA Hopper accelerators, others a new NVIDIA Grace Hopper systems architecture. All will use the expanded parallelism to run a full stack of accelerated software for generative AI, HPC and hybrid quantum computing.

Buck described the new NVIDIA HGX H200 as “the world’s leading AI computing platform.”

Image of H200 GPU system
NVIDIA H200 Tensor Core GPUs pack HBM3e memory to run growing generative AI models.

It packs up to 141GB of HBM3e, the first AI accelerator to use the ultrafast technology. Running models like GPT-3, NVIDIA H200 Tensor Core GPUs provide an 18x performance increase over prior-generation accelerators.

Among other generative AI benchmarks, they zip through 12,000 tokens per second on a Llama2-13B large language model (LLM).

Buck also revealed a server platform that links four NVIDIA GH200 Grace Hopper Superchips on an NVIDIA NVLink interconnect. The quad configuration puts in a single compute node a whopping 288 Arm Neoverse cores and 16 petaflops of AI performance with up to 2.3 terabytes of high-speed memory.

Image of quad GH200 server node
Server nodes based on the four GH200 Superchips will deliver 16 petaflops of AI performance.

Demonstrating its efficiency, one GH200 Superchip using the NVIDIA TensorRT-LLM open-source library is 100x faster than a dual-socket x86 CPU system and nearly 2x more energy efficient than an x86 + H100 GPU server.

“Accelerated computing is sustainable computing,” Buck said. “By harnessing the power of accelerated computing and generative AI, together we can drive innovation across industries while reducing our impact on the environment.”

NVIDIA Powers 38 of 49 New TOP500 Systems

The latest TOP500 list of the world’s fastest supercomputers reflects the shift toward accelerated, energy-efficient supercomputing.

Thanks to new systems powered by NVIDIA H100 Tensor Core GPUs, NVIDIA now delivers more than 2.5 exaflops of HPC performance across these world-leading systems, up from 1.6 exaflops in the May rankings. NVIDIA’s contribution on the top 10 alone reaches nearly an exaflop of HPC and 72 exaflops of AI performance.

The new list contains the highest number of systems ever using NVIDIA technologies, 379 vs. 372 in May, including 38 of 49 new supercomputers on the list.

Microsoft Azure leads the newcomers with its Eagle system using H100 GPUs in NDv5 instances to hit No. 3 with 561 petaflops. Mare Nostrum5 in Barcelona ranked No. 8, and NVIDIA Eos — which recently set new AI training records on the MLPerf benchmarks — came in at No. 9.

Showing their energy efficiency, NVIDIA GPUs power 24 of the top 30 systems on the Green500. And they retained the No. 1 spot with the H100 GPU-based Henri system, which delivers 65.09 gigaflops per watt for the Flatiron Institute in New York.

Gen AI Explores COVID

Showing what’s possible, the Argonne National Laboratory used NVIDIA BioNeMo, a generative AI platform for biomolecular LLMs, to develop GenSLMs, a model that can generate gene sequences that closely resemble real-world variants of the coronavirus. Using NVIDIA GPUs and data from 1.5 million COVID genome sequences, it can also rapidly identify new virus variants.

The work won the Gordon Bell special prize last year and was trained on supercomputers, including Argonne’s Polaris system, the U.S. Department of Energy’s Perlmutter and NVIDIA’s Selene.

It’s “just the tip of the iceberg — the future is brimming with possibilities, as generative AI continues to redefine the landscape of scientific exploration,” said Kimberly Powell, vice president of healthcare at NVIDIA, in the special address.

Saving Time, Money and Energy

Using the latest technologies, accelerated workloads can see an order-of-magnitude reduction in system cost and energy used, Buck said.

For example, Siemens teamed with Mercedes to analyze aerodynamics and related acoustics for its new electric EQE vehicles. The simulations that took weeks on CPU clusters ran significantly faster using the latest NVIDIA H100 GPUs. In addition, Hopper GPUs let them reduce costs by 3x and reduce energy consumption by 4x (below).

Chart showing the performance and energy efficiency of H100 GPUs

Switching on 200 Exaflops Beginning Next Year

Scientific and industrial advances will come from every corner of the globe where the latest systems are being deployed.

“We already see a combined 200 exaflops of AI on Grace Hopper supercomputers going to production 2024,” Buck said.

They include the massive JUPITER supercomputer at Germany’s Jülich center. It can deliver 93 exaflops of performance for AI training and 1 exaflop for HPC applications, while consuming only 18.2 megawatts of power.

Chart of deployed performance of supercomputers using NVIDIA GPUs through 2024
Research centers are poised to switch on a tsunami of GH200 performance.

Based on Eviden’s BullSequana XH3000 liquid-cooled system, JUPITER will use the NVIDIA quad GH200 system architecture and NVIDIA Quantum-2 InfiniBand networking for climate and weather predictions, drug discovery, hybrid quantum computing and digital twins. JUPITER quad GH200 nodes will be configured with 864GB of high-speed memory.

It’s one of several new supercomputers using Grace Hopper that NVIDIA announced at SC23.

The HPE Cray EX2500 system from Hewlett Packard Enterprise will use the quad GH200 to power many AI supercomputers coming online next year.

For example, HPE uses the quad GH200 to power the DeltaAI system, which will triple computing capacity for the U.S. National Center for Supercomputing Applications.

HPE is also building the Venado system for the Los Alamos National Laboratory, the first GH200 to be deployed in the U.S. In addition, HPE is building GH200 supercomputers in the Middle East, Switzerland and the U.K.

Separately, Fujitsu will use the GH200 manufactured by Supermicro in the OFP-II system, an advanced HPC system in Japan shared by the University of Tsukuba and the University of Tokyo.

Grace Hopper in Texas and Beyond

At the Texas Advanced Computing Center (TACC), Dell Technologies is building the Vista supercomputer with NVIDIA Grace Hopper and Grace CPU Superchips. 

More than 100 global enterprises and organizations, including NASA Ames Research Center and Total Energies, have already purchased Grace Hopper early-access systems, Buck said. 

They join previously announced GH200 users such as SoftBank and the University of Bristol, as well as the massive Leonardo system with 14,000 NVIDIA A100 GPUs that delivers 10 exaflops of AI performance for Italy’s Cineca consortium. 

The View From Supercomputing Centers 

Leaders from supercomputing centers around the world shared their plans and work in progress with the latest systems. 

“We’ve been collaborating with MeteoSwiss and ECMWF as well as scientists from ETH EXCLAIM and NVIDIA’s Earth-2 project to create an infrastructure that will push the envelope in all dimensions of big data analytics and extreme scale computing,” said Thomas Schulthess, director of the Swiss National Supercomputing Centre, of work on the Alps supercomputer.

“There’s really impressive energy-efficiency gains across our stacks,” Dan Stanzione, executive director of TACC, said of Vista. 

It’s “really the stepping stone to move users from the kinds of systems we’ve done in the past to looking at this new Grace Arm CPU and Hopper GPU tightly coupled combination and … we’re looking to scale out by probably a factor of 10 or 15 from what we are deploying with Vista when we deploy Horizon in a couple years,” he said. 

Accelerating the Quantum Journey 

Researchers are also using today’s accelerated systems to pioneer a path to tomorrow’s supercomputers. 

In Germany, JUPITER “will revolutionize scientific research across climate, materials, drug discovery and quantum computing,” said Kristel Michielsen, who leads Jülich’s research group on quantum information processing.

“JUPITER’s architecture also allows for the seamless integration of quantum algorithms with parallel HPC algorithms, and this is mandatory for effective quantum HPC hybrid simulations,” she said. 

CUDA-Q Drives Progress 

The special address also showed how NVIDIA CUDA-Q — a platform for programming CPUs, GPUs and quantum computers also known as QPUs — is advancing research in quantum computing. 

For example, researchers at BASF, the world’s largest chemical company, pioneered a new hybrid quantum-classical method for simulating chemicals that can shield humans against harmful metals. They join researchers at Brookhaven National Laboratory and HPE who are separately pushing the frontiers of science with CUDA-Q. 

NVIDIA also announced a collaboration with Classiq, a developer of quantum programming tools, to create a life sciences research center at the Tel Aviv Sourasky Medical Center, Israel’s largest teaching hospital.  The center will use Classiq’s software and CUDA-Q running on an NVIDIA DGX H100 system. 

Separately, Quantum Machines will deploy the first NVIDIA DGX Quantum, a system using Grace Hopper Superchips, at the Israel National Quantum Center that aims to drive advances across scientific fields. The DGX system will be connected to a superconducting QPU by Quantware and a photonic QPU from ORCA Computing, both powered by CUDA-Q. 

Logos of NVIDIA CUDA Quantum partners

“In just two years, our NVIDIA quantum computing platform has amassed over 120 partners [above], a testament to its open, innovative platform,” Buck said.

Overall, the work across many fields of discovery reveals a new trend that combines accelerated computing at data center scale with NVIDIA’s full-stack innovation.

“Accelerated computing is paving the path for sustainable computing with advancements that provide not just amazing technology but a more sustainable and impactful future,” he concluded.

Watch NVIDIA’s SC23 special address below.

Explore generative AI sessions and experiences at NVIDIA GTC, the global conference on AI and accelerated computing, running March 18-21 in San Jose, Calif., and online.

Feature image: Jupiter system at Jülich supercomputer center, courtesy of Eviden.

Gen AI for the Genome: LLM Predicts Characteristics of COVID Variants 

A new demo lets users explore visualizations of the genome-scale language model by Argonne National Laboratory, NVIDIA and other collaborators.  
by Isha Salian 

A widely acclaimed large language model for genomic data has demonstrated its ability to generate gene sequences that closely resemble real-world variants of SARS-CoV-2, the virus behind COVID-19. 

Called GenSLMs, the model, which last year won the Gordon Bell special prize for high performance computing-based COVID-19 research, was trained on a dataset of nucleotide sequences — the building blocks of DNA and RNA. It was developed by researchers from Argonne National Laboratory, NVIDIA, the University of Chicago and a score of other academic and commercial collaborators. 

When the researchers looked back at the nucleotide sequences generated by GenSLMs, they discovered that specific characteristics of the AI-generated sequences closely matched the real-world Eris and Pirola subvariants that have been prevalent this year — even though the AI was only trained on COVID-19 virus genomes from the first year of the pandemic. 

“Our model’s generative process is extremely naive, lacking any specific information or constraints around what a new COVID variant should look like,” said Arvind Ramanathan, lead researcher on the project and a computational biologist at Argonne. “The AI’s ability to predict the kinds of gene mutations present in recent COVID strains — despite having only seen the Alpha and Beta variants during training — is a strong validation of its capabilities.” 

In addition to generating its own sequences, GenSLMs can also classify and cluster different COVID genome sequences by distinguishing between variants. In a demo available on NGC, NVIDIA’s hub for accelerated software, users can explore visualizations of GenSLMs’ analysis of the evolutionary patterns of various proteins within the COVID viral genome. 

 

Reading Between the Lines, Uncovering Evolutionary Patterns

A key feature of GenSLMs is its ability to interpret long strings of nucleotides — represented with sequences of the letters A, T, G and C in DNA, or A, U, G and C in RNA — in the same way an LLM trained on English text would interpret a sentence. This capability enables the model to understand the relationship between different areas of the genome, which in coronaviruses consists of around 30,000 nucleotides.
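
As a purely illustrative sketch of what reading nucleotides like text can mean, the snippet below chops a sequence into fixed-length k-mer "words," a common way to build a vocabulary for genomic language models; GenSLMs' actual tokenization and model details may differ.

```python
def kmer_tokenize(sequence: str, k: int = 3, stride: int = 3) -> list[str]:
    """Split a nucleotide string into k-mer tokens, the way a genomic
    language model might tokenize text (illustrative only)."""
    sequence = sequence.upper().replace("U", "T")  # treat RNA letters like DNA for tokenizing
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]

fragment = "ATGGTTCGTACCGGAAGTTGA"  # a made-up fragment, not a real viral sequence
print(kmer_tokenize(fragment))      # ['ATG', 'GTT', 'CGT', 'ACC', 'GGA', 'AGT', 'TGA']
```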

In the NGC demo, users can choose from among eight different COVID variants to understand how the AI model tracks mutations across various proteins of the viral genome. The visualization depicts evolutionary couplings across the viral proteins — highlighting which snippets of the genome are likely to be seen in a given variant.

“Understanding how different parts of the genome are co-evolving gives us clues about how the virus may develop new vulnerabilities or new forms of resistance,” Ramanathan said. “Looking at the model’s understanding of which mutations are particularly strong in a variant may help scientists with downstream tasks like determining how a specific strain can evade the human immune system.”

 

GenSLMs was trained on more than 110 million prokaryotic genome sequences and fine-tuned with a global dataset of around 1.5 million COVID viral sequences using open-source data from the Bacterial and Viral Bioinformatics Resource Center. In the future, the model could be fine-tuned on the genomes of other viruses or bacteria, enabling new research applications.

To train the model, the researchers used NVIDIA A100 Tensor Core GPU-powered supercomputers, including Argonne’s Polaris system, the U.S. Department of Energy’s Perlmutter and NVIDIA’s Selene.

The GenSLMs research team’s Gordon Bell special prize was awarded at last year’s SC22 supercomputing conference. At this week’s SC23, in Denver, NVIDIA is sharing a new range of groundbreaking work in the field of accelerated computing. View the full schedule and catch the replay of NVIDIA’s special address below.

NVIDIA Research comprises hundreds of scientists and engineers worldwide, with teams focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics. Learn more about NVIDIA Research and subscribe to NVIDIA healthcare news.

Main image courtesy of Argonne National Laboratory’s Bharat Kale. 

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. DOE Office of Science and the National Nuclear Security Administration. Research was supported by the DOE through the National Virtual Biotechnology Laboratory, a consortium of DOE national laboratories focused on response to COVID-19, with funding from the Coronavirus CARES Act. 

Researchers Poised for Advances With NVIDIA CUDA-Q

The world’s largest chemical company is among many organizations seeking insights with hybrid quantum computing on NVIDIA software and GPUs.
by Sam Stanwyck, November 13, 2023

Editor’s note: All references to the name of the NVIDIA CUDA Quantum platform were changed to NVIDIA CUDA-Q in April 2024. 

Michael Kuehn and Davide Vodola are taking to new heights work that’s pioneering quantum computing for the world’s largest chemical company.

The BASF researchers are demonstrating how a quantum algorithm can see what no traditional simulation can — key attributes of NTA, a compound with applications that include removing toxic metals like iron from a city’s wastewater.

The quantum computing team at BASF simulated on GPUs how the equivalent of 24 qubits — the processing engines of a quantum computer — can tackle the challenge.

Many corporate R&D centers would consider that a major achievement, but they pressed on, and recently ran their first 60 qubit simulations on NVIDIA’s Eos H100 Supercomputer.

“It’s the largest simulation of a molecule using a quantum algorithm we’ve ever run,” said Kuehn.

Flexible, Friendly Software

BASF is running the simulation on NVIDIA CUDA-Q, a platform for programming CPUs, GPUs and quantum computers, also known as QPUs.

Vodola described it as “very flexible and user friendly, letting us build up a complex quantum circuit simulation from relatively simple building blocks. Without CUDA-Q, it would be impossible to run this simulation,” he said.
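
To give a flavor of those building blocks, here is a small CUDA-Q Python sketch that assembles and samples a simple entangling circuit on the GPU-accelerated simulator. It is a generic example rather than BASF's NTA simulation, and it assumes the cudaq package with its "nvidia" GPU target is installed.

```python
import cudaq

# Use the GPU-accelerated statevector simulator (assumes the "nvidia"
# target shipped with the cudaq package is available).
cudaq.set_target("nvidia")

# Compose a small circuit from simple building blocks: a GHZ-style
# entangled state on 4 qubits, far smaller than the 24- and 60-qubit
# chemistry simulations described above.
kernel = cudaq.make_kernel()
qubits = kernel.qalloc(4)
kernel.h(qubits[0])
for i in range(3):
    kernel.cx(qubits[i], qubits[i + 1])
kernel.mz(qubits)

counts = cudaq.sample(kernel, shots_count=1000)
print(counts)  # expect roughly half "0000" and half "1111"
```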

The work requires a lot of heavy lifting, too, so BASF turned to an NVIDIA DGX Cloud service that uses NVIDIA H100 Tensor Core GPUs.

“We need a lot of computing power, and the NVIDIA platform is significantly faster than CPU-based hardware for this kind of simulation,” said Kuehn.

BASF’s quantum computing initiative, which Kuehn helped launch, started in 2017. In addition to its work in chemistry, the team is developing use cases for quantum computing in machine learning as well as optimizations for logistics and scheduling.

An Expanding CUDA-Q Community

Other research groups are also advancing science with CUDA-Q. 

At SUNY Stony Brook, researchers are pushing the boundaries of high-energy physics to simulate complex interactions of subatomic particles. Their work promises new discoveries in fundamental physics. 

“CUDA-Q enables us to do quantum simulations that would otherwise be impossible,” said Dmitri Kharzeev,  a SUNY professor and scientist at Brookhaven National Lab. 

In addition, a research team at Hewlett Packard Labs is using the Perlmutter supercomputer to explore magnetic phase transition in quantum chemistry in one of the largest simulations of its kind. The effort could reveal important and unknown details of physical processes too difficult to model with conventional techniques. 

“As quantum computers progress toward useful applications, high-performance classical simulations will be key for prototyping novel quantum algorithms,” said Kirk Bresniker, a chief architect at Hewlett Packard Labs. “Simulating and learning from quantum data are promising avenues toward tapping quantum computing’s potential.” 

A Quantum Center for Healthcare

These efforts come as support for CUDA-Q expands worldwide.

Classiq — an Israeli startup that already has more than 400 universities using its novel approach to writing quantum programs — announced today a new research center at the Tel Aviv Sourasky Medical Center, Israel’s largest teaching hospital.
Classiq — 一家以独特方法撰写量子程序而被 400 多所大学采用的以色列初创公司 — 今天宣布在以色列最大的教学医院特拉维夫索拉斯基医疗中心设立一个新的研究中心。

Created in collaboration with NVIDIA, it will train experts in life science to write quantum applications that could someday help doctors diagnose diseases or accelerate the discovery of new drugs. 

Classiq created quantum design software that automates low-level tasks, so developers don’t need to know all the complex details of how a quantum computer works. It’s now being integrated with CUDA-Q.
Classiq 创建了量子设计软件,自动化低级任务,使开发人员无需了解量子计算机的所有复杂细节。现在正在与 CUDA-Q 集成。

Terra Quantum, a quantum services company with headquarters in Germany and Switzerland, is developing hybrid quantum applications for life sciences, energy, chemistry and finance that will run on CUDA-Q. And IQM in Finland is enabling its superconducting QPU to use CUDA-Q. 

Quantum Loves Grace Hopper 

Several companies, including Oxford Quantum Circuits, will use NVIDIA Grace Hopper Superchips to power their hybrid quantum efforts. Based in Reading, England, Oxford Quantum is using Grace Hopper in a hybrid QPU/GPU system programmed by CUDA-Q.
包括牛津量子电路在内的几家公司将使用 NVIDIA Grace Hopper 超级芯片来支持他们的混合量子努力。总部位于英格兰雷丁的牛津量子正在使用 Grace Hopper 在由 CUDA-Q 编程的混合 QPU/GPU 系统中。

Quantum Machines announced that the Israeli National Quantum Center will be the first deployment of NVIDIA DGX Quantum, a system using Grace Hopper Superchips. Based in Tel Aviv, the center will tap DGX Quantum to power quantum computers from Quantware, ORCA Computing and more. 

In addition, Grace Hopper is being put to work by qBraid, in Chicago, to build a quantum cloud service, and Fermioniq, in Amsterdam, to develop tensor-network algorithms. 

Grace Hopper’s large shared memory capacity and high memory bandwidth make these superchips an excellent fit for memory-hungry quantum simulations.
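
For a rough sense of why memory matters: a full statevector simulation stores 2^n complex amplitudes, at about 16 bytes each in double precision. The back-of-the-envelope sketch below is a general rule of thumb, not a figure from the article.

    # Rough memory estimate for a full statevector simulation:
    # 2**n complex amplitudes at 16 bytes each (complex128).
    def statevector_gib(num_qubits: int) -> float:
        return (2 ** num_qubits) * 16 / 2**30

    for n in (24, 30, 34, 40):
        print(f"{n} qubits -> ~{statevector_gib(n):,.2f} GiB")

    # 24 qubits fit in well under 1 GiB, 34 qubits need about 256 GiB, and 40
    # qubits about 16 TiB, which is why large shared memory (and, beyond that,
    # distributed multi-GPU or tensor-network methods) matters at scale.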

Get started programming hybrid quantum systems today with the latest release of CUDA-Q from NGC, NVIDIA’s catalog of accelerated software, or GitHub. 

(Image above courtesy of BASF)

NVIDIA Grace Hopper Superchip Powers 40+ AI Supercomputers Across Global Research Centers, System Makers, Cloud Providers
NVIDIA Grace Hopper 超级芯片驱动全球研究中心、系统制造商和云服务提供商的 40 多台 AI 超级计算机

GH200-powered centers represent 200 exaflops of AI performance driving scientific innovation.
GH200 动力中心代表着 200 艾克斯弗洛普的人工智能性能,推动科学创新。
by Angie Lee
2023 年 11 月 13 日,作者:安吉·李

Dozens of new supercomputers for scientific computing will soon hop online, powered by NVIDIA’s breakthrough GH200 Grace Hopper Superchip for giant-scale AI and high performance computing.
数十台新的用于科学计算的超级计算机将很快上线,由 NVIDIA 的突破性 GH200 Grace Hopper 超级芯片提供动力,用于巨型规模人工智能和高性能计算。

The NVIDIA GH200 enables scientists and researchers to tackle the world’s most challenging problems by accelerating complex AI and HPC applications running terabytes of data.
NVIDIA GH200 通过加速处理数 TB 数据的复杂人工智能和高性能计算应用程序,使科学家和研究人员能够攻克世界上最具挑战性的问题。

At the SC23 supercomputing show, NVIDIA today announced that the superchip is coming to more systems worldwide, including from Dell Technologies, Eviden, Hewlett Packard Enterprise (HPE), Lenovo, QCT and Supermicro. 

Bringing together the Arm-based NVIDIA Grace CPU and Hopper GPU architectures using NVIDIA NVLink-C2C interconnect technology, GH200 also serves as the engine behind scientific supercomputing centers across the globe.
通过使用 NVIDIA NVLink-C2C 互连技术将基于 Arm 的 NVIDIA Grace CPU 和 Hopper GPU 架构集成在一起,GH200 也作为全球科学超级计算中心背后的引擎。

Combined, these GH200-powered centers represent some 200 exaflops of AI performance to drive scientific innovation. 

HPE Cray Supercomputers Integrate NVIDIA Grace Hopper 

At the show in Denver, HPE announced it will offer HPE Cray EX2500 supercomputers with the NVIDIA Grace Hopper Superchip. The integrated solution will feature quad GH200 processors, scaling up to tens of thousands of Grace Hopper Superchip nodes to provide organizations with unmatched supercomputing agility and quicker AI training. This configuration will also be part of a supercomputing solution for generative AI that HPE introduced today. 

“Organizations are rapidly adopting generative AI to accelerate business transformations and technological breakthroughs,” said Justin Hotard, executive vice president and general manager of HPC, AI and Labs at HPE. “Working with NVIDIA, we’re excited to deliver a full supercomputing solution for generative AI, powered by technologies like Grace Hopper, which will make it easy for customers to accelerate large-scale AI model training and tuning at new levels of efficiency.” 

Next-Generation AI Supercomputing Centers 

A vast array of the world’s supercomputing centers are powered by NVIDIA Grace Hopper systems. Several top centers announced at SC23 that they’re now integrating GH200 systems for their supercomputers. 

Germany’s Jülich Supercomputing Centre will use GH200 superchips in JUPITER, set to become the first exascale supercomputer in Europe. The supercomputer will help tackle urgent scientific challenges, such as mitigating climate change, combating pandemics and bolstering sustainable energy production. 

Japan’s Joint Center for Advanced High Performance Computing — established between the Center for Computational Sciences at the University of Tsukuba and the Information Technology Center at the University of Tokyo — promotes advanced computational sciences integrated with data analytics, AI and machine learning across academia and industry. Its next-generation supercomputer will be powered by NVIDIA Grace Hopper. 

The Texas Advanced Computing Center, based in Austin, Texas, designs and operates some of the world’s most powerful computing resources. The center will power its Vista supercomputer with NVIDIA GH200 for low power and high-bandwidth memory to deliver more computation while enabling bigger models to run with greater efficiency.
总部位于得克萨斯州奥斯汀市的得克萨斯先进计算中心设计并运营全球一些最强大的计算资源。该中心将使用 NVIDIA GH200 为其 Vista 超级计算机提供低功耗和高带宽内存,以提供更多计算能力,同时使更大的模型能够以更高效率运行。

The National Center for Supercomputing Applications at the University of Illinois Urbana-Champaign will tap NVIDIA Grace Hopper superchips to power DeltaAI, an advanced computing and data resource set to triple NCSA’s AI-focused computing capacity. 

And, the University of Bristol recently received funding from the UK government to build Isambard-AI, set to be the country’s most powerful supercomputer, which will enable AI-driven breakthroughs in robotics, big data, climate research and drug discovery. The new system, being built by HPE, will be equipped with over 5,000 NVIDIA GH200 Grace Hopper Superchips, providing 21 exaflops of AI supercomputing power capable of making 21 quintillion AI calculations per second. 

These systems join previously announced next-generation Grace Hopper systems from the Swiss National Supercomputing Centre, Los Alamos National Laboratory and SoftBank Corp.
这些系统加入了先前宣布的来自瑞士国家超级计算中心、洛斯阿拉莫斯国家实验室和软银公司的下一代 Grace Hopper 系统。

GH200 Shipping Globally and Available in Early Access from CSPs 

GH200 is available in early access from select cloud service providers such as Lambda and Vultr. Oracle Cloud Infrastructure today announced plans to offer GH200 instances, while CoreWeave detailed plans for early availability of its GH200 instances starting in Q1 2024. 

Other system manufacturers such as ASRock Rack, ASUS, GIGABYTE and Ingrasys will begin shipping servers with the superchips by the end of the year. 

NVIDIA Grace Hopper has been adopted in early access for supercomputing initiatives by more than 100 enterprises, organizations and government agencies across the globe, including the NASA Ames Research Center for aeronautics research and global energy company TotalEnergies. 

In addition, the GH200 will soon become available through NVIDIA LaunchPad, which provides free access to enterprise NVIDIA hardware and software through an internet browser. 

Learn more about Grace Hopper and other supercomputing breakthroughs by joining NVIDIA at SC23.

NVIDIA Brings New Production AI Capabilities to Microsoft Azure at Microsoft Ignite
NVIDIA 在 Microsoft Ignite 上为 Microsoft Azure 带来了新的生产 AI 功能

Discover how NVIDIA is providing enterprises with instant access to AI supercomputing at Microsoft’s annual technology conference, running Nov. 14-17 in Seattle.
by Rohil Bhargava
2023 年 11 月 10 日,Rohil Bhargava 撰写

AI has become the cornerstone of innovation, and organizations now face the challenging task of harnessing its power to streamline operations, improve customer offerings and create new business opportunities.
人工智能已成为创新的基石,组织现在面临着利用其力量来简化运营、改善客户服务并创造新业务机会的艰巨任务。

NVIDIA and Microsoft Azure have come together to support these demands by bringing state-of-the-art AI infrastructure and software to companies tackling challenging workloads. NVIDIA will showcase its AI solutions portfolio with Microsoft Azure at Microsoft Ignite, running Nov. 14-17, in Seattle. 

For example, NVIDIA DGX Cloud, available on Microsoft Azure, allows enterprises to train models for generative AI and fuel other advanced applications. 

And to accelerate the digitalization of enterprise industrial operations, such as building virtual factories or validating autonomous vehicles, NVIDIA recently announced that Microsoft Azure will host NVIDIA Omniverse Cloud, a platform as a service that gives businesses instant access to a full-stack environment to design, deploy and manage their digitalized operations. 

Additionally, NVIDIA GPU-accelerated virtual machines on Azure, such as the recently announced ND H100 v5-series powered by NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking, enable an order-of-magnitude leap in performance and scalability to power the most challenging AI training and inference workloads. 

Learn more about how NVIDIA and Microsoft technologies are transforming industrial operations in the following Ignite sessions: 

  • Unlocking Generative AI in Enterprise With NVIDIA DGX Cloud on Microsoft Azure: Carolyne Van Den Hoogen, product marketing specialist for the DGX platform at NVIDIA, and Robin Wood, director of global partner development at Microsoft, will talk about how NVIDIA DGX Cloud provides enterprises with the fastest path to advanced, production-ready models for generative AI applications, including large language models. Join on Thursday, Nov. 16, at 4 p.m. PT. 
  • Protecting Data and AI Models With Confidential Computing on Azure: Vikas Bhatia, head of product for Azure Confidential Computing at Microsoft, will discuss Microsoft Azure’s confidential computing offering and customer success stories, and Michael O’Connor, senior director of software architecture at NVIDIA, will share how confidential computing enables secure AI use cases that meet regulatory requirements. Join this virtual session on Thursday, Nov. 16, at 5:15 p.m. PT. 
  • Accelerate Building and Deploying AI Using NVIDIA AI Enterprise in Azure Machine Learning: Listen to this prerecorded session to hear from Abhishek Sawarkar, a product manager for GPU Cloud at NVIDIA, as he shares how the latest NVIDIA models, frameworks, containers and more available through Azure Machine Learning are streamlining the building and deployment of production AI at scale. 
  • Use NVIDIA AI Enterprise in Azure Machine Learning With Ease: NVIDIA experts will demonstrate how Azure Machine Learning with NVIDIA AI Enterprise integration empowers enterprises to build, deploy and scale production AI with ease. The session will also feature an example of an end-to-end flow, from LLM fine-tuning to model customization with NVIDIA NeMo to deployment with NVIDIA Triton Inference Server and NVIDIA TensorRT-LLM. Attend on Wednesday, Nov. 15, at 4:45 p.m. PT. 
  • Unlock AI Innovation With Azure AI Infrastructure: Ian Buck, vice president of high performance computing and hyperscale at NVIDIA, will participate in a breakout session with Nidhi Chappell, general manager of generative AI and high performance computing at Microsoft Azure, on Thursday, Nov. 16, from 5:15-6 p.m. PT.
    使用 Azure AI 基础设施解锁人工智能创新:NVIDIA 高性能计算和超大规模副总裁 Ian Buck 将于 11 月 16 日星期四下午 5:15 至 6:00 与微软 Azure 生成式人工智能和高性能计算总经理 Nidhi Chappell 一起参加分组会议。

Microsoft and NVIDIA will also host a roundtable on generative AI on Nov. 15, where they’ll share insights on how to choose the right platform for successful generative AI, make AI cost-effective and more. 

For more details, explore the NVIDIA showcase page for Microsoft Ignite.
要了解更多详情,请浏览微软 Ignite 的 NVIDIA 展示页面。

Scroll Back in Time: AI Deciphers Ancient Roman Riddles
时光倒流:人工智能解读古罗马谜题

Undergrad Luke Farritor used NVIDIA GPUs to help win the Vesuvius Challenge, deciphering long-lost texts from the Herculaneum scrolls as the race heats up for a $700,000 prize to unlock even more ancient texts.  
by Brian Caulfield
2023 年 11 月 10 日,Brian Caulfield 撰写

Thanks to a viral trend sweeping social media, we now know some men think about the Roman Empire every day.

And thanks to Luke Farritor, a 21-year-old computer science undergrad at the University of Nebraska-Lincoln, and like-minded AI enthusiasts, there might soon be a lot more to think about.

Blending a passion for history with machine learning skills, Farritor has triumphed in the Vesuvius Challenge, wielding the power of the NVIDIA GeForce GTX 1070 GPU to bring a snippet of ancient text back from the ashes after almost 2,000 years.

Text Big Thing: Deciphering Rome’s Hidden History

The Herculaneum scrolls are a library of ancient texts that were carbonized and preserved by the eruption of Mount Vesuvius in 79 AD, which buried the cities of Pompeii and Herculaneum under a thick layer of ash and pumice.

The competition, which has piqued the interest of historians and technologists across the globe, seeks to extract readable content from the carbonized remains of the scrolls.

In a significant breakthrough, the word “πορφυρας,” which means “purple dye” or “cloths of purple,” emerged from the ancient texts thanks to the efforts of Farritor.

The Herculaneum scrolls, wound about 100 times around, are sealed by the heat of the eruption of Vesuvius.

His achievement in identifying 10 letters within a small patch of scroll earned him a $40,000 prize.

Close on his heels was Youssef Nader, a biorobotics graduate student, who independently discerned the same word a few months later, meriting a $10,000 prize.

Adding to these notable successes, Casey Handmer, an entrepreneur with a keen eye, secured another $10,000 for his demonstration that significant amounts of ink were waiting to be discovered within the unopened scrolls.

All these discoveries are advancing the work pioneered by W. Brent Seales, chair of the University of Kentucky Computer Science Department, who has dedicated over a decade to developing methods to digitally unfurl and read the delicate Herculaneum scrolls.

Turbocharging these efforts is Nat Friedman, the former CEO of GitHub and the organizer of the Vesuvius Challenge, whose commitment to open-source innovation has fostered a community where such historical breakthroughs are possible.

To become the first to decipher text from the scrolls, Farritor, who served as an intern at SpaceX, harnessed the GeForce GTX 1070 to accelerate his work.

When Rome Meets RAM: Older GPU Helps Uncover Even Older Text

Introduced in 2016, the GTX 1070 is celebrated among gamers, who have long praised the GPU for its balance of performance and affordability.

Instead of gaming, however, Farritor harnessed the parallel processing capabilities of the GPU to accelerate a ResNet-based deep learning model, processing data at speeds unattainable by traditional CPU-only methods.
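
As an illustration of the kind of GPU-accelerated inference involved, here is a minimal PyTorch sketch that scores scan patches for ink with an off-the-shelf ResNet. It is not Farritor's actual pipeline; the model head, patch size and batch size are assumptions for illustration only.

    import torch
    from torchvision import models

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Adapt a stock ResNet-18 into a binary "ink vs. no ink" patch classifier.
    model = models.resnet18(weights=None)
    model.fc = torch.nn.Linear(model.fc.in_features, 2)
    model = model.to(device).eval()

    # A batch of hypothetical patches cropped from the scroll's CT scan volume.
    patches = torch.randn(64, 3, 224, 224, device=device)

    with torch.no_grad():
        logits = model(patches)
        ink_probability = torch.softmax(logits, dim=1)[:, 1]  # per-patch ink score

    print(ink_probability.shape)  # torch.Size([64])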

Farritor is not the only entrant harnessing NVIDIA GPUs, which have proven indispensable tools for Vesuvius Challenge competitors.

Latin Lingo and Lost Text

Discovered in the 18th century in the Villa of the Papyri, the Herculaneum scrolls have presented a challenge to researchers. Their fragile state has made them nearly impossible to read without causing damage. The advent of advanced imaging and AI technology changed that.

The project has become a passion for Farritor, who finds himself struggling to recall more of the Latin he studied in high school. “And man, like what’s in the scrolls … it’s just the anticipation, you know?” Farritor said.

The next challenge is to unearth passages from the Herculaneum scrolls that are 140 characters long, echoing the brevity of an original Twitter post.

Engaging over 1,500 experts in a collaborative effort, the endeavor is now more heated than ever.

Private donors have upped the ante, offering a $700,000 prize for those who can retrieve four distinct passages of at least 140 characters this year — a testament to the value placed on these ancient texts and the lengths required to reclaim them.

And Farritor’s eager to keep digging, reeling off the names of lost works of Roman and Greek history that he’d love to help uncover.

He reports he’s now thinking about Rome — and what his efforts might help discover — not just every day, but “every hour.” “I think anything that sheds light on that time in human history is gonna be significant,” Farritor said.

Unlock the Future of Video Conferencing and Editing With NVIDIA Maxine

by Greg Jones

The latest release of NVIDIA Maxine brings new and updated features that enhance real-time communication and elevate high-impact video editing with AI.

The Maxine developer platform redefines video conferencing and editing by providing developers and businesses with GPU-accelerated AI services in the cloud, so they can enhance video and audio streams in real time. With the Maxine production release, now exclusively available on NVIDIA AI Enterprise, users can access advanced features across augmented reality, audio effects and video effects.

The New Face of Avatars

Digital avatars have been used for decades, often seen as stylized animated representations of a person or character.

Now, with NVIDIA Maxine’s Live Portrait feature, users can choose the perfect photo of themselves and animate it with a standard webcam. Live Portrait syncs a person’s head movement and facial expressions to the user’s chosen photo. Users can also choose 2D stylized character representations of themselves.

Live Portrait is now available as a production feature in Maxine and supports model outputs at resolutions up to 1024×1024.

Finding Your Voice

A person’s voice can convey emotion and communicative nuances. NVIDIA Maxine’s new Voice Font feature, available in early access, enables users to generate a unique voice for themselves — almost like a digital avatar for voice.

The feature can convert audio samples into a digital voice with just 30 seconds of reference audio.

Voice Font can be helpful for people who have speech impediments, or who want to fine-tune the sound of their own voice. Voice Font is available for evaluation and testing in the early access release of Maxine.

Seeing Eye-to-Eye

NVIDIA Maxine’s Eye Contact feature uses AI and a webcam feed to direct the user’s gaze toward the camera in real time. Similarly, it can repose eyes in offline video to create more engaging and impactful videos.

Studies have shown that maintaining eye contact during conversations encourages personal connection, understanding and engagement. Maxine Eye Contact enhances communication by ensuring that the user is always looking at their audience, whether in a video conference or via prerecorded video.

The new version of Maxine Eye Contact preserves naturally occurring micro-eye movement and has the added ability to “look away” periodically, creating a more realistic experience on video conferences. The frequency and duration of the “look away” behavior are also adjustable.

The latest version of Eye Contact also brings quality improvements, including increased gaze stability, more robust occlusion handling and much lower latency when used with NVIDIA Ada Generation GPUs.  

The Best of the Rest

The newest Maxine release also delivers new and updated real-time features across augmented reality, audio effects and video effects to advance video conferencing and editing capabilities. These include:

  • 3D Body Pose, for pose estimation when only the upper body is in camera view, introducing a new dimension to virtual interactions
  • Support for NVIDIA L4 Tensor Core and L40 GPUs, powered by the NVIDIA Ada Lovelace architecture, bringing up to 1.75x performance increases compared to GPUs from previous families
  • NVIDIA Triton Inference Server support for AI Green Screen, Eye Contact, Landmark Detection and Face Detection, enabling higher throughput for both single- and multiple-GPU workflows (see the client sketch below)
  • Audio super resolution performance improvements
Charts: Eye Contact and AI Green Screen performance improvements using NVIDIA Triton (throughput gain based on concurrent streams).
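
As a rough illustration of how a Triton-served model is called from application code, here is a minimal sketch using Triton's Python HTTP client. The model name and tensor names are hypothetical placeholders, not details of Maxine's actual deployment.

    import numpy as np
    import tritonclient.http as httpclient

    # Connect to a Triton Inference Server assumed to be running locally.
    client = httpclient.InferenceServerClient(url="localhost:8000")

    # One normalized RGB frame in NCHW layout; the shape here is illustrative.
    frame = np.random.rand(1, 3, 720, 1280).astype(np.float32)

    # "INPUT__0", "OUTPUT__0" and "eye_contact" are placeholder names; a real
    # deployment uses the names defined in its own model configuration.
    infer_input = httpclient.InferInput("INPUT__0", list(frame.shape), "FP32")
    infer_input.set_data_from_numpy(frame)

    response = client.infer(model_name="eye_contact", inputs=[infer_input])
    output = response.as_numpy("OUTPUT__0")
    print(output.shape)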

Partnering for Success

Many NVIDIA partners and customers, such as Quicklink and CoPilot AI, are already experiencing high-quality video conferencing and editing with Maxine.

Quicklink is a leading global provider of remote production solutions to the media, broadcast, production and sport industries. The company’s Cre8 video production tool allows users to deliver professional virtual, in-person and hybrid events.

“Our partnership with NVIDIA has been driven by broadcast industry challenges, beginning with remote guest contribution,” said Richard Rees, CEO of Quicklink. “Our integration of NVIDIA Maxine into Quicklink Cre8 resolves these challenges with the addition of Maxine’s Auto Framing, Video Noise Removal, Noise and Echo Cancellation, and Eye Contact features. These features have had an incredible reception across the industry.”

Image courtesy of Quicklink

CoPilot AI, a Vancouver-based software-as-a-service startup operating in the cross-section between AI and sales enablement, incorporates Maxine to provide dependable solutions for content creators.

“Using NVIDIA Maxine microservices, we empower users to record on the whim and stand out in the crowd,” said Jackson Chao, cofounder of CoPilot AI Video. “Maxine’s Eye Contact feature allows users to record a script without compromising the connection with viewers. Even novice content creators are able to adopt video outreach with confidence and humanize the way they engage with their contacts.”

Maxine offers a collection of AI effects that enhance real-time audio and video and can be incorporated into existing customer infrastructures. And the solution can be deployed in the cloud, on premises or at the edge, enabling quality communication from nearly anywhere.

Availability

From enhancing day-to-day video conferencing needs to integrating AI technology, NVIDIA Maxine offers high-quality video communications for all professionals.

The latest Maxine production release is included exclusively with NVIDIA AI Enterprise 4.1, allowing users to tap into production-ready features such as Triton Inference Server, enterprise support and more.

Customers requiring access to NVIDIA Maxine’s limited early access program can fill in the relevant online application on the Maxine Microservices Early Access Program or Maxine SDK Early Access Program pages.

To help improve features in upcoming releases, participants can provide feedback by contributing to the NVIDIA Maxine and NVIDIA Broadcast App survey.