
Understanding ChatGPT

Unless you’ve lived in a cave for the last few months, you’ve heard of ChatGPT. It’s a deep-learning model (neural network) created by OpenAI whose ability to generate human-like prose has made AI the topic of dinner-table conversations around the world. Here’s an example in which I asked it to explain molecular biology in the style of Dr. Seuss:

Molecular biology is a funny thing,

It looks at the cells with a dingle-ding,

It studies the parts that make up the whole,

Like proteins and DNA with a roll and a toll.

It looks at the way that genes are expressed,

How the proteins interact and how they’re blessed,

It looks at the structure and the way it’s made,

And how it’s all connected like a jigsaw parade.

You can try ChatGPT yourself by pointing your browser to https://chat.openai.com/chat and typing a prompt such as “Write a short story that begins with it was a dark and stormy night.” You’ll have to create an account if you don’t already have one, but the account doesn’t have to be paid for. You get a certain number of free credits each month, and you can do a lot with those free credits.

ChatGPT isn’t limited to generating prose; it can produce code, too. Here’s how it responded when I prompted it to “Create a Python function that accepts an array of numbers as input, bubble sorts the numbers, and returns a sorted array:”

def bubble_sort(arr):
    n = len(arr)
    # Traverse through all numbers in array
    for i in range(n):
        # Last i elements are already sorted
        for j in range(0, n - i - 1):
            # Traverse through the array from 0 to n-i-1
            # Swap if the element found is greater than the next element
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr
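
The function sorts the array in place and also returns it, so a quick sanity check looks like this:

print(bubble_sort([5, 1, 4, 2, 8]))   # prints [1, 2, 4, 5, 8]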

If Python isn’t your thing, ChatGPT understands dozens of other programming languages, too. You can even feed it code written in one language and ask it to convert it to another or ask it to comment on your raw source code files.

So exactly what is ChatGPT? How was it built, and how does it work? The best way to explain ChatGPT is to start at the beginning. And the beginning, in this case, goes back to 2017.

From LSTMs to Transformers

ChatGPT falls under the branch of deep learning known as natural language processing, or NLP. NLP encompasses a variety of activities, including text classification (sentiment analysis and spam filtering, for example), keyword extraction, named-entity recognition, document summarization, and question answering – posing a question in plain English and searching a corpus of text for an answer. One of the more ambitious tasks to which NLP is applied is neural machine translation, or NMT, which translates text into other languages.

Until 2017, most NMT models, including the one that powered Google Translate, were recurrent neural networks. RNNs use Long Short-Term Memory (LSTM) cells to factor word order into their calculations. They understand, for example, that “Park the car down the street” and “The car park down the street” have two different meanings. Per the illustration below, which comes from a paper published by Google engineers in 2016, the Google Translate of that era used consecutive layers of LSTM cells to encode phrases to be translated, and another stack of LSTM cells to decode them into the target language. An attention module positioned between the encoder and decoder helped zero in on a sentence’s most important words.

Google Translate circa 2016
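
For readers who like to see architectures in code, here's a minimal Keras sketch of a seq2seq LSTM encoder-decoder along the lines described above. The vocabulary sizes and dimensions are purely illustrative, and the attention module between encoder and decoder is omitted for brevity:

from tensorflow.keras import layers, Model

# Illustrative sizes; the real system's vocabularies and dimensions were far larger
src_vocab, tgt_vocab, embed_dim, latent_dim = 20000, 20000, 256, 512

# Encoder: an LSTM reads the source sentence and summarizes it in its final state
encoder_inputs = layers.Input(shape=(None,))
x = layers.Embedding(src_vocab, embed_dim)(encoder_inputs)
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(x)

# Decoder: a second LSTM generates the translation, seeded with the encoder's state
decoder_inputs = layers.Input(shape=(None,))
y = layers.Embedding(tgt_vocab, embed_dim)(decoder_inputs)
y = layers.LSTM(latent_dim, return_sequences=True)(y, initial_state=[state_h, state_c])
outputs = layers.Dense(tgt_vocab, activation='softmax')(y)

model = Model([encoder_inputs, decoder_inputs], outputs)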

In 2017, a landmark paper titled “Attention Is All You Need” changed the way data scientists approach NMT and other NLP tasks. That paper proposed a better way to process language based on transformer models that eschew LSTMs and use neural attention mechanisms to model the context in which words are used. Transformer models are superior to LSTM models in several ways, not the least of which is that they can infer meaning from text samples of any size (by contrast, as text length grows, an RNN’s power to connect related words diminishes) and they can be trained in parallel on multiple GPUs. Today, transformer models have almost entirely replaced LSTM-based models, particularly in the field of NLP. They also play an ever-increasing role in computer vision, particularly in models that generate images from textual descriptions.
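
If you're curious what that attention mechanism boils down to, here's a bare-bones NumPy sketch of the scaled dot-product self-attention described in the paper. Real transformers use multiple attention heads, each with its own learned projection matrices; this is just the core computation:

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X holds one embedding vector per word; Wq, Wk, and Wv are learned projections
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # how strongly each word attends to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                             # context-aware representation of each word

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                       # 5 words, 16-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)         # (5, 16)

Note that nothing in the computation depends on the number of words, which is part of why attention copes with inputs of any length.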

The diagram below comes from the aforementioned paper and documents a transformer encoder-decoder architecture that supports neural machine translation, among other tasks. (It also depicts the core architecture of ChatGPT.) The model has two inputs. On the left is a transformer that encodes text input to it into arrays of floating-point numbers that capture meaning. On the right is a transformer that takes the output from the encoder and a second text input and decodes them into a set of probabilities that predicts the next word in the second input. When translating English to French, the first text input is the English phrase to be translated. The second text input is the French translation generated so far. The model operates iteratively, first generating the first word in the translation, then the second, and so on. Much of the magic lies in the multi-head attention layers, which use the self-attention mechanism described in the paper to model relationships between words independent of the input length.

Transformer encoder-decoder architecture

Chapter 13 of my book “Applied Machine Learning and AI for Engineers” features the source code for a working transformer-based NMT model. It does a credible job of translating short sentences written in English to French given that it was trained on just 50,000 phrase pairs: short English sentences and their French equivalents. Here’s a sample of the training data. Note the special tokens denoting the start and end of the French phrases:

He heard the noise
[start] Il entendit le bruit [end]
He heard the sound
[start] Il a entendu le son [end]
He held his breath
[start] Il a retenu sa respiration [end]
He held his breath
[start] Il a retenu son souffle [end]
He is a bank clerk
[start] Il est employé de banque [end]
He is a bus driver
[start] Il est conducteur de bus [end]
I think you’re cute
[start] Je pense que tu es mignonne [end]
I think you’re cute
[start] Je pense que vous êtes adorable [end]
Are you a good golfer
[start] Êtes-vous bon golfeur [end]
Are you a good golfer
[start] Es-tu une bonne golfeuse [end]

Like the encoder-decoder in the diagram, my model has two inputs. One accepts an English phrase, the other a partially completed French phrase. The process of translating “hello world” into French is pictured below. You first input “hello world” to the English input and “[start]” to the French input. The model predicts the next word in the French translation by assigning a probability to every word in its French vocabulary – in this example, about 12,000 words – based on the corpus of text that it was trained with and picking the word with the highest probability. Then you call the model again, this time passing “hello world” to the English input and “[start] salut” to the French input. You repeat this process until the next predicted word is “[end]” denoting the end of the translation.

How a neural network translates English to French
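
In code, that iterative process is a simple loop. The translate_model callable and french_vocab list below are hypothetical stand-ins for a trained model and its vocabulary:

import numpy as np

def translate(english_phrase, translate_model, french_vocab, max_words=30):
    french = '[start]'
    for _ in range(max_words):
        # The model returns one probability for every word in the French vocabulary
        probabilities = translate_model(english_phrase, french)
        next_word = french_vocab[int(np.argmax(probabilities))]
        if next_word == '[end]':                   # the model says the translation is complete
            break
        french = french + ' ' + next_word          # feed the growing translation back in
    return french.replace('[start]', '').strip()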

The model can translate phrases it has never seen because it learned the mechanics of translating English to French from the 50,000 training samples. It’s not perfect, but the more data it’s trained with, the more capable it becomes. Google Translate was trained on more than 25 billion phrase pairs in over 100 languages, which explains why it rivals a human’s ability to translate text.


Bidirectional Encoder Representations from Transformers (BERT)

Building models like Google Translate is expensive. First, you have to collect (or manufacture) billions of lines of training data. Then you need massive amounts of compute power to do the training. Models like these often require weeks to train on hundreds of graphics processing units (GPUs) or tensor processing units (TPUs) costing $10,000 or more each. But the introduction of transformers in 2017 laid the groundwork for another landmark innovation in the NLP space: Bidirectional Encoder Representations from Transformers, or BERT for short. Introduced by Google researchers in a 2018 paper titled “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, BERT advanced the state of the art by providing pre-trained transformers that can be fine-tuned for a variety of NLP tasks.

Google instilled BERT with language understanding by training it with more than 2.5 billion words from Wikipedia articles and 800 million words from Google Books. Training required four days on 64 TPUs. BERT isn’t generally useful by itself, but it can be fine-tuned to perform specific tasks such as sentiment analysis or question answering. Fine-tuning is accomplished by further training the pre-trained model with task-specific samples at a reduced learning rate, and it is much less expensive and time-consuming than training BERT from scratch. It’s as if you’re running a 26-mile marathon and BERT spots you the first 25 miles.

BERT as a base model for fine-tuning
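
For the curious, here's roughly what that fine-tuning step can look like using the Hugging Face transformers library. The two samples, the label scheme, and the hyperparameters are placeholders; the point is the pattern of loading pretrained weights and training briefly at a small learning rate:

import numpy as np
from tensorflow.keras.optimizers import Adam
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Start from pretrained BERT weights rather than training from scratch
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = TFAutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Placeholder task-specific samples; a real sentiment dataset would have thousands of rows
texts = ['I loved this movie', 'What a waste of time']
labels = np.array([1, 0])
tokens = tokenizer(texts, padding=True, truncation=True, return_tensors='np')

# Train briefly at a reduced learning rate so the pretrained weights shift only slightly
model.compile(optimizer=Adam(learning_rate=3e-5))  # no loss argument: the model supplies its own
model.fit(dict(tokens), labels, epochs=3)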

Aside from the fact that it was trained with a huge volume of text, the key to BERT’s ability to understand human language is an innovation known as Masked Language Modeling. MLM turns a large corpus of text into a training ground for learning the structure of a language. When BERT models are pretrained, a specified percentage of the words in each batch of text – usually 15% – are randomly removed or “masked” so the model can learn to predict the missing words from the words around them. Unidirectional models look at the text to the left or the text to the right and attempt to predict what the missing word should be. MLM uses text on the left and right to inform its decisions. That’s why BERT is a “bidirectional” transformer. A model has a better chance of predicting what word should fill in the blank in the phrase “Every good ____ does fine” than it has at predicting the next word in the phrase “Every good ____.” The answer could be boy, as in “Every good boy does fine,” or it could be turn, as in “Every good turn deserves another.” Or it could be something else entirely.
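
Here's a toy sketch of how MLM training samples can be manufactured from raw text. BERT's actual recipe has a few extra wrinkles (sometimes the chosen word is replaced with a random word or left alone), but the idea is the same:

import random

def mask_tokens(tokens, mask_rate=0.15, mask_token='[MASK]'):
    # Randomly hide about 15% of the words; the hidden words become the training labels
    masked, labels = [], []
    for token in tokens:
        if random.random() < mask_rate:
            masked.append(mask_token)
            labels.append(token)    # the model must learn to predict this word
        else:
            masked.append(token)
            labels.append(None)     # nothing to predict at this position
    return masked, labels

print(mask_tokens('every good boy does fine'.split()))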

Next-word prediction is the super power that drives text generators such as ChatGPT. Thanks to MLM, deep-learning models can learn to predict the next word in a sentence without labeled datasets. That’s important, because creating labeled data is time-consuming – especially when it involves billions of samples.

BERT has been called the “Swiss Army knife” of NLP. Google uses it to improve search results and predict text as you type into a Gmail or Google Doc. Dozens of variations have been published, including DistilBERT, which retains 97% of the accuracy of the original model while weighing in at 40% smaller and running 60% faster. Also available are variations of BERT already fine-tuned for specific tasks such as question answering. Such models can be further refined using domain-specific datasets, or they can be used as is.
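
Using one of those ready-made variants takes just a few lines with the Hugging Face transformers library. The checkpoint named below is one publicly available DistilBERT model fine-tuned for question answering:

from transformers import pipeline

qa = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')
result = qa(question='When was the transformer introduced?',
            context='The transformer architecture was introduced in a 2017 paper '
                    'titled "Attention Is All You Need."')
print(result['answer'])   # e.g., "2017"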

If you’re a programmer and you’re curious to know what BERT fine-tuning looks like, my book offers an example. But 2018 was a long time ago. ChatGPT doesn’t rely on fine-tuned versions of BERT. The next section explains why.

From BERT to ChatGPT

BERT was the world’s first Large Language Model (LLM). It featured around 345 million parameters, which is a measure of the size and complexity of a neural network. (Think of an equation that has 345 million terms. That’s a big equation!)  OpenAI followed Google’s lead and produced BERT-like LLMs of their own in 2018 and 2019: first GPT-1 with 117 million parameters, and then GPT-2 with 1.5 billion parameters. In 2020, OpenAI rocked the NLP world by releasing GPT-3 featuring a whopping 175 billion parameters, earning it the title of the largest LLM, indeed the largest neural network, ever built. March 2023 saw the release of GPT-4, which builds on GPT-3. At the time of this writing, OpenAI hasn’t revealed GPT-4’s parameter count, but it is rumored to be in the neighborhood of 1 trillion.

GPT stands for Generative Pretrained Transformer – generative because these models excel at generating text. Not only can they translate “hello world” to French, but they can also translate “Write a short story about three young people attending a wizarding school” into fully formed prose. GPT-3 was trained with roughly half a trillion words from Common Crawl, WebText, Wikipedia, and a corpus of books. (OpenAI hasn’t revealed how long it took to train GPT-3, but they have said that they used a distributed training process with 1,728 NVIDIA V100 GPUs hosted in Microsoft Azure.) It was also trained to respond to natural-language prompts such as “Describe molecular biology in the style of Dr. Seuss” or “Translate hello world into French.” Unlike BERT, GPT models can perform certain NLP tasks such as text translation and question-answering without fine-tuning, a feat known as zero-shot or few-shot learning. OpenAI documented this in a seminal 2020 paper titled “Language Models are Few-Shot Learners.”
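
Few-shot learning amounts to showing the model a handful of examples in the prompt itself rather than updating its weights. A hypothetical few-shot translation prompt, written in the arrow format used in the GPT-3 paper, might look like this:

prompt = (
    'Translate English to French:\n'
    'sea otter => loutre de mer\n'
    'cheese => fromage\n'
    'hello world =>'
)
# The model infers the task from the two examples and completes the final line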

ChatGPT is a fine-tuned version of GPT-3.5, which itself is a fine-tuned version of GPT-3. At its heart, ChatGPT is a transformer encoder-decoder that responds to prompts by iteratively predicting the first word in the response, then the second word, and so on – much like a transformer that translates English to French. The diagram below shows how ChatGPT might respond to the prompt “Complete the sentence every good.” Like a text translator, it generates each word in the response one word at a time, and it determines what the next word should be based on probabilities derived from the vast corpus of text it was trained on and the text it has generated so far. In this example, it picks the highest-ranking word every time, yielding the response “every good thing must come to an end.”

How ChatGPT completes a sentence with temperature=0.0

Picking the top-ranking word in each iteration works fine for short responses, but it can make longer responses feel stiff and unimaginative. That’s why ChatGPT offers various knobs and dials that you can adjust to alter how it selects words. One of these is the temperature setting. If temperature=0, ChatGPT selects the word with the highest probability in each iteration. But if you raise temperature to, say, 0.7, ChatGPT sometimes picks words lower in the list, yielding text that often feels more natural and creative. This is illustrated below, where a non-zero temperature setting yields “every good thing has its price” from the same input prompt.

How ChatGPT completes a sentence with temperature=0.7
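
Conceptually, temperature rescales the probabilities before the next word is chosen. OpenAI hasn't published ChatGPT's internals, but the standard formulation looks like this:

import numpy as np

def pick_next_word(words, scores, temperature=0.7):
    if temperature == 0:
        return words[int(np.argmax(scores))]       # always take the top-ranked word
    scaled = np.array(scores) / temperature        # higher temperature flattens the distribution
    probs = np.exp(scaled - scaled.max())
    probs = probs / probs.sum()
    return np.random.choice(words, p=probs)        # occasionally picks a lower-ranked word

words = ['thing', 'deed', 'turn', 'boy']
scores = [2.0, 1.2, 1.0, 0.4]                      # made-up scores for illustration
print(pick_next_word(words, scores, temperature=0.7))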

ChatGPT is a glorified word predictor. It isn’t sentient. It doesn’t know what it’s saying, and yes, you can coax it into admitting that it wants to take over the world or saying hurtful things (although it was specially conditioned during training to try to suppress such output). It’s simply stringing words together using an expansive statistical model built from billions of sentences.

Speaking of conditioning: ChatGPT was trained in three phases. First, the base model, GPT-3.5, was trained with more than 10,000 sample prompts and responses. Next, it was asked to generate several responses to each of tens of thousands of prompts, and a team of contractors graded each response for accuracy, appropriateness, and other criteria, assigning lower grades, for example, to responses that were inaccurate, unhelpful, or hurtful. Finally, the model was tweaked to favor higher-ranked responses. Out of this came ChatGPT as we know it today.

One of the more remarkable aspects of ChatGPT is that when you ask it a question, it doesn’t crawl the Internet to find an answer. The entire body of knowledge present on the Internet in September 2021 (and then some) was baked into those 175 billion parameters during training. It’s akin to you answering a question off the top of your head rather than reaching for your phone and Googling for an answer. When Microsoft incorporated GPT-4 into Bing, they added a separate layer providing Internet access. OpenAI did the same with WebGPT, a GPT-3 variant. ChatGPT neither needs nor has that capability.

ChatGPT was also trained on billions of lines of code from the world’s most popular source-code repository, GitHub, which is why it’s able to generate code as well as prose. This raises the question: Is it fair – and legal – to train a deep-learning model that “learns” from code written by others? Especially when much of that code is governed by open-source licenses that require attribution when used? That’s the billion-dollar question, and it’s the subject of a class-action lawsuit filed against Microsoft, GitHub, and OpenAI last year. It’s too early to tell how it will turn out, but it’s not too dramatic to say that the future of AI (and of writers, programmers, and artists as well) could hang in the balance. AI, after all, is nothing without vast amounts of data to train on.

The ChatGPT API

On March 1st, 2023, OpenAI made ChatGPT available through a REST API, making it possible for programmers to infuse ChatGPT’s intelligence into their own code. The following example, written in Python, uses the ChatGPT API to answer the question “In which academic paper was the deep learning transformer introduced, and where can I find it?”

import os
import openai

# pip install openai; the key can also be picked up automatically from the
# OPENAI_API_KEY environment variable
openai.api_key = os.environ['OPENAI_API_KEY']

messages = [{
    'role': 'user',
    'content': 'In which academic paper was the deep learning transformer ' \
               'introduced, and where can I find it?'
}]

response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=messages
)

print(response.choices[0].message.content)

ChatGPT responded as follows:

The deep learning transformer was introduced in the academic paper titled “Attention Is All You Need” published in 2017 by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. The paper is available on the arXiv preprint server: https://arxiv.org/abs/1706.03762.
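
The temperature setting discussed earlier is exposed through the API as well; passing it to ChatCompletion.create controls how adventurous the word selection is:

response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=messages,
    temperature=0.7   # 0 is the most deterministic; higher values yield more creative responses
)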

I’ll introduce the ChatGPT API and some of the amazing things you can do with it in subsequent posts. For now, I’ll close with a description of ChatGPT written by ChatGPT:

ChatGPT is a language model developed by OpenAI that uses the GPT-3.5 architecture to generate human-like responses to a wide range of topics. Trained on vast amounts of text from the internet, books, and other sources, ChatGPT can understand and generate natural language, engaging in conversations with people to provide informative and helpful responses. Its ability to understand context, infer meaning, and generate coherent and relevant responses makes it a powerful tool for a variety of applications, from language translation and customer service to personal assistants and creative writing. Continuously learning and improving through updates and refinements to its algorithms and training data, ChatGPT is an advanced technology that is changing the way we interact with computers and artificial intelligence.

I couldn’t have said it better myself. In my next post, you will learn about the ChatGPT API.