ICS 33 Spring 2024, Project 4: Still Looking for Something

Background

Our recent discussion of Functional Programming alluded to the fact that what makes programming languages different from one another isn't solely their syntax, though that's certainly part of it. Each programming language asks its users to think differently — sometimes dramatically so — about how best to organize our solution to a problem. What's considered normal (or even desirable) in an object-oriented language might be awkward (or even impossible) in a purely functional language, and vice versa. How we'd solve a problem in a data manipulation language like SQL would be radically different from how we'd solve the same problem in a language like Python. Naturally, some kinds of problems will be better solved with one set of tools than another, so we'd expect different programming languages to excel at different tasks; part of why we want some exposure to more than one programming language is so that we can start to develop our sensibilities about the ways that languages can differ, and how we might be able to recognize the kinds of problems that are a better fit for some than others. That way, even if we don't become experts in multiple languages at once, we'll at least have embraced the idea that no single language is the best solution to every problem; that'll open our minds to learning about alternatives when they show promise, rather than falling in love with our first language and never being able to let go of it, or simply riding the waves of hype and fashion wherever they lead, for better or worse.

Fortunately, we've already had a head start on that journey, because Project 2 asked you to build a single application that was written using more than one programming language. We used Python to implement our user interface and the "engine" underlying it, while we instead used SQL to describe our interactions with the database that stored and managed the program's data. The technique of writing systems made up of code written in multiple programming languages is sometimes called polyglot programming, which, like many choices we make in computing, represents a tradeoff: We give up the simplicity of writing everything in a single language, but we gain access to a set of abilities that approach the union of the abilities of all of the languages we're using. As long as we can figure out how to make code in one language work together with code in another — in Project 2, we relied on the sqlite3 library to smoothly communicate between them — and as long as we're careful to use the "best-fit" language for each part of our program, we can sometimes achieve things that are much more difficult to achieve when writing everything in one language. The more complex the system, the greater the chance it may benefit from polyglot techniques.

Among the differences between programming languages are the differences in their syntax, which is to say that different programming languages allow us to use different keywords and symbols in different orders. Where a SQL statement might begin with SELECT or CREATE TABLE, a Python statement might instead begin with class or def. There is some overlap between the words and phrases allowed across programming languages, but there are almost always differences somewhere. We can write a + b in both Python and SQL, for example, but the statements in which it can legally appear would need to be structured differently.

As you'll see in later coursework, the ability to describe the syntax of a programming language is a fairly universal need, so we would benefit from understanding a universal solution to it. A grammar is a well-known formalism that can do that job nicely. Grammars provide a formal way to describe syntax, allowing us to specify the valid orders in which words and symbols can appear. Grammars form the theoretical basis of parsers like the one provided in Project 3, whose main jobs are to decide whether a sequence of symbols is valid, by inferring the structure from which it derives its meaning. But we can use grammars in the opposite direction, too — generating sequences of symbols that we know are valid, rather than determining whether a sequence of symbols is valid — and that's our focus in this project.

To satisfy this project's requirements, you'll write a program that randomly generates text in accordance with a grammar that's given as the program's input. (Note that parsing and generating text are hardly the only tasks for which we can use grammars; they're recurrent in the study of computer science, so you're likely to see them again in your studies, probably more than once.) You will also gain practice implementing a mutually recursive algorithm in Python, which will strengthen your understanding of our recent conversation in which we were Revisiting Recursion.

Grammars

A grammar is a collection of substitution rules, each of which specifies how a symbol can be replaced with a sequence of other symbols. Collectively, the substitution rules that comprise a grammar describe a set of sentences that we say make up a language.

As a first example, consider the following grammar.

A → 0 A 1 A | B
B → #

There are two rules that make up our grammar: One specifying how the symbol A can be replaced, and another specifying a different replacement for B. We say that symbols that can be replaced in this way are variables, which I've denoted here with boldfaced, underlined text. Meanwhile, we say that symbols that cannot be replaced are terminals, and that the sentences that are part of a language described by a grammar are made up only of terminals. There are two variables in our grammar (A and B) and three terminals (0, 1, and #).

The vertical bar ('|') symbol in the rule for A indicates optionality, which is to say that we can replace an occurrence of A with one of two options: either with the symbols 0 A 1 A or with the symbol B. Lacking a vertical bar, the rule for B offers only one option: We can only replace B with the terminal #.

We consider one of the variables to be the start variable, which is meant to describe an entire sentence. Other variables describe fragments of sentences. For the purposes of this example, we'll say that A is the start variable.

Generating a sentence from a grammar

From a conceptual point of view, a grammar can be used to generate strings of terminals within its language in the following manner. (I should point out that this will not be precisely how your program will generate its output, but we'll start here, since it's a good way to understand the concepts underlying what we're doing.)

Begin with a sentence containing only one symbol: the start variable.
As long as there are still variables in the sentence, pick one of them, find the corresponding rule with that variable on its left-hand side, and choose one of its options. Replace the variable with the symbols in the option you chose.

A sequence of substitutions leading from the start variable to a string of terminals is called a derivation. When the leftmost variable is always replaced at each step, the derivation is called a leftmost derivation. The sentence 0 0 # 1 # 1 # is in the language described by the grammar above, a fact we can prove using the following leftmost derivation.

A ⇒ 0 A 1 A ⇒ 0 0 A 1 A 1 A ⇒ 0 0 B 1 A 1 A ⇒ 0 0 # 1 A 1 A ⇒ 0 0 # 1 B 1 A ⇒ 0 0 # 1 # 1 A ⇒ 0 0 # 1 # 1 B ⇒ 0 0 # 1 # 1 #

The algorithm described above would be able to produce this same sentence by making the same choices for each application of a rule that was made in this derivation.
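The conceptual procedure above can be sketched in a few lines of Python. This is only an illustration of the idea, not the design the project requires: the dictionary layout, the `derive` function, and its `choose` parameter are all hypothetical names made up for this sketch. Passing a scripted `choose` lets us replay the exact derivation shown above.

```python
import random

# Hypothetical in-memory form of the example grammar: each variable maps
# to a list of options, and each option is a list of symbols.
GRAMMAR = {
    'A': [['0', 'A', '1', 'A'], ['B']],
    'B': [['#']],
}


def derive(start, choose=random.choice):
    """Replace the leftmost variable repeatedly until only terminals remain.

    Returns every intermediate sentence, so the derivation can be inspected.
    """
    sentence = [start]
    steps = [' '.join(sentence)]

    while any(symbol in GRAMMAR for symbol in sentence):
        # Find the leftmost variable and substitute one of its options.
        index = next(i for i, symbol in enumerate(sentence) if symbol in GRAMMAR)
        option = choose(GRAMMAR[sentence[index]])
        sentence[index:index + 1] = option
        steps.append(' '.join(sentence))

    return steps


# Replay the choices made in the leftmost derivation above
# (0 means "first option", 1 means "second option").
script = iter([0, 0, 1, 0, 1, 0, 1, 0])
for step in derive('A', choose=lambda options: options[next(script)]):
    print(step)
```

With the default `random.choice`, every option is equally likely, so a run may occasionally take many steps before the last variable disappears.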

We would say, generally, that the language of a grammar is the set of all strings of terminals for which such a derivation can be built. It's worth noting that there are two aspects of this problem where infiniteness comes into play.

The set of strings in a language may be infinite. For example, if a grammar contained the rule X → 1 X | 1, there would be no limit on how many times we could choose the 1 X option instead of the 1 option. Still, if we're generating strings at random, we'll always pick exactly one of these options, and we expect, sooner or later, to choose the 1 option, which would prevent the generated string from becoming any longer.
A grammar can be written in a way that it describes individual strings of infinite length. If the only choice for the symbol Y is the rule Y → 1 Y, a derivation in which a string contains Y will never end; any substitution based on that rule will still lead to a string containing Y. (This is similar to the problem we encounter when we have a recursive function with no base case.) In practice, though, a properly written grammar will eventually lead only to sentences of finite length.

The program

The basic goal of your program is to use the description of a grammar to randomly generate sentences that are in the grammar's language. There are a number of details to consider, which are described below.

The format of a grammar file

The program will read a grammar file, which contains the description of a grammar to be used for generating random sentences. To include that feature in your program, though, we'll need to agree on a format for grammar files, which is specified in detail below.

Each rule starts with a line containing only a left curly brace {. We'll say that each of these lines is called a rule opener.
Each rule ends with a line containing only a right curly brace }. We'll say that each of these lines is called a rule closer.
Any line of text that is not between a rule opener and a subsequent rule closer is considered to be a comment (i.e., it's irrelevant from our perspective, but can be a useful way to write a grammar file that would be more understandable to a human reader).
After a rule opener, the next line specifies the name of the variable for which a rule is being described. This line will consist of only letters and digits, but no whitespace (or other) characters.
Subsequent lines of the rule are the options for substituting a sequence of symbols in place of the rule's variable. There will always be at least one of these lines, and each of them will be as follows.
- It will begin with a positive integer (i.e., an integer greater than zero) that specifies the option's weight, which determines how frequently we'll choose it, relative to the others. That weight will be followed by a space.
- After that will be zero or more symbols, each adjacent pair separated by a single space. When a symbol consists of letters and digits surrounded by brackets (i.e., [ and ]), it is a variable; otherwise, it is a terminal. (Note that the syntactic meaning of spaces means that symbols cannot contain spaces.)

As we'll see, a grammar file doesn't specify a start variable; that's specified subsequently as input to the program, so that the same grammar file can be used with different start variables in different runs.

Having seen a description of the format, let's take a look at an example grammar file, so we can fully understand the details of what it means.

{
HowIsBoo
1 Boo is [Adjective] today
}

{
Adjective
3 happy
3 perfect
1 relaxing
1 fulfilled
2 excited
}

Let's suppose that HowIsBoo is the start variable. If so, then the grammar describes sentences whose basic structure is always Boo is _____ today, with the _____ replaced with one of five adjectives:

There's a 3-in-10 (30%) chance of the adjective being happy.
There's a 3-in-10 (30%) chance of the adjective being perfect.
There's a 1-in-10 (10%) chance of the adjective being relaxing.
There's a 1-in-10 (10%) chance of the adjective being fulfilled.
There's a 2-in-10 (20%) chance of the adjective being excited.

Where did those probabilities come from? The sum of the weights for all of the options for the Adjective variable is 10. (3 + 3 + 1 + 1 + 2 = 10.) Each individual weight is a numerator, and that sum is the denominator; happy has a weight of 3, so its odds are 3-in-10 (30%), and so on.
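The arithmetic above can be checked directly, and Python's standard library already offers a weighted choice in `random.choices`. The names `weights`, `probabilities`, and `pick_adjective` below are made up for this sketch; it is one possible way to honor the weights, not necessarily the one you'll want in your design.

```python
import random
from fractions import Fraction

# Weights copied from the Adjective rule in the example grammar file.
weights = {'happy': 3, 'perfect': 3, 'relaxing': 1, 'fulfilled': 1, 'excited': 2}

# Each weight is a numerator; the sum of the weights is the denominator.
total = sum(weights.values())
probabilities = {word: Fraction(weight, total) for word, weight in weights.items()}


def pick_adjective(rng=random):
    """Choose one adjective at random, in proportion to its weight."""
    return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]


print(total)
print(probabilities['happy'])
print(pick_adjective())
```

Accepting `rng` as a parameter means a seeded `random.Random` object (or some other stand-in) can be supplied when repeatable behavior is wanted.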

One thing this example demonstrates is that weights have no meaning across rules, but only within a rule. For example, the sum of the weights in the rule for HowIsBoo is 1, while the sum for Adjective is 10, which means that "1 point" of weight means more in the HowIsBoo rule than it does in the Adjective rule.

A more complete example grammar file

To provide you with a more complete example of a grammar file, check out the example linked below.

grin.txt

That's a grammar file that, when its start variable is GrinStatement, generates random statements written in the Grin language from Project 3. The generated statements will have no syntax errors in them, so it should be possible to run the lexer and parser on them; however, since the statements are generated individually and separately, it's unlikely that you'd be able to run them as a Grin program, because they may have run-time errors or other problems, such as infinite loops, division by zero, or jumping to non-existent labels. Generating semantically valid Grin programs (i.e., ones that you could successfully execute) is a problem that grammars are not equipped to solve, as it turns out.

The input

The program will begin by reading exactly three lines from the Python shell (i.e., using Python's built-in input function).

The path to an existing grammar file. (If only the name of the file is specified, it will need to be located in the program's current working directory, which, by default, is the same directory as your main module.)
A positive integer specifying the number of random sentences to be generated. (Note that, as always, zero is not a positive number.)
The name of the start variable. (A variable's name does not include the brackets; the brackets are a syntactic device within the grammar file to make clear when an option is referring to a variable.)

You can safely assume that the grammar file exists, that it will be valid (i.e., it will follow the grammar file format described above), and that the program input will be formatted according to the rules specified here; we won't be testing your program with inputs that don't meet those requirements, so your program can do anything (or even crash) if given such inputs.

We also will not be testing with a grammar file that describes infinite-length sentences, which means that your program can do anything (or even crash) if given such a grammar file.

The output

The output of your program is simple: If asked to generate n sentences, your program would print a total of n lines of output, each being one of those sentences, and each having a newline on the end of it. No more, no less.

Each sentence is a sequence of terminals, separated by spaces. That's it.

A complete example of the program's execution

Let's suppose that we had a grammar file named grammar.txt identical to the shorter example shown above. Given that, an example of the program's execution might look like this.

    grammar.txt
    10
    HowIsBoo
    Boo is happy today
    Boo is fulfilled today
    Boo is relaxing today
    Boo is excited today
    Boo is perfect today
    Boo is happy today
    Boo is perfect today
    Boo is perfect today
    Boo is excited today
    Boo is happy today

Don't forget that the output is generated randomly, which means that a subsequent run of the same program with the same grammar file and the same input might reasonably be expected to produce different output. Remember, too, that the weights in a grammar file express probabilities rather than absolute counts. Consequently, a subsequent run that generates 10 sentences may, for example, have a different number of occurrences of Boo is happy today; just because there's a 3-in-10 chance that happy is chosen in each sentence doesn't mean that exactly three out of every ten sentences will contain happy. (You can flip a coin ten times in a row and it can come up heads all ten times, even though there's a 1-in-2 chance of it happening each time. It's not likely, but it's not impossible, either.)

Design requirements

There are a number of ways that this problem could be solved, but we'll focus on an approach that leads to a clean, mutually recursive algorithm for solving it, which you'll be required to implement.

Representing the grammar as objects

From the description of the grammar file, we can see that it's built up from the following concepts.

A grammar contains a collection of rules.
Each rule is made up of a variable and one or more options.
Each option has a weight and a sequence of symbols, each of which is a terminal or a variable.

These facts lead directly to an idea of how to design a combination of objects that can be used to represent a grammar.

A class representing a terminal symbol.
A class representing a variable symbol.
A class representing an option.
A class representing a rule.
A class representing a grammar.

This may seem like a heavy-handed approach, but it pays off if we take it a step further. What if all of these classes implemented the same protocol, which allows us to ask any of their objects to do the same job: "Given this grammar, generate a sentence fragment from yourself"?

Generating random sentences from a grammar

Once you've represented your grammar as a combination of objects as described in the previous section, it is possible to implement a relatively straightforward mutually recursive algorithm to generate random sentences from it. The algorithm revolves around the idea of generating sentence fragments, then putting the fragments together into a complete sentence.

Here is a sketch of such an algorithm.

To generate a sentence from a grammar, it will look up the rule corresponding to the start variable, then ask that rule to generate a sentence fragment.
To generate a sentence fragment from a rule, one of its options will be chosen at random (in accordance with their weights), which will then be asked to generate a sentence fragment.
To generate a sentence fragment from an option, iterate through its symbols, generating sentence fragments from each one.
To generate a sentence fragment from a variable symbol, ask the grammar for the rule corresponding to that variable, then ask that rule to generate a sentence fragment.
To generate a sentence fragment from a terminal symbol, return the value of that terminal; that's its sentence fragment.

This mutually recursive strategy provides a great deal of power with relatively little code; by relying on Python's duck typing mechanism, we can allow the "right thing" to happen quickly and easily. (The reason we say it's a "mutually recursive" strategy is that a grammar might use a rule, which uses one of its options, which uses one of its symbols that is a variable, which would, in turn, use another rule.)

Furthermore, if we implement that algorithm using Python's generator functions — each of these methods yields a sequence of terminal symbols, rather than returning them — we can also do this job while using relatively little memory; our cost becomes a function of the depth of the grammar's rules (i.e., how deeply we recurse), rather than the length of the sentence we're generating, which is likely to be a significant improvement if we're building long sentences.
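The shape of that algorithm might look something like the following outline, in which every class offers a generator method that yields terminals. Treat it as a loose sketch under stated assumptions: the class names, the shared method name `generate`, the constructor parameters, and the use of `random.choices` are all choices made for this illustration, and a real solution would also need to build these objects from a grammar file and be structured for testability.

```python
import random


class Terminal:
    def __init__(self, value):
        self._value = value

    def generate(self, grammar):
        # A terminal's fragment is just its own value.
        yield self._value


class Variable:
    def __init__(self, name):
        self._name = name

    def generate(self, grammar):
        # Defer to the rule for this variable; this closes the recursive loop.
        yield from grammar.rule(self._name).generate(grammar)


class Option:
    def __init__(self, weight, symbols):
        self.weight = weight
        self._symbols = symbols

    def generate(self, grammar):
        for symbol in self._symbols:
            yield from symbol.generate(grammar)


class Rule:
    def __init__(self, options):
        self._options = options

    def generate(self, grammar):
        weights = [option.weight for option in self._options]
        chosen = random.choices(self._options, weights=weights, k=1)[0]
        yield from chosen.generate(grammar)


class Grammar:
    def __init__(self, rules):
        self._rules = rules  # maps each variable's name to its Rule

    def rule(self, name):
        return self._rules[name]

    def generate(self, start_variable):
        yield from self.rule(start_variable).generate(self)


grammar = Grammar({
    'HowIsBoo': Rule([Option(1, [Terminal('Boo'), Terminal('is'),
                                 Variable('Adjective'), Terminal('today')])]),
    'Adjective': Rule([Option(3, [Terminal('happy')]),
                       Option(1, [Terminal('relaxing')])]),
})
print(' '.join(grammar.generate('HowIsBoo')))
```

Because each method yields rather than building a list, terminals flow out one at a time, and nothing other than the chain of active generators needs to be held in memory.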

Your main module

You must have a Python module named project4.py, which provides a way to execute your program in whole; executing project4.py executes the program. Since you expect this module to be executed in this way, it would naturally need to have an if __name__ == '__main__': statement at the end of it, for reasons described in your prior coursework. Note that the provided Git repository will already contain this file (and the necessary if statement).

Modules other than the main module

Like previous projects, this is a project that is large enough that it will benefit from being divided into separate modules, each focusing on one kind of functionality, as opposed to jamming all of it into a single file or, worse yet, a single function. As before, we aren't requiring a particular organization, but we are expecting to see that you have "kept separate things separate."

Unlike in Project 2 and Project 3, we are not requiring the use of Python packages, though you are certainly welcome to use them if you'd like.

Working and testing incrementally

As you did in previous projects, you are required to do your work incrementally, to test it incrementally (i.e., as you write new functions, you'll be implementing unit tests for them), and to commit your work periodically into a Git repository, which you will be bundling and submitting to us.

As in those previous projects, we don't have a specific requirement about how many commits you make, or how big a "feature" is, but your general goal is to commit when you've reached stable ground: a new feature is working, and you've tested it (including with unit tests). We'll expect to see a history of these kinds of incremental commits.

Testing requirements

Along with your program, you will be required to write unit tests, implemented using the unittest module in the Python standard library, and covering as much of your program as is practical. As before, write your unit tests in Python modules within a directory named tests.

As in previous projects, how you design aspects of your program has a positive impact on whether you can write unit tests for it, as well as how hard you might have to work to do it. Your goal is to cover as much of your program as is practical, though, as in recent projects, there is not a strict requirement around code coverage measurement, nor a specific number of tests that must be written, but we'll be evaluating whether your design accommodates your ability to test it, and whether you've written unit tests that substantially cover the portions that can be tested.

Using test doubles to improve testability

One of the problems you face in this project is testing code that does things that are not directly amenable to the kinds of unit testing techniques you've learned thus far.

Printing output to the Python shell with the built-in print function. (This requires being able to ask "What got printed?" afterward, which isn't something that print supports.)
Reading input from the Python shell with the built-in input function. (This requires a user to supply input, since Python offers no built-in support for changing input to return input from somewhere else.)
Reading a grammar's description from a file. (This requires a file already to be there, making any such unit test fragile, since any change to that file would break the corresponding test.)
Choosing numbers at random and using that to drive the output, so that the same input generates different outputs each time. (This one seems particularly insurmountable, since unit tests depend on having predictable output.)

Earlier in the quarter, we learned a workaround for the first of these: By using the contextlib.redirect_stdout function from Python's standard library in concert with Python's with statement, we can temporarily redirect anything printed to the Python shell to an intermediary object, which provides the ability to ask "What got printed?" afterward. That kind of intermediary object is sometimes called a test double, whose job is to replace the built-in behavior of print — or, to be clearer, the built-in behavior of the program's standard output, which print depends on — with something different, so that the code under test can do its normal job, while being unaware that its output is being rerouted elsewhere.
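As a reminder of what that looks like, here is a small example. The `print_sentences` function is a hypothetical stand-in for whatever part of your program does the printing; `contextlib.redirect_stdout` and `io.StringIO` are the standard library pieces doing the real work.

```python
import contextlib
import io


def print_sentences(sentences):
    """Hypothetical code under test: prints one sentence per line."""
    for sentence in sentences:
        print(sentence)


# Anything printed inside the with statement lands in the StringIO object
# instead of the Python shell.
with contextlib.redirect_stdout(io.StringIO()) as output:
    print_sentences(['Boo is happy today', 'Boo is excited today'])

printed = output.getvalue()
```

In a unit test, the value of `printed` could then be compared against the expected text with `assertEqual`.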

The other problems listed above don't have such a simple workaround available, yet nothing prevents us from implementing the same technique ourselves. If we use an intermediary object to do a job for us, we can substitute an object in place of that intermediary in a testing scenario. As long as both objects support the same protocol for being asked to do that job, the caller can remain blissfully unaware of the difference. Consequently, we'll write more than one class that supports the same protocol: one that does the "normal" thing and another whose objects can act as a test double instead.

Rather than reading input by explicitly calling the input function, read it by asking an intermediary object for that same input. The usual implementation will call input, but a test double might respond to the same method calls by returning a hard-coded result instead.
Rather than reading a grammar from a file directly, an intermediary might give us the option of reading it from either a file or hard-coded input.
Rather than generating random numbers by calling functions from the random module in Python's standard library, an intermediary could allow us to either generate random numbers normally or produce hard-coded results instead.
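To make two of these ideas concrete, here is one possible sketch of intermediaries for reading input and for making random choices. Every class, method, and function name below (ConsoleInput, FakeInput, RandomChooser, FixedChooser, read_line, choose, pick_greeting) is an assumption made for illustration, not a required design; you're free to shape your own protocols differently.

```python
import random


class ConsoleInput:
    # "Normal" implementation: reads one line from standard input.
    def read_line(self):
        return input()


class FakeInput:
    # Test double: hands back pre-arranged lines, one per call.
    def __init__(self, lines):
        self._lines = iter(lines)

    def read_line(self):
        return next(self._lines)


class RandomChooser:
    # "Normal" implementation: picks one option at random, by weight.
    def choose(self, options, weights):
        return random.choices(options, weights=weights)[0]


class FixedChooser:
    # Test double: ignores the weights and picks the options found at
    # pre-arranged indices, one per call.
    def __init__(self, indices):
        self._indices = iter(indices)

    def choose(self, options, weights):
        return options[next(self._indices)]


def pick_greeting(input_source, chooser):
    # The code under test knows only the protocol (read_line and choose),
    # not which implementation it has been given.
    name = input_source.read_line()
    greeting = chooser.choose(['Hello', 'Howdy'], [3, 1])
    return f'{greeting}, {name}!'


result = pick_greeting(FakeInput(['Boo']), FixedChooser([1]))
assert result == 'Howdy, Boo!'
```

The actual program would call pick_greeting(ConsoleInput(), RandomChooser()) instead, with no changes to pick_greeting itself.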

As an old joke in computer science circles says, "We can solve any problem by introducing an extra level of indirection," and that's essentially what's being suggested here. (Joking aside, sometimes it's useful advice!) Third-party libraries can add fancier support to smooth this out even further, but Python offers us enough flexibility for our purposes here.

How to implement a test double

The first thing to realize about test doubles is that the phrase "test double" may be new for you, but the concept is not: When objects of two different types share the same protocol, they're both capable of being asked to do some job, even if they do that job differently. This has been a recurring theme in your coursework, since it's one of the pillars that Python's design rests on.

So, let's suppose that the job is running a database query against a SQLite database, like you did in Project 2. The "real" version might look something like this.

class SqlitePersonFinder:
    def __init__(self, connection):
        self._connection = connection

    def find_all_people(self):
        cursor = self._connection.execute(
            '''
            SELECT person_id, name, age
            FROM person;
            ''')

        try:
            while (row := cursor.fetchone()) is not None:
                yield Person(row[0], row[1], row[2])
        finally:
            cursor.close()

The problem with doing this in a unit test is that we can't predict what the output will be, unless we first set up a database with precisely the right set of people in it, so that we'll know what the answer is. But if we're really testing something else — a function that calls find_all_people, but what's really interesting about it isn't the database part, but what we do with the result — then it would be better to have an answer we can rely on.
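To get a sense of how much setup the "real" version demands before its output becomes predictable, consider this sketch, which builds an in-memory SQLite database first. The Person namedtuple, the table schema, the sample rows, and the make_test_connection helper are all assumptions made for illustration; the ORDER BY clause is also an addition, so that the order of the results is predictable.

```python
import sqlite3
from collections import namedtuple

# Hypothetical stand-in for the Person type used in the example above.
Person = namedtuple('Person', ['person_id', 'name', 'age'])


class SqlitePersonFinder:
    def __init__(self, connection):
        self._connection = connection

    def find_all_people(self):
        cursor = self._connection.execute(
            'SELECT person_id, name, age FROM person ORDER BY person_id;')

        try:
            while (row := cursor.fetchone()) is not None:
                yield Person(row[0], row[1], row[2])
        finally:
            cursor.close()


def make_test_connection():
    # An in-memory database disappears when its connection is closed, so
    # every test starts from the same known state. The schema here is assumed.
    connection = sqlite3.connect(':memory:')
    connection.execute(
        'CREATE TABLE person '
        '(person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);')
    connection.executemany(
        'INSERT INTO person (person_id, name, age) VALUES (?, ?, ?);',
        [(1, 'Boo', 13), (2, 'Alex', 50)])
    return connection


connection = make_test_connection()
people = list(SqlitePersonFinder(connection).find_all_people())
connection.close()
assert people == [Person(1, 'Boo', 13), Person(2, 'Alex', 50)]
```

This works, but every test that touches a person finder would need to repeat that setup, even when the database isn't what's being tested.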

So, why not give ourselves an object that can do the same job differently? The simplest idea would be to give it a list of people and have it yield those instead. "If I ask you what people are in the database, just give me these."

class FakePersonFinder:
    def __init__(self, people):
        self._people = list(people)

    def find_all_people(self):
        return (p for p in self._people)

As long as a "person finder" object was, for example, a parameter to the function you're testing, you could test your function by sending it a FakePersonFinder, while your actual program would pass it a SqlitePersonFinder instead.
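A test written this way might look something like the following sketch. The average_age function is a hypothetical function under test, and the Person namedtuple is an assumed stand-in for whatever Person type your program uses.

```python
from collections import namedtuple

# Hypothetical stand-in for the Person type.
Person = namedtuple('Person', ['person_id', 'name', 'age'])


class FakePersonFinder:
    def __init__(self, people):
        self._people = list(people)

    def find_all_people(self):
        return (p for p in self._people)


def average_age(person_finder):
    # Hypothetical function under test. It relies only on the finder
    # having a find_all_people method, not on where the people come from.
    ages = [person.age for person in person_finder.find_all_people()]
    return sum(ages) / len(ages) if ages else None


finder = FakePersonFinder([Person(1, 'Boo', 10), Person(2, 'Alex', 20)])
assert average_age(finder) == 15
assert average_age(FakePersonFinder([])) is None
```

Because the test decides exactly which people exist, the expected average is known in advance, with no database required.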

That's all a test double is: an object that takes the place of something unpredictable (i.e., something that would cause tests to behave differently depending on how other things outside of them are set up) and replaces it with something predictable instead.

Sanity-checking your output

We are providing a tool that you can use to sanity-check whether you've followed the basic requirements above. It will give you a "passing" result only in these circumstances.

It's possible to run your program by executing a correctly named module (project4.py), spelled and capitalized correctly.
Executing that module is enough to execute your program.
Your program reads its input and generates character-for-character correct output for one simple test scenario. (Notably, since this program's output is random, the scenario tested uses no optionality in any of its rules.)

It should be noted that there are many additional tests you'll want to perform, and that there are many additional tests that we'll be using when we grade your project. The way to understand the sanity checker's output is to think of it this way: Just because the sanity checker says your program passes doesn't mean it's close to perfect, but if you cannot get the sanity checker to report that your program passes, it surely will not pass all of our automated tests (and may well fail all of them).

You'll find the sanity checker in your project directory, in a file named project4_sanitycheck.py. Run that program like you would any other, and it will report a result.

Limitations

You can use the Python standard library where appropriate in this project, but you will otherwise not be able to use code written by anyone other than you. Notably, this includes third-party libraries (i.e., those that are not part of Python's standard library); colloquially, if we have to install something other than Python, Git, and PyCharm in order for your program to work, it's considered off-limits.

Preparing your submission

When you're ready to submit your work, run the provided prepare_submission.py script, as you did in prior projects, which will create a Git bundle from the Git repository in your project directory; that Git bundle will be your submission.

Verifying your bundle before submission

If you're feeling unsure of whether your bundle is complete and correct, you can verify it by creating a new PyCharm project from it, as you did in Project 0. (You'll want to create this project in a different directory from your project directory, so it's separate and isolated.) Afterward, you should see the files in their final form, and the Git tab in PyCharm should show your entire commit history. If so, you're in business; go ahead and submit your work.

Deliverables

Submit your project4.bundle file (and no others) to Canvas. There are a few rules to be aware of.

When grading your program, we'll grade only the most recent submission. We will not negotiate about which submission will be graded, or, for example, grade multiple of your submissions and "take the highest score."
When grading your program, we'll grade only the most recent commit on the main branch, except to the extent that we'll examine prior commits when evaluating your overall process. We will not negotiate about which commit will be graded, or, for example, grade multiple of your commits and "take the highest score."
You're responsible for submitting the version of your project that you want graded prior to the deadline. Contacting us afterward and telling us that you accidentally submitted the wrong version will not be grounds for a resubmission under any circumstances.
You're responsible for making a submission in order to receive credit, which means you'll want to be sure that you've remembered to submit your work and verified in Canvas that it's been received. A later claim of having forgotten to submit your work or having misremembered the due date will not be grounds for a resubmission under any circumstances.
Whether your work was submitted before the deadline is determined solely by the time it was submitted to Canvas. Neither timestamps on local copies of your files nor timestamps on commits in your Git repository or in other places (e.g., emails or online storage) are considered evidence of completion prior to the deadline under any circumstances.

Can I submit after the deadline?

Yes, it is possible, subject to the late work policy for this course, which is described in the section titled Late work at this link. Beyond the late work deadline described there, we will no longer accept submissions.

What do I do if Canvas adjusts my filename?

Canvas will sometimes modify the name of a file when you submit it (e.g., by adding a numbering scheme like -1 or a long sequence of hexadecimal digits to its name). In general, this is fine; as long as the file you submitted has the correct name prior to submission, we'll be able to obtain it with that same name, even if Canvas adjusts it.