Project 2: Gitlet

A note on this spec

This spec is fairly long. The first half is a verbose and detailed description of every command you’ll support, and the other half is the testing details and some words of advice. To help you digest this, we’ve prepared many high quality videos describing portions of the spec and giving advice on how and where to begin. All videos are linked throughout this spec in the relevant location, but we’ll also list them right here for your convenience. Note: some of these videos were created in Spring 2020 when Gitlet was Project 3 and Capers was Lab 12, and some videos briefly mention Professor Hilfinger’s CS 61B setup (including a remote called shared, a repository called repo, etc). Please ignore these as they do not provide any useful information for you this semester. The actual content of the assignment is unchanged.

As more resources are created, we’ll add them here, so refresh often!

Overview of Gitlet

Warning: Ensure you’ve completed Lab 6: Canine Capers before this project. Lab 6 is intended to be an introduction to this project and will be very helpful in getting you started and ensure you’re all set up. You should also have watched Lecture 12: Gitlet, which introduces many useful ideas for this project.

In this project you’ll be implementing a version-control system that mimics some of the basic features of the popular system Git. Ours is smaller and simpler, however, so we have named it Gitlet.

A version-control system is essentially a backup system for related collections of files. The main functionality that Gitlet supports is:

  1. Saving the contents of entire directories of files. In Gitlet, this is called committing, and the saved contents themselves are called commits.

  2. Restoring a version of one or more files or entire commits. In Gitlet, this is called checking out those files or that commit.

  3. Viewing the history of your backups. In Gitlet, you view this history in something called the log.

  4. Maintaining related sequences of commits, called branches.

  5. Merging changes made in one branch into another.

The point of a version-control system is to help you when creating complicated (or even not-so-complicated) projects, or when collaborating with others on a project. You save versions of the project periodically. If at some later point in time you accidentally mess up your code, then you can restore your source to a previously committed version (without losing any of the changes you made since then). If your collaborators make changes embodied in a commit, you can incorporate (merge) these changes into your own version.

In Gitlet, you don’t just commit individual files at a time. Instead, you can commit a coherent set of files at the same time. We like to think of each commit as a snapshot of your entire project at one point in time. However, for simplicity, many of the examples in the remainder of this document involve changes to just one file at a time. Just keep in mind you could change multiple files in each commit.

In this project, it will be helpful for us to visualize the commits we make over time. Suppose we have a project consisting just of the file wug.txt, we add some text to it, and commit it. Then we modify the file and commit these changes. Then we modify the file again, and commit the changes again. Now we have saved three total versions of this file, each one later in time than the previous. We can visualize these commits like so:

Three commits

Here we’ve drawn an arrow indicating that each commit contains some kind of reference to the commit that came before it. We call the commit that came before it the parent commit–this will be important later. But for now, does this drawing look familiar? That’s right; it’s a linked list!

The big idea behind Gitlet is that we can visualize the history of the different versions of our files in a list like this. Then it’s easy for us to restore old versions of files. You can imagine making a command like: “Gitlet, please revert to the state of the files at commit #2”, and it would go to the second node in the linked list and restore the copies of files found there, while removing any files that are in the first node, but not the second.
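The linked-list picture above can be sketched in a few lines of Java. This is only an in-memory illustration under assumed names (`CommitNode` and `history` are hypothetical, not part of the skeleton); the Serialization Details section later explains why a real implementation would store ids rather than direct object references.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal commit node: a message plus a reference to its parent. */
final class CommitNode {
    final String message;
    final CommitNode parent; // null for the very first commit

    CommitNode(String message, CommitNode parent) {
        this.message = message;
        this.parent = parent;
    }

    /** Walks the parent links from this commit back to the first one, newest first. */
    List<String> history() {
        List<String> messages = new ArrayList<>();
        for (CommitNode c = this; c != null; c = c.parent) {
            messages.add(c.message);
        }
        return messages;
    }
}
```

Starting from the third commit, `history()` visits commit 3, then 2, then 1, which is the same traversal a linked list would give you.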

If we tell Gitlet to revert to an old commit, the front of the linked list will no longer reflect the current state of your files, which might be a little misleading. In order to fix this problem, we introduce something called the head pointer (also called the HEAD pointer). The head pointer keeps track of where in the linked list we currently are. Normally, as we make commits, the head pointer will stay at the front of the linked list, indicating that the latest commit reflects the current state of the files:

Simple head

However, let’s say we revert to the state of the files at commit #2 (technically, this is the reset command, which you’ll see later in the spec). We move the head pointer back to show this:

Reverted head

Here we say that we are in a detached head state, which you may have encountered yourself before. This is what it means!

EDITED 3/5: Note that in Gitlet, there is no way to be in a detached head state since there is no checkout command that will move the HEAD pointer to a specific commit. The reset command will do that, though it also moves the branch pointer. Thus, in Gitlet, you will never be in a detached HEAD state.

All right, now, if this were all Gitlet could do, it would be a pretty simple system. But Gitlet has one more trick up its sleeve: it doesn’t just maintain older and newer versions of files, it can maintain differing versions. Imagine you’re coding a project, and you have two ideas about how to proceed: let’s call one Plan A, and the other Plan B. Gitlet allows you to save both versions, and switch between them at will. Here’s what this might look like, in our pictures:

Two versions

It’s not really a linked list anymore. It’s more like a tree. We’ll call this thing the commit tree. Keeping with this metaphor, each of the separate versions is called a branch of the tree. You can develop each version separately:

Two developed versions

There are two pointers into the tree, representing the furthest point of each branch. At any given time, only one of these is the currently active pointer, and this is what’s called the head pointer. The head pointer is the pointer at the front of the current branch.

That’s it for our brief overview of the Gitlet system! Don’t worry if you don’t fully understand it yet; the section above was just to give you a high-level picture of what it’s meant to do. A detailed spec of what you’re supposed to do for this project follows this section.

But a last word here: commit trees are immutable: once a commit node has been created, it can never be destroyed (or changed at all). We can only add new things to the commit tree, not modify existing things. This is an important feature of Gitlet! One of Gitlet’s goals is to allow us to save things so we don’t delete them accidentally.

Internal Structures

Real Git distinguishes several different kinds of objects. For our purposes, the important ones are

Gitlet simplifies from Git still further by

Every object–every blob and every commit in our case–has a unique integer id that serves as a reference to the object. An interesting feature of Git is that these ids are universal: unlike a typical Java implementation, two objects with exactly the same content will have the same id on all systems (i.e. my computer, your computer, and anyone else’s computer will compute this same exact id). In the case of blobs, “same content” means the same file contents. In the case of commits, it means the same metadata, the same mapping of names to references, and the same parent reference. The objects in a repository are thus said to be content addressable.

Both Git and Gitlet accomplish this the same way: by using a cryptographic hash function called SHA-1 (Secure Hash 1), which produces a 160-bit integer hash from any sequence of bytes. Cryptographic hash functions have the property that it is extremely difficult to find two different byte streams with the same hash value (or indeed to find any byte stream given just its hash value), so that essentially, we may assume that the probability that any two objects with different contents have the same SHA-1 hash value is 2^-160 or about 10^-48. Basically, we simply ignore the possibility of a hashing collision, so that the system has, in principle, a fundamental bug that in practice never occurs!

Fortunately, there are library classes for computing SHA-1 values, so you won’t have to deal with the actual algorithm. All you have to do is to make sure that you correctly label all your objects. In particular, this involves

By the way, the SHA-1 hash value, rendered as a 40-character hexadecimal string, makes a convenient file name for storing your data in your .gitlet directory (more on that below). It also gives you a convenient way to compare two files (blobs) to see if they have the same contents: if their SHA-1s are the same, we simply assume the files are the same.
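As a rough illustration of the 40-character hexadecimal rendering, here is a sketch that uses only the standard `java.security.MessageDigest` class. The class and method names (`Sha1Sketch`, `sha1Hex`) are made up for this example; the utilities provided with the project may already offer an equivalent helper, in which case you would use that instead.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha1Sketch {
    /** Returns the SHA-1 hash of DATA as a 40-character lowercase hex string. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(data); // 20 bytes = 160 bits
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every standard Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

Two byte arrays with the same contents always produce the same string, which is what makes the hash usable both as a file name and as an equality check.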

For remotes (like skeleton which we’ve been using all semester), we’ll simply use other Gitlet repositories. Pushing simply means copying all commits and blobs that the remote repository does not yet have to the remote repository, and resetting a branch reference. Pulling is the same, but in the other direction. Remotes are extra credit in this project and not required for full credit.

Reading and writing your internal objects from and to files is actually pretty easy, thanks to Java’s serialization facilities. The interface java.io.Serializable has no methods, but if a class implements it, then the Java runtime will automatically provide a way to convert to and from a stream of bytes, which you can then write to a file using the I/O class java.io.ObjectOutputStream and read back (and deserialize) with java.io.ObjectInputStream. The term “serialization” refers to the conversion from some arbitrary structure (array, tree, graph, etc.) to a serial sequence of bytes. You should have seen and gotten practice with serialization in lab 6. You’ll be using a very similar approach here, so do use your lab6 as a resource when it comes to persistence and serialization.
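To make the round trip concrete, here is a small sketch that serializes an object to a byte array and reads it back. The `Note` class and the helper names are hypothetical, and the sketch writes to memory rather than to a file so that it stays self-contained; the same streams work when wrapped around file streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSketch {
    /** Hypothetical example class; implementing Serializable requires no methods. */
    static class Note implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;

        Note(String text) {
            this.text = text;
        }
    }

    /** Converts OBJ into its serialized byte form. */
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds an object from bytes previously produced by toBytes. */
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The object that comes back is a new copy with the same field values, not the original instance.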

Here is a summary example of the structures discussed in this section. As you can see, each commit (rectangle) points to some blobs (circles), which contain file contents. The commits contain the file names and references to these blobs, as well as a parent link. These references, depicted as arrows, are represented in the .gitlet directory using their SHA-1 hash values (the small hexadecimal numerals above the commits and below the blobs). The newer commit contains an updated version of wug1.txt, but shares the same version of wug2.txt as the older commit. Your commit class will somehow store all of the information that this diagram shows: a careful selection of internal data structures will make the implementation easier or harder, so it behooves you to spend time planning and thinking about the best way to store everything.

Two commits and their blobs
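One possible way to hold the information in this diagram is a map from file names to blob ids plus a parent id. The sketch below is only one option among many: `CommitSketch`, its fields, and the short placeholder ids used with it are all assumptions for illustration, and a real commit would also carry metadata such as a timestamp.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

final class CommitSketch {
    final String message;
    final String parentId;             // id of the parent commit, or null for the first commit
    final Map<String, String> blobIds; // file name -> id of the blob holding its contents

    CommitSketch(String message, String parentId, Map<String, String> blobIds) {
        this.message = message;
        this.parentId = parentId;
        // Copy defensively so the commit cannot change after it is created.
        this.blobIds = Collections.unmodifiableMap(new TreeMap<>(blobIds));
    }

    /** Builds a child commit: same files as this one, except for those in CHANGED. */
    CommitSketch child(String childMessage, String thisId, Map<String, String> changed) {
        Map<String, String> files = new TreeMap<>(blobIds);
        files.putAll(changed);
        return new CommitSketch(childMessage, thisId, files);
    }
}
```

A child commit that only changes wug1.txt ends up pointing at a new blob for that file while still referring to the same blob id for wug2.txt, which is the sharing shown in the picture.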

Detailed Spec of Behavior

Overall Spec

The only structure requirement we’re giving you is that you have a class named gitlet.Main and that it has a main method.

We are also giving you some utility methods for performing a number of mostly file-system-related tasks, so that you can concentrate on the logic of the project rather than the peculiarities of dealing with the OS.

We have also added two suggested classes: Commit, and Repository to get you started. You may, of course, write additional Java classes to support your project or remove our suggested classes if you’d like. But don’t use any external code (aside from JUnit), and don’t use any programming language other than Java. You can use all of the Java Standard Library that you wish, plus utilities we provide.

You should not do everything in the Main class. Your Main class should mostly be calling helper methods in the Repository class. See the CapersRepository and Main classes from lab 6 for examples of the structure that we recommend.
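As a rough sketch of that division of labor, the dispatcher below only inspects the arguments and forwards to another object. Everything here is hypothetical: `Repo` is a stand-in that merely records calls, only three commands are shown, and the exceptions are placeholders rather than the error behavior this spec requires.

```java
import java.util.ArrayList;
import java.util.List;

final class DispatchSketch {
    /** Hypothetical stand-in for a Repository class; real methods would do the work. */
    static class Repo {
        final List<String> calls = new ArrayList<>();

        void init() { calls.add("init"); }
        void add(String file) { calls.add("add " + file); }
        void commit(String message) { calls.add("commit " + message); }
    }

    /** Chooses a Repo method based on the first argument and does nothing else. */
    static void run(Repo repo, String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("no command given");
        }
        switch (args[0]) {
            case "init":
                repo.init();
                break;
            case "add":
                repo.add(operand(args));
                break;
            case "commit":
                repo.commit(operand(args));
                break;
            default:
                throw new IllegalArgumentException("unknown command: " + args[0]);
        }
    }

    /** Returns the single operand that follows the command name. */
    private static String operand(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("expected exactly one operand");
        }
        return args[1];
    }
}
```

Keeping the switch this thin means each command's logic lives in one place and can be reused by other commands.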

The majority of this spec will describe how the main method of gitlet.Main must react when it receives various gitlet commands as command-line arguments. But before we break down command-by-command, here are some overall guidelines the whole project should satisfy:

The Commands

We now go through each command you must support in detail. Remember that good programmers always care about their data structures: as you read these commands, you should think first about how you should store your data to easily support these commands and second about whether there is any opportunity to reuse commands that you’ve already implemented (hint: there is ample opportunity in this project to reuse code in later parts of project 2 that you’ve already written in earlier parts of project 2). We have listed lectures in some methods that we have found useful, but you are not required to use concepts from these lectures. There are conceptual quizzes on some of the more confusing commands that you should definitely use to check your understanding. The quizzes are not for a grade; they are only there to help you check your understanding before trying to implement the command.

init


commit

Here’s a picture of before-and-after commit:

Before and after commit



===
commit a0da1ea5a15ab613bf9961fd86f010cf74c7ee48
Date: Thu Nov 9 20:00:05 2017 -0800
A commit message.

===
commit 3e8bf1d794ca2e9ef8a4007275acf3751c7170ff
Date: Thu Nov 9 17:01:33 2017 -0800
Another commit message.

===
commit e881c9575d180a215d1a636545b8fd9abfb1d2bb
Date: Wed Dec 31 16:00:00 1969 -0800
initial commit

There is a === before each commit and an empty line after it. As in real Git, each entry displays the unique SHA-1 id of the commit object. The timestamps displayed in the commits reflect the current timezone, not UTC; as a result, the timestamp for the initial commit does not read Thursday, January 1st, 1970, 00:00:00, but rather the equivalent Pacific Standard Time. Your timezone might be different depending on where you live, and that’s fine.

Display commits with the most recent at the top. By the way, you’ll find that the Java classes java.util.Date and java.util.Formatter are useful for getting and formatting times. Look into them instead of trying to construct it manually yourself!
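For instance, the date line in the sample log can be produced with a single format string. The sketch below is one way to do it, not the required one: `String.format` delegates to `java.util.Formatter`, and the explicit `TimeZone` parameter is an addition made here so the result does not depend on the machine running it (the spec itself uses the current timezone).

```java
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class LogDateSketch {
    /** Formats WHEN in ZONE in the style "Date: Thu Nov 9 20:00:05 2017 -0800". */
    static String formatDate(Date when, TimeZone zone) {
        Calendar cal = Calendar.getInstance(zone, Locale.US);
        cal.setTime(when);
        // %ta day name, %tb month name, %te day of month, %tT HH:MM:SS,
        // %tY four-digit year, %tz numeric offset from GMT.
        return String.format(Locale.US,
                "Date: %1$ta %1$tb %1$te %1$tT %1$tY %1$tz", cal);
    }
}
```

Passing `new Date(0)` with a zone eight hours behind GMT reproduces the initial commit's date line from the example above.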

Of course, the SHA-1 identifiers are going to be different, so don’t worry about those. Our tests will ensure that you have something that “looks like” a SHA-1 identifier (more on that in the testing section below).

For merge commits (those that have two parent commits), add a line just below the first, as in

===
commit 3e8bf1d794ca2e9ef8a4007275acf3751c7170ff
Merge: 4975af1 2c1ead1
Date: Sat Nov 11 12:30:00 2017 -0800
Merged development into master.

where the two hexadecimal numerals following “Merge:” consist of the first seven digits of the first and second parents’ commit ids, in that order. The first parent is the branch you were on when you did the merge; the second is that of the merged-in branch. This is as in regular Git.
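Putting the pieces of a log entry together might look like the sketch below. The class, method, and parameter names are hypothetical, the date line is assumed to be formatted elsewhere, and the parent ids are assumed to be full ids of at least seven characters.

```java
final class LogEntrySketch {
    /**
     * Builds one log entry. SECONDPARENT is null for ordinary commits;
     * when it is non-null, a "Merge:" line with 7-digit prefixes is added.
     */
    static String entry(String id, String firstParent, String secondParent,
                        String dateLine, String message) {
        StringBuilder sb = new StringBuilder();
        sb.append("===\n");
        sb.append("commit ").append(id).append('\n');
        if (secondParent != null) {
            sb.append("Merge: ")
              .append(firstParent, 0, 7)   // first seven digits of the first parent
              .append(' ')
              .append(secondParent, 0, 7)  // first seven digits of the second parent
              .append('\n');
        }
        sb.append(dateLine).append('\n');
        sb.append(message).append('\n');
        sb.append('\n'); // the empty line after each entry
        return sb.toString();
    }
}
```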

Here’s a picture of the history of a particular commit. If the current branch’s head pointer happened to be pointing to that commit, log would print out information about the circled commits:


The history ignores other branches and the future. Now that we have the concept of history, let’s refine what we said earlier about the commit tree being immutable. It is immutable precisely in the sense that the history of a commit with a particular id may never change, ever. If you think of the commit tree as nothing more than a collection of histories, then what we’re really saying is that each history is immutable.

global-log

find

status

checkout

Checkout is a kind of general command that can do a few different things depending on what its arguments are. There are 3 possible use cases. In each section below, you’ll see 3 numbered points. Each corresponds to the respective usage of checkout.

A [commit id] is, as described earlier, a hexadecimal numeral. A convenient feature of real Git is that one can abbreviate commits with a unique prefix. For example, one can abbreviate




in the (likely) event that no other object exists with a SHA-1 identifier that starts with the same six digits. You should arrange for the same thing to happen for commit ids that contain fewer than 40 characters. Unfortunately, using shortened ids might slow down the finding of objects if implemented naively (making the time to find a file linear in the number of objects), so we won’t worry about timing for commands that use shortened ids. We suggest, however, that you poke around in a .git directory (specifically, .git/objects) and see how it manages to speed up its search. You will perhaps recognize a familiar data structure implemented with the file system rather than pointers.
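The naive, linear-time lookup mentioned above can be sketched as a scan over all known ids. The names here are hypothetical, and returning null for an ambiguous prefix is an assumption made for this example rather than something this spec defines.

```java
import java.util.Collection;

final class PrefixLookupSketch {
    /**
     * Returns the one id in IDS that starts with PREFIX, or null if
     * no id matches or more than one does.
     */
    static String resolve(String prefix, Collection<String> ids) {
        String match = null;
        for (String id : ids) {
            if (id.startsWith(prefix)) {
                if (match != null) {
                    return null; // a second match makes the prefix ambiguous
                }
                match = id;
            }
        }
        return match;
    }
}
```

A full 40-character id is simply the longest possible prefix, so the same method handles both cases.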

Only version 3 (checkout of a full branch) modifies the staging area: otherwise files scheduled for addition or removal remain so.

branch

All right, let’s see what branch does in detail. Suppose our state looks like this:

Simple history

Now we call java gitlet.Main branch cool-beans. Then we get this:

Just called branch

Hmm… nothing much happened. Let’s switch to the branch with java gitlet.Main checkout cool-beans:

Just switched branch

Nothing much happened again?! Okay, say we make a commit now. Modify some files, then java gitlet.Main add... then java gitlet.Main commit...

Commit on branch

I was told there would be branching. But all I see is a straight line. What’s going on? Maybe I should go back to my other branch with java gitlet.Main checkout master:

Checkout master

Now I make a commit…


Phew! So that’s the whole idea of branching. Did you catch what’s going on? All that creating a branch does is to give us a new pointer. At any given time, one of these pointers is considered the currently active pointer, also called the HEAD pointer (indicated by *). We can switch the currently active head pointer with checkout [branch name]. Whenever we commit, it means we add a child commit to the currently active HEAD commit even if there is already a child commit. This naturally creates branching behavior as a commit can now have multiple children.
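The pointer bookkeeping in this walkthrough fits in a small sketch: a map from branch names to commit ids, plus the name of the active branch. The class and method names are hypothetical, commit ids are plain placeholder strings, and cases such as creating a branch that already exists are deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

final class BranchSketch {
    final Map<String, String> branches = new HashMap<>(); // branch name -> commit id
    String current = "master";                            // the active branch

    BranchSketch(String initialCommitId) {
        branches.put(current, initialCommitId);
    }

    /** The commit the active branch points at, i.e. the HEAD commit. */
    String head() {
        return branches.get(current);
    }

    /** Creating a branch only adds a new pointer to the current HEAD commit. */
    void branch(String name) {
        branches.put(name, head());
    }

    /** Switching branches only changes which pointer is active. */
    void checkout(String name) {
        if (!branches.containsKey(name)) {
            throw new IllegalArgumentException("no such branch: " + name);
        }
        current = name;
    }

    /** Committing advances only the active branch's pointer. */
    void commit(String newCommitId) {
        branches.put(current, newCommitId);
    }
}
```

Replaying the walkthrough (branch cool-beans, check it out, commit, check out master, commit) leaves the two names pointing at different commits that share the same parent.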

A video example and overview of branching can be found here

Make sure that the behavior of your branch, checkout, and commit match what we’ve described above. This is pretty core functionality of Gitlet that many other commands will depend upon. If any of this core functionality is broken, very many of our autograder tests won’t work!

rm-branch

reset

merge

<<<<<<< HEAD
contents of file in current branch
=======
contents of file in given branch
>>>>>>>

(replacing “contents of…” with the indicated file’s contents) and stage the result. Treat a deleted file in a branch as an empty file. Use straight concatenation here. In the case of a file with no newline at the end, you might well end up with something like this:

<<<<<<< HEAD
contents of file in current branch=======
contents of file in given branch>>>>>>>

This is fine; people who produce non-standard, pathological files because they don’t know the difference between a line terminator and a line separator deserve what they get.
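The straight concatenation described above can be sketched as a single expression. The names are hypothetical, a deleted file is passed as the empty string, and the newline after the closing marker is an assumption made here.

```java
final class ConflictSketch {
    /**
     * Builds the contents of a conflicted file by straight concatenation.
     * Pass "" for a file that was deleted in one of the branches.
     */
    static String conflictContents(String current, String given) {
        return "<<<<<<< HEAD\n" + current + "=======\n" + given + ">>>>>>>\n";
    }
}
```

Because nothing is inserted between the file contents and the markers, contents without a trailing newline run straight into the next marker, exactly as in the second example above.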

Once files have been updated according to the above, and the split point was not the current branch or the given branch, merge automatically commits with the log message Merged [given branch name] into [current branch name]. Then, if the merge encountered a conflict, print the message Encountered a merge conflict. on the terminal (not the log). Merge commits differ from other commits: they record as parents both the head of the current branch (called the first parent) and the head of the branch given on the command line to be merged in.

A video walkthrough of this command can be found here.

By the way, we hope you’ve noticed that the set of commits has progressed from a simple sequence to a tree and now, finally, to a full directed acyclic graph.

Skeleton

The skeleton is fairly bare bones with mostly empty classes. We’ve provided helpful javadoc comments hinting at what you might want to include in each file. You should follow a similar approach to Capers where your Main class doesn’t do a whole lot of work by itself, but rather simply calls other methods depending on the args. You’re absolutely welcome to delete the other classes or add your own, but the Main class should remain; otherwise our tests won’t be able to find your code.

If you’re confused on where to start, we suggest looking over Lab 6: Canine Capers.

Design Document

Since you are not working from a substantial skeleton this time, we are asking that everybody submit a design document describing their implementation strategy. It is not graded, but you must have an up-to-date and completed design document before we help you in Office Hours or on a Gitbug. If you do not have one or it’s not up-to-date/not complete, we cannot help you. This is for both of our sakes: by having a design doc, you have written out a road map for how you will tackle the assignment. If you need help creating a design document, we can definitely help with that :) Here are some guidelines, as well as an example from the Capers lab.

Grader Details

We have three graders for Gitlet: the checkpoint grader, the full grader, and the snaps grader.

Checkpoint Grader

Due 3/12 at 11:59 PM for 16 extra credit points.

Submit to the Project 2: Gitlet Checkpoint autograder on Gradescope.

It will test:

In addition, it will comment on (but not score):

We will score these in your final submission. EDITED 3/4: It’s ok to have compiler warnings.

You’ll have a maximum capacity of 1 token which will refresh every 20 minutes. You will not get full logs on these failures (i.e. you will be told what test you failed but not any additional message), though since you have the tests themselves you can simply debug it locally.

Full Grader

Due 4/2 at 11:59 PM for 1600 points.

The full grader is a more substantial and comprehensive test suite. You’ll have a maximum capacity of 1 token. Here is the schedule of token recharge rates:

You’ll see that, like Project 1, there is limited access to the grader. Please be kind to yourself and write tests along the way so you do not become too reliant on the autograder for checking your work.

Similar to the checkpoint, the full grader will have English hints on what each test does but not the actual .in file.

Snaps Grader

Due 4/9 at 11:59 PM. Your Gradescope score will not be transferred to Beacon until you’ve pushed your snaps repo and submitted to the Snaps Gradescope assignment. To push your snaps repo, run these commands:

git push

After you’ve pushed your snaps repository, there is a Gradescope assignment that you will submit your snaps-sp21-s*** repository to (similar to Project 1). This is only for the full grader (not the checkpoint nor the extra credit assignment).

You can do this up to a week after the deadline as well in case you forget. If you forget to push after a week, then you’ll have to use slip days.

Extra credit

There are a total of 16 + 32 + 64 = 112 extra credit points possible:

  1. 16 for the checkpoint
  2. 32 for the status command printing the Modifications Not Staged For Commit and Untracked Files sections
  3. 64 for the remote commands

The rest of this spec is filled with resources that you should read to get started. The section on testing/debugging will be extremely helpful to you, as testing and debugging in this project will be different from previous projects, but not so complicated.

Miscellaneous Things to Know about the Project

Phew! That was a lot of commands to go over just now. But don’t worry, not all commands are of the same difficulty. You can see for each command the approximate number of lines we took to do each part (this only counts code specific to that command – it doesn’t double-count code reused in multiple commands). You shouldn’t worry about matching our solution exactly, but hopefully it gives you an idea about the relative time consumed by each command. Merge is a lengthier command than the others, so don’t leave it for the last minute!

This is an ambitious project, and it would not be surprising for you to feel lost as to where to begin. Therefore, feel free to collaborate with others a little more closely than usual, with the following caveats:

The Ed megathreads typically get very long for Gitlet, but they are full of very good conversation and discussion on the approach for particular commits. In this project more than any you should take advantage of the size of the class and see if you can find someone with a similar question to you on the megathread. It’s very unlikely that your question is so unique to you that nobody else has had it (unless it is a bug that relates to your design, in which case you should submit a Gitbug).

By now this spec has given you enough information to get working on the project. But to help you out some more, there are a couple of things you should be aware of:

Dealing with Files

This project requires reading and writing of files. In order to do these operations, you might find the classes java.io.File and java.nio.file.Files helpful. Actually, you may find various things in the java.io and java.nio packages helpful. Be sure to read the gitlet.Utils class for other things we’ve written for you. If you do a little digging through all of these, you might find a couple of methods that will make the I/O portion of this project much easier! One warning: If you find yourself using readers, writers, scanners, or streams, you’re making things more complicated than need be.
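As an example of how little code whole-file I/O needs, the sketch below writes and reads a file with `java.nio.file.Files`, without any readers, writers, or scanners. The helper names are hypothetical, and the utilities provided with the project may offer similar conveniences that you should prefer where they exist.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileIoSketch {
    /** Replaces the contents of FILE with TEXT, creating the file if needed. */
    static void writeText(Path file, String text) {
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the entire contents of FILE as a string. */
    static String readText(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes TEXT to a temporary file, reads it back, and cleans up. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("wug", ".txt");
            try {
                writeText(tmp, text);
                return readText(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```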

Serialization Details

If you think about Gitlet, you’ll notice that you can only run one command every time you run the program. In order to successfully complete your version-control system, you’ll need to remember the commit tree across commands. This means you’ll have to design not just a set of classes to represent internal Gitlet structures during execution, but you’ll need an analogous representation as files within your .gitlet directories, which will carry across multiple runs of your program.

As indicated earlier, the convenient way to do this is to serialize the runtime objects that you will need to store permanently in files. The Java runtime does all the work of figuring out what fields need to be converted to bytes and how to do so.

You’ve already done serialization in lab6 and so we will not repeat the information here. If you are still confused on some aspect of serialization, re-read the relevant portion of the lab6 spec and also look over your code.

There is, however, one annoying subtlety to watch out for: Java serialization follows pointers. That is, not only is the object you pass into writeObject serialized and written, but any object it points to as well. If your internal representation of commits, for example, represents the parent commits as pointers to other commit objects, then writing the head of a branch will write all the commits (and blobs) in the entire subgraph of commits into one file, which is generally not what you want. To avoid this, don’t use Java pointers to refer to commits and blobs in your runtime objects, but instead use SHA-1 hash strings. Maintain a runtime map between these strings and the runtime objects they refer to. You create and fill in this map while Gitlet is running, but never read or write it to a file.

You might find it convenient to have (redundant) pointers to commits as well as SHA-1 strings to avoid the bother and execution time required to look them up each time. You can store such pointers in your objects while still avoiding having them written out by declaring them “transient”, as in

    private transient MyCommitType parent1;

Such fields will not be serialized, and when read back in and deserialized, will be set to their default values (null for reference types). You must be careful when reading the objects that contain transient fields back in to set the transient fields to appropriate values.
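A small sketch of this behavior, assuming a hypothetical Commit class with a transient parent pointer alongside the parent's id string (all names here are illustrative, not part of the skeleton):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientSketch {
    /** Hypothetical commit type with a cached, non-serialized parent pointer. */
    static class Commit implements Serializable {
        final String message;
        final String parentId;   // written to the file
        transient Commit parent; // skipped by serialization

        Commit(String message, String parentId, Commit parent) {
            this.message = message;
            this.parentId = parentId;
            this.parent = parent;
        }
    }

    static Commit roundTrip(Commit c) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(c);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Commit) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Commit initial = new Commit("initial commit", null, null);
        Commit second = new Commit("added wug", "id-of-initial", initial);

        Commit copy = roundTrip(second);
        System.out.println(copy.parent == null); // true: the transient field was reset

        // Re-link by hand; real code would look the parent up by copy.parentId.
        copy.parent = initial;
        System.out.println(copy.parent.message);
    }
}
```

The non-transient parentId survives the round trip, which is what makes it possible to restore the pointer afterwards.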

Unfortunately, looking at the serialized files your program has produced with a text editor (for debugging purposes) would be rather unrevealing; the contents are encoded in Java’s private serialization encoding. We have therefore provided a simple debugging utility program you might find useful: gitlet.DumpObj. See the Javadoc comment on gitlet/DumpObj.java for details.

Testing

You should read through this entire section, though a video is also available for your convenience.

As usual, testing is part of the project. Be sure to provide your own integration tests for each of the commands, covering all the specified functionality. Also, feel free to add any unit tests you’d like. We don’t provide any unit tests since unit tests are highly dependent on your implementation.

We have provided a testing program that makes it relatively easy to write integration tests: testing/tester.py. This interprets testing files with an .in extension. You may run all of the tests with the command

make check

If you’d like additional information on the failed tests, such as what your program is outputting, run:

make check TESTER_FLAGS="--verbose"

If you’d like to run a single test, within the testing subdirectory, run the command

python3 tester.py --verbose FILE.in ...

where FILE.in ... is a list of specific .in files you want to check.

CAREFUL RUNNING THIS COMMAND as it does not recompile your code. Every time you run a python command, you must first compile your code (via make).

The command

python3 tester.py --verbose --keep FILE.in

will, in addition, keep around the directory that tester.py produces so that you can examine its files at the point the tester script detected an error. If your test did not error, then the directory will still remain there with the final contents of everything.

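In effect, the tester implements a very simple domain-specific language (DSL) that contains commands to set up or remove files in a testing directory, run gitlet.Main, check its output against expected lines or patterns, and check the presence, absence, and contents of files. Running the command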

python3 testing/tester.py

(with no operands, as shown) will provide a message documenting this language. We’ve provided some examples in the directory testing/samples. Don’t put your own tests in that subdirectory; place them somewhere distinct so you don’t get confused with our tests vs your tests (which may be buggy!). Put all your .in files in another folder called student_tests within the testing directory. In the skeleton, this folder is blank.

We’ve added a few things to the Makefile to adjust for differences in people’s setups. If your system’s command for invoking Python 3 is simply python, you can still use our makefile unchanged by using

make PYTHON=python check

You can pass additional flags to tester.py with, for example:

make TESTER_FLAGS="--keep --verbose"

Testing on the Staff Solution

As of Sunday February 28th, there is now a way for you to use the staff solution to verify your understanding of commands as well as verify your own tests! The guide is here.

Understanding Integration Tests

The first thing we’ll ask for in Gitbugs and when you come to receive help in Office Hours is a test that you’re failing, so it’s paramount that you learn to write tests in this project. We’ve done a lot of work to make this as painless as possible, so please take the time to read through this section so you can understand the provided tests and write good tests yourself.

The integration tests are of similar format to those from Capers. If you don’t know how the Capers integration tests (i.e. the .in files) work, then read that section from the capers spec first.

The provided tests are hardly comprehensive, and you’ll definitely need to write your own tests to get a full score on the project. To write a test, let’s first understand how this all works.

Here is the structure of the testing directory:

├── Makefile
├── student_tests                    <==== Your .in files will go here
├── samples                          <==== Sample .in files we provide
│   ├── test01-init.in               <==== An example test
│   ├── test02-basic-checkout.in
│   ├── test03-basic-log.in
│   ├── test04-prev-checkout.in
│   └── definitions.inc
├── src                              <==== Contains files used for testing
│   ├── notwug.txt
│   └── wug.txt
├── runner.py                        <==== Script to help debug your program
└── tester.py                        <==== Script that tests your program

Just like Capers, these tests work by creating a temporary directory within the testing directory and running the commands specified by a .in file. If you use the --keep flag, this temporary directory will remain after the test finishes so you can inspect it.

Unlike Capers, we’ll need to deal with the contents of files in our working directory. So in this testing folder, we have an additional folder called src. This directory stores many pre-filled .txt files that have particular contents we need. We’ll come back to this later, but for now just know that src stores actual file contents. samples has the .in files of the sample tests (which are the checkpoint tests). When you create your own tests, you should add them to the student_tests folder which is initially empty in the skeleton.

The .in files have more functions in Gitlet. Here is the explanation straight from the tester.py file:

# ...  A comment, producing no effect.
I FILE Include.  Replace this statement with the contents of FILE,
      interpreted relative to the directory containing the .in file.
C DIR  Create, if necessary, and switch to a subdirectory named DIR under
      the main directory for this test.  If DIR is missing, changes
      back to the default directory.  This command is principally
      intended to let you set up remote repositories.
T N    Set the timeout for gitlet commands in the rest of this test to N
      seconds.
+ NAME F
      Copy the contents of src/F into a file named NAME.
- NAME
      Delete the file named NAME.
> COMMAND OPERANDS
LINE1
LINE2
...
<<<
      Run gitlet.Main with COMMAND ARGUMENTS as its parameters.  Compare
      its output with LINE1, LINE2, etc., reporting an error if there is
      "sufficient" discrepency.  The <<< delimiter may be followed by
      an asterisk (*), in which case, the preceding lines are treated as
      Python regular expressions and matched accordingly. The directory
      or JAR file containing the gitlet.Main program is assumed to be
      in directory DIR specifed by --progdir (default is ..).
= NAME F
      Check that the file named NAME is identical to src/F, and report an
      error if not.
* NAME
      Check that the file NAME does not exist, and report an error if it
      does.
E NAME
      Check that file or directory NAME exists, and report an error if it
      does not.
D VAR "VALUE"
      Defines the variable VAR to have the literal value VALUE.  VALUE is
      taken to be a raw Python string (as in r"VALUE").  Substitutions are
      first applied to VALUE.

Don’t worry about the Python regular expressions thing mentioned in the above description: we’ll show you that it’s fairly straightforward and even go through an example of how to use it.

Let’s walk through a test to see what happens from start to finish. Let’s examine test02-basic-checkout.in.

Example test

When we first run this test, a temporary directory gets created that is initially empty. Our directory structure is now:

├── Makefile
├── student_tests
├── samples
│   ├── test01-init.in
│   ├── test02-basic-checkout.in
│   ├── test03-basic-log.in
│   ├── test04-prev-checkout.in
│   └── definitions.inc
├── src
│   ├── notwug.txt
│   └── wug.txt
├── test02-basic-checkout_0          <==== Just created
├── runner.py
└── tester.py

This temporary directory is the Gitlet repository that will be used for this execution of the test, so we will add things there and run all of our Gitlet commands there as well. If you ran the test a second time without deleting the directory, it’ll create a new directory called test02-basic-checkout_1, and so on. Each execution of a test uses its own directory, so don’t worry about tests interfering with each other as that cannot happen.

The first line of the test is a comment, so we ignore it.

The next section is:

> init
<<<

This shouldn’t have any output as we can tell by this section not having any text between the first line with > and the line with <<<. But, as we know, this should create a .gitlet folder. So our directory structure is now:

├── Makefile
├── student_tests
├── samples
│   ├── test01-init.in
│   ├── test02-basic-checkout.in
│   ├── test03-basic-log.in
│   ├── test04-prev-checkout.in
│   └── definitions.inc
├── src
│   ├── notwug.txt
│   └── wug.txt
├── test02-basic-checkout_0
│   └── .gitlet                     <==== Just created
├── runner.py
└── tester.py

The next section is:

+ wug.txt wug.txt

This line uses the + command. This will take the file on the right-hand side from the src directory and copy its contents to the file on the left-hand side in the temporary directory (creating it if it doesn’t exist). They happen to have the same name, but that doesn’t matter since they’re in different directories. After this command, our directory structure is now:

├── Makefile
├── student_tests
├── samples
│   ├── test01-init.in
│   ├── test02-basic-checkout.in
│   ├── test03-basic-log.in
│   ├── test04-prev-checkout.in
│   └── definitions.inc
├── src
│   ├── notwug.txt
│   └── wug.txt
├── test02-basic-checkout_0
│   ├── .gitlet
│   └── wug.txt                     <==== Just created
├── runner.py
└── tester.py

Now we see what the src directory is used for: it contains file contents that the tests can use to set up the Gitlet repository however you want. If you want to add special contents to a file, you should add those contents to an appropriately named file in src and then use the same + command as we have here. It’s easy to get confused with the order of arguments, so make sure the right-hand side is referencing the file in the src directory, and the left-hand side is referencing the file in the temporary directory.

The next section is:

> add wug.txt
<<<

As you can see, there should be no output. The wug.txt file is now staged for addition in the temporary directory. At this point, your directory structure will likely change within the test02-basic-checkout_0/.gitlet directory since you’ll need to somehow persist the fact that wug.txt is staged for addition.
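One way to persist the staging information between runs, shown here only as a rough sketch: keep a sorted map from file name to blob id and serialize it into a file under .gitlet. The file name "staging", the map layout, and the placeholder blob id are assumptions made for this example; your own design may look quite different.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.util.TreeMap;

public class StagingSketch {
    /** Writes the staged-for-addition map (file name -> blob id) to FILE. */
    static void save(File file, TreeMap<String, String> staged) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(staged);
        }
    }

    /** Reads the map back, or returns an empty map if nothing was saved yet. */
    @SuppressWarnings("unchecked")
    static TreeMap<String, String> load(File file)
            throws IOException, ClassNotFoundException {
        if (!file.exists()) {
            return new TreeMap<>();
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (TreeMap<String, String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A throwaway directory plays the role of the test's temporary directory.
        File repo = Files.createTempDirectory("gitlet-sketch").toFile();
        File gitlet = new File(repo, ".gitlet");
        gitlet.mkdir();
        File stagingFile = new File(gitlet, "staging");

        // "Run 1": the add command records the file and the program exits.
        TreeMap<String, String> staged = load(stagingFile);
        staged.put("wug.txt", "placeholder-blob-id");
        save(stagingFile, staged);

        // "Run 2": a later command reads the same information back.
        System.out.println(load(stagingFile).keySet());
    }
}
```

When a test fails, the --keep flag lets you look inside the kept .gitlet directory and confirm that a file like this was actually written.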

The next section is:

> commit "added wug"
<<<

And, again, there is no output, and, again, your directory structure within .gitlet might change.

The next section is:

+ wug.txt notwug.txt

Since wug.txt already exists in our temporary directory, its contents change to be whatever was in src/notwug.txt.

The next section is:

> checkout -- wug.txt
<<<

Which, again, has no output. However, it should change the contents of wug.txt in our temporary directory back to its original contents which is exactly the contents of src/wug.txt. The next command is what asserts that:

= wug.txt wug.txt

This is an assertion: if the file on the left-hand side (again, this is in the temporary directory) doesn’t have the exact contents of the file on the right-hand side (from the src directory), the testing script will error and say your file contents are not correct.

There are two other assertion commands available to you:

E NAME

Will assert that there exists a file/folder named NAME in the temporary directory. It doesn’t check the contents, only that it exists. If no file/folder with that name exists, the test will fail.

* NAME

Will assert that there does NOT exist a file/folder named NAME in the temporary directory. If there does exist a file/folder with that name, the test will fail.

That happened to be the last line of the test, so the test finishes. If the --keep flag was provided, the temporary directory will remain, otherwise it will be deleted. You might want to keep it if you suspect your .gitlet directory is not being properly set up or there is some issue with persistence.

Setup for a test

As you’ll soon discover, there can be a lot of repeated setup to test a particular command: for example, if you’re testing the checkout command you need to:

  1. Initialize a Gitlet Repository
  2. Create a commit with a file in some version (v1)
  3. Create another commit with that file in some other version (v2)
  4. Checkout that file to v1

And perhaps even more if you want to test with files that were untracked in the second commit but tracked in the first.

So the way you can save yourself time is by adding all that setup in a file and using the I command. Say we do that here:

# Initialize, add, and commit a file.
> init
<<<
+ a.txt wug.txt
> add a.txt
<<<
> commit "a is a wug"
<<<

We should place this file with the rest of the tests in the samples directory, but with a file extension .inc, so maybe we name it samples/commit_setup.inc. If we gave it the file extension .in, our testing script will mistake it for a test and try to run it individually. Now, in our actual test, we simply use the command:

I commit_setup.inc

This will have the testing script run all of the commands in that file and keep the temporary directory it creates. This keeps your tests relatively short and thus easier to read.

We’ve included one .inc file called definitions.inc that will set up patterns for your convenience. Let’s understand what patterns are.

Pattern matching output

The most confusing part of testing is the output for something like log. There are a few reasons why:

  1. The commit SHA will change as you modify your code and hash more things, so you would have to continually modify your test to keep up with the changes to the SHA.
  2. Your date will change every time since time only moves forwards.
  3. It makes the tests very long.

We also don’t really care about the exact text: just that there is some SHA there and something with the right date format. For this reason, our tests use pattern matching.

This is not a concept you will need to understand, but at a high level we define a pattern for some text (i.e. a commit SHA) and then just check that the output has that pattern (without caring about the actual letters and numbers).

Here is how you’d do that for the output of log and check that it matches the pattern:

# First "import" the pattern definitions from our setup
I definitions.inc
# You would add your lines here that create commits with the
# specified messages. We'll omit this for this example.
> log
===
${HEADER}
${DATE}
added wug

===
${HEADER}
${DATE}
initial commit

<<<*

The section we see is the same as a normal Gitlet command, except it ends in <<<* which tells the testing script to use patterns. The patterns are enclosed in ${PATTERN_NAME}.

All the patterns are defined in samples/definitions.inc. You don’t need to understand the actual pattern, just the thing it matches. For example, HEADER matches the header of a commit which should look something like:

commit fc26c386f550fc17a0d4d359d70bae33c47c54b9

That’s just some random commit SHA.
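To make the idea of "matching a pattern" concrete, here is a small Java sketch. The tester itself is written in Python and its real pattern lives in samples/definitions.inc; the regular expression below is only an assumed stand-in (the word commit followed by 40 lowercase hex digits).

```java
import java.util.regex.Pattern;

public class HeaderPatternSketch {
    // Assumed pattern, for illustration only: "commit", a space, 40 hex digits.
    static final Pattern HEADER = Pattern.compile("commit [a-f0-9]{40}");

    /** Returns true if the whole line has the shape of a commit header. */
    static boolean looksLikeHeader(String line) {
        return HEADER.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Any SHA of the right shape matches; the exact digits do not matter.
        System.out.println(looksLikeHeader("commit fc26c386f550fc17a0d4d359d70bae33c47c54b9"));
        System.out.println(looksLikeHeader("commit not-a-sha"));
    }
}
```

The first call prints true and the second prints false, which is why a test written with patterns keeps passing even when your hashes change.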

So when we create the expected output for this test, we’ll need to know how many entries are in this log and what the commit messages are.

You can do similar things for the status command:

I definitions.inc
# Add commands here to setup the status. We'll omit them here.
> status
=== Branches ===
\*master

=== Staged Files ===
g.txt

=== Removed Files ===

=== Modifications Not Staged For Commit ===
${ARBLINES}

=== Untracked Files ===
${ARBLINES}

<<<*

The pattern we used here is ARBLINES which is arbitrary lines. If you actually care what is untracked, then you can add that here without the pattern, but perhaps we’re more interested in seeing g.txt staged for addition.

Notice the \* on the branch master. Recall that in the status command, you should prefix the HEAD branch with a *. If you use a pattern, you’ll need to replace this * with a \* in the expected output. The reason is out of the scope of the class, but it is called “escaping” the asterisk. If you don’t use a pattern (i.e. your command ends in <<< not <<<*), then you can use the * without the \.
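For the curious, the effect of escaping can be seen in a few lines of Java. This is an analogy only: the tester uses Python regular expressions, but both languages treat an unescaped * as "repeat the previous thing", so a pattern that starts with * is rejected.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class EscapeSketch {
    /** Returns true if REGEX is a syntactically valid Java regular expression. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A leading * has nothing to repeat, so the pattern is invalid.
        System.out.println(compiles("*master"));
        // With the backslash, the * is an ordinary character to match.
        System.out.println(Pattern.matches("\\*master", "*master"));
    }
}
```

The first line printed is false and the second is true. Note that the Java source needs "\\*" because the backslash must itself be escaped inside a Java string literal; in a .in file you write \* directly.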

The final thing you can do with these patterns is “save” a matched portion. Warning: this seems like magic and we don’t care at all if you understand how this works, just know that it does and it is available to you. You can copy and paste the relevant part from our provided tests so you don’t need to worry too much about making these from scratch. With that out of the way, let’s see what this is.

If you’re doing a checkout command, you need to use the SHA identifier to specify which commit to checkout to/from. But remember we used patterns, so we don’t actually know the SHA identifier at the time of creating the test. That is problematic. We’ll use test04-prev-checkout.in to see how you can “capture” or “save” the SHA:

I definitions.inc
# Each ${COMMIT_HEAD} captures its commit UID.
# Not shown here, but the test sets up the log by making many commits
# with specific messages.
> log
===
${COMMIT_HEAD}
version 2 of wug.txt

===
${COMMIT_HEAD}
version 1 of wug.txt

===
${COMMIT_HEAD}
initial commit

<<<*

This will set up the UID (SHA) to be captured after the log command. So right after this command runs, we can use the D command to assign the UIDs to variables:

# UID of second version
D UID2 "${1}"
# UID of first version
D UID1 "${2}"

Notice how the numbering is backwards: the numbering begins at 1 and starts at the top of the log. That is why the current version (i.e. second version) is defined as "${1}". We don’t care about the initial commit, so we don’t bother capturing its UID.
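The numbering follows how regular-expression capture groups work: groups are counted from left to right, so the first header in the log (the newest commit) is group 1. Here is a hedged Java illustration using a simplified log with no Date lines and made-up short ids; the tester does the equivalent with Python regular expressions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureSketch {
    /** Returns the ids captured from a two-entry log, newest first. */
    static String[] captureIds(String log) {
        // One capture group per commit header, in the order the headers appear.
        String head = "commit ([a-f0-9]+)\n";
        Pattern p = Pattern.compile(
            "===\n" + head + "version 2 of wug.txt\n\n"
            + "===\n" + head + "version 1 of wug.txt\n\n");
        Matcher m = p.matcher(log);
        if (!m.matches()) {
            throw new IllegalArgumentException("log did not match the pattern");
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String log = "===\ncommit bbb222\nversion 2 of wug.txt\n\n"
                   + "===\ncommit aaa111\nversion 1 of wug.txt\n\n";
        String[] ids = captureIds(log);
        System.out.println("group 1 (version 2): " + ids[0]);
        System.out.println("group 2 (version 1): " + ids[1]);
    }
}
```

Group 1 holds bbb222, the id printed at the top of the log, matching the way UID2 is defined from "${1}" above.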

Now we can use that definition to checkout to that captured SHA:

> checkout ${UID1} -- wug.txt
<<<

And now you can make your assertions to ensure the checkout was successful.

Testing conclusion

There are many more complex things you can do with our testing script, but this is enough to write very good tests. You should use our provided tests as an example to get started, and also feel free to discuss on Ed high level ideas of how to test things. You may also share your .in files, but please make sure they’re correct before posting them and add comments so other students and staff can see what is going on.

Debugging Integration Tests

Recall from Lab 6 that debugging integration tests is a bit different with the new setup. The runner.py script will work just as it did for Capers, so you should read through that section in the Lab 6 spec and watch the video linked there. Here we describe strategies to debug:

Finding the right execution to debug

Each test runs your program multiple times, and each one of them has the potential to introduce a bug. The first priority is to identify the right execution of the program that introduces the bug. What we mean by this: imagine you’re failing a test that checks the status command. Say that the output differs by just one file: you say it’s untracked, but the test says it should be staged for addition. This does not mean the status command has a bug. It’s possible that the status command is buggy, but not guaranteed. It could be that your add command didn’t properly persist the fact that a file has been staged for addition! If that is the case, then even with a fully functioning status command, your program would error.

So finding the right (i.e. buggy) execution of the program is very important: how do we do that? You step through every single execution of the program using the runner.py script, and after every execution you look at your temporary directory to make sure everything has been written to a file correctly. This will be harder for serialized objects since, as we know, their contents will be a stream of unintelligible bytes: for serialized objects you can simply check that at the time of serialization they have the correct contents. You may even find that you never serialized it!
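One possible way to check an object's contents at the time of serialization is to give it a readable toString and print it just before writing. The sketch below assumes a hypothetical StagingArea class and helper name; a breakpoint at the same spot in the debugger serves the same purpose.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.TreeMap;

public class DebugWriteSketch {
    /** Hypothetical staging area with a human-readable description. */
    static class StagingArea implements Serializable {
        final TreeMap<String, String> added = new TreeMap<>();

        @Override
        public String toString() {
            return "StagingArea" + added;
        }
    }

    /** Prints OBJ to standard error, then serializes it to a byte array. */
    static byte[] serializeWithTrace(Serializable obj) throws IOException {
        System.err.println("about to serialize: " + obj);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StagingArea staging = new StagingArea();
        staging.added.put("wug.txt", "placeholder-blob-id");
        byte[] data = serializeWithTrace(staging);
        System.out.println(data.length > 0);
    }
}
```

If the trace line never appears for a given command, that command never serialized the object, which is exactly the kind of bug described above. Remember to remove such prints before running the tests, since extra output would change what the tester sees.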

Eventually, you’ll find the bug. If you cannot, then that is when you can come to Office Hours or post a Gitbug. Be warned: we can only spend 10 minutes with each student in Office Hours, so if you have a nasty bug that you think would take a TA more than 10 minutes, then you should instead submit a Gitbug with as much information as possible. The better your Gitbug, the better/faster your response will be. Don’t forget to update your design doc: remember we will reject Gitbugs that do not have an up-to-date or complete design document.

Going Remote (Extra Credit)

This project is all about mimicking git’s local features. These are useful because they allow you to backup your own files and maintain multiple versions of them. However, git’s true power is really in its remote features, allowing collaboration with other people over the internet. The point is that both you and your friend could be collaborating on a single code base. If you make changes to the files, you can send them to your friend, and vice versa. And you’ll both have access to a shared history of all the changes either of you have made.

To get extra credit, implement some basic remote commands: namely add-remote, rm-remote, push, fetch, and pull. You will get 64 extra-credit points for completing them. Don’t attempt or plan for extra credit until you have completed the rest of the project.

Depending on how flexibly you have designed the rest of the project, 64 extra-credit points may not be worth the amount of effort it takes to do this section. We’re certainly not expecting everyone to do it. Our priority will be in helping students complete the main project; if you’re doing the extra credit, we expect you to be able to stand on your own a little bit more than most students.

The Commands

A few notes about the remote commands:

So now let’s go over the commands:

add-remote

rm-remote


fetch


I. Things to Avoid

There are a few practices that experience has shown will cause you endless grief in the form of programs that don’t work and bugs that are very hard to find and sometimes not repeatable (“Heisenbugs”).

  1. Since you are likely to keep various information in files (such as commits), you might be tempted to use apparently convenient file-system operations (such as listing a directory) to sequence through all of them. Be careful. Methods such as File.list and File.listFiles produce file names in an undefined order. If you use them to implement the log command, in particular, you can get random results.

  2. Windows users especially should beware that the file separator character is / on Unix (or MacOS) and ‘\’ on Windows. So if you form file names in your program by concatenating some directory names and a file name together with explicit /s or \s, you can be sure that it won’t work on one system or the other. Java provides a system-dependent file separator character (System.getProperty("file.separator")), or you can use the multi-argument constructors to File.
  3. Be careful using a HashMap when serializing! The order of things within the HashMap is non-deterministic. The solution is to use a TreeMap which will always have the same order. More details here
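The three points above can be sketched in a few lines of Java. This is an illustration under assumptions (a throwaway temporary directory and made-up file names), not a recommended layout.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.TreeMap;

public class AvoidSketch {
    /** Lists DIR's entries in sorted order instead of trusting File.list's order. */
    static String[] sortedNames(File dir) {
        String[] names = dir.list();
        if (names == null) {
            return new String[0];
        }
        Arrays.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("gitlet-avoid").toFile();

        // Two-argument File constructor: no hard-coded "/" or "\" in the path.
        new File(dir, "b.txt").createNewFile();
        new File(dir, "a.txt").createNewFile();

        // Sorting gives a repeatable order on every system.
        System.out.println(Arrays.toString(sortedNames(dir)));

        // A TreeMap always iterates in key order, regardless of insertion order.
        TreeMap<String, String> tracked = new TreeMap<>();
        tracked.put("wug.txt", "placeholder-id-2");
        tracked.put("notwug.txt", "placeholder-id-1");
        System.out.println(tracked.keySet());
    }
}
```

Sorting by name makes a listing repeatable, but it is still not commit order; for log, following each commit's parent reference is the safer route.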

J. Acknowledgments

Thanks to Alicia Luengo, Josh Hug, Sarah Kim, Austin Chen, Andrew Huang, Yan Zhao, Matthew Chow, especially Alan Yao, Daniel Nguyen, and Armani Ferrante for providing feedback on this project. Thanks to git for being awesome.

This project was largely inspired by this excellent article by Philip Nilsson.

This project was created by Joseph Moghadam. Modifications for Fall 2015, Fall 2017, and Fall 2019 by Paul Hilfinger.