Writing a package manager
Writing a package manager is not one of the most common programming tasks. After all, there are many out-of-the-box ones available. Yet, somehow I've found myself in exactly this situation.
How so?
I'm a big fan of SQLite and its extensions. Given the large number of such extensions in the wild, I wanted a structured approach to managing them. Which usually involves, well, a package manager. Except there is none for SQLite. So I decided to build one!
If you haven't seen them before, SQLite extensions are just libraries (.dll, .dylib or .so, depending on the operating system). To make an extension work, you download it and load it into SQLite.
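As a tiny illustration of that platform dependence, here is a sketch (in Go, the language the manager is written in) of how a tool might map an operating system name to the library extension. The function name libExt is hypothetical, and only the platforms listed above plus a few BSDs are covered:

```go
package main

import "fmt"

// libExt returns the shared-library extension for an OS name
// (values follow Go's runtime.GOOS naming).
func libExt(goos string) (string, error) {
	switch goos {
	case "windows":
		return ".dll", nil
	case "darwin":
		return ".dylib", nil
	case "linux", "freebsd", "openbsd", "netbsd":
		return ".so", nil
	default:
		return "", fmt.Errorf("unsupported OS: %s", goos)
	}
}
```

In a real program it would be called as libExt(runtime.GOOS).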
Needless to say, building a package manager is not an easy task. In fact, Sam Boyer has written a great article about the problems involved, so I won't dwell on them here.
This article explains the design choices and implementation details that allowed me to actually build a working package manager in a couple of weeks (mostly evenings and nights, to be honest). I tried to leave out most of the SQLite specifics, so hopefully you can apply this approach to any package manager should you decide to build one.
Design decisions
Package management is complex by nature, and there is no magic bullet that will make it simple (unless you are willing to radically narrow the scope). So let's go through the building blocks together, tackling problems as they arise.
Spec file • Folder structure • Scope • Registry • Version • Latest version • Lockfile • Source of truth • Checksums • Dependencies • Install and update
Spec file
To work with packages, the manager needs some information about them: at the very least, the package ID and the download location. So let's design a spec file that describes a package.
Here is a simple one:
{
"owner": "sqlite",
"name": "stmt",
"assets": {
"path": "https://github.com/nalgeon/sqlean/releases/download/incubator",
"files": {
"darwin-amd64": "stmt.dylib",
"darwin-arm64": "stmt.dylib",
"linux-amd64": "stmt.so",
"windows-amd64": "stmt.dll"
}
}
}
owner + name form a unique package identifier (we don't want any name conflicts, thank you very much, Python).
assets.path is a base URL for the package assets. The assets themselves are listed in assets.files. When the manager installs the package, it chooses the asset name according to the user's operating system and architecture, combines it with assets.path, and downloads the asset.
> install sqlite/stmt
↓
download spec
┌───────────────┐
│ sqlite/stmt │
└───────────────┘
↓
check platform
↪ OS: darwin
↪ arch: arm64
↓
download asset
┌───────────────┐
│ stmt.dylib │
└───────────────┘
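Here is a minimal Go sketch of that flow, assuming the spec format shown above. The Spec type and the parseSpec and assetURL helpers are hypothetical illustrations, not sqlpkg's actual code (which uses spec.Package and an AssetPath type, described later). The platform key is built as OS-architecture to match the keys in assets.files:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Spec mirrors the minimal spec file above (hypothetical type).
type Spec struct {
	Owner  string `json:"owner"`
	Name   string `json:"name"`
	Assets struct {
		Path  string            `json:"path"`
		Files map[string]string `json:"files"`
	} `json:"assets"`
}

// parseSpec decodes a downloaded spec file.
func parseSpec(data []byte) (*Spec, error) {
	var s Spec
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

// assetURL picks the asset for the "os-arch" platform and joins it
// with the base path.
func assetURL(spec *Spec, goos, goarch string) (string, error) {
	platform := goos + "-" + goarch
	file, ok := spec.Assets.Files[platform]
	if !ok {
		return "", fmt.Errorf("%s/%s: no asset for %s", spec.Owner, spec.Name, platform)
	}
	return strings.TrimSuffix(spec.Assets.Path, "/") + "/" + file, nil
}
```

At install time the call would be assetURL(spec, runtime.GOOS, runtime.GOARCH); a missing platform key becomes an error rather than a broken download.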
Good start!
Folder structure
Let's say there is a package hosted somewhere on GitHub. I tell the manager (called sqlpkg from now on) to install it:
sqlpkg install sqlite/stmt
The manager downloads the package and stores it locally in a folder named .sqlpkg:
.sqlpkg
└── sqlite
└── stmt
├── sqlpkg.json
└── stmt.dylib
(sqlpkg.json is the spec file and stmt.dylib is the package asset)
Let's add another one:
sqlpkg install asg017/vss
.sqlpkg
├── asg017
│ └── vss
│ ├── sqlpkg.json
│ └── vss0.dylib
│
└── sqlite
└── stmt
├── sqlpkg.json
└── stmt.dylib
As you can probably see, given this folder structure, it's quite easy for the manager to reason about the installed packages.
For example, if I run sqlpkg update OWNER/NAME, it does the following:
- Reads the spec file from the path .sqlpkg/OWNER/NAME/sqlpkg.json.
- Downloads the latest asset using the assets.path from the spec.
- Replaces the old .dylib with the new one.
When I run sqlpkg uninstall OWNER/NAME, it deletes the corresponding directory.
And when I run sqlpkg list, it searches for all paths that match .sqlpkg/*/*/sqlpkg.json.
Simple, isn't it?
Project vs. global scope
Some package managers (e.g. npm) use per-project scope by default, but also allow you to install packages globally using flags (npm install -g). Others (e.g. brew) use global scope.
I like the idea of allowing both project and global scope, but I do not like the flags approach. Why don't we apply a heuristic:
- If there is a .sqlpkg folder in the current directory, use project scope.
- Otherwise, use global scope.
This way, if users don't need separate project environments, they just run sqlpkg as is and install packages in their home folder (e.g. ~/.sqlpkg). Otherwise, they create a separate .sqlpkg for each project (we can provide a helper init command for this).
Project scope:
$ cd /my/project
$ sqlpkg init
$ sqlpkg install sqlite/stmt
$ tree .sqlpkg
.sqlpkg
└── sqlite
└── stmt
├── sqlpkg.json
└── stmt.dylib
Global scope:
$ cd /some/other/path
$ sqlpkg install sqlite/stmt
$ tree ~/.sqlpkg
/Users/anton/.sqlpkg
└── sqlite
└── stmt
├── sqlpkg.json
└── stmt.dylib
No flags involved!
Package registry
For a package manager to be useful, it should support existing extensions (which, of course, are completely unaware of it at the moment). Maybe extension authors will eventually add spec files to their packages, maybe they won't — we can't rely on that.
So let's add a simple fallback algorithm. When the user runs sqlpkg install OWNER/NAME, the manager does the following:
- Attempts to fetch the spec from the owner's GitHub repo https://github.com/OWNER/NAME.
- If the spec is not found, fetches it from the package registry.
owner/name
↓
┌─────────────────┐ found ┌───────────┐
│ owner's repo │ → │ install │
└─────────────────┘ └───────────┘
↓ not found
┌─────────────────┐ found ┌───────────┐
│ pkg registry │ → │ install │
└─────────────────┘ └───────────┘
↓ not found
✗ error
The package registry is just another GitHub repo with a two-level owner/name structure:
pkg/
├── asg017
│ ├── fastrand.json
│ ├── hello.json
│ ├── html.json
│ └── ...
├── daschr
│ └── cron.json
├── dessus
│ ├── besttype.json
│ ├── fcmp.json
│ └── ...
├── ...
...
We'll bootstrap the registry with known packages, so the manager will work right out of the box. As package authors catch up and add sqlpkg.json to their repos, the manager will gradually switch to using their specs instead of the registry.
The manager should also support links to specific GitHub repos (in case the repo has a different name than the package):
sqlpkg install github.com/asg017/sqlite-vss
And other URLs, because not everyone uses GitHub:
sqlpkg install https://antonz.org/downloads/stats.json
And also local paths:
sqlpkg install ./stats.json
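Telling these forms apart can be a simple prefix check. The sketch below is a hypothetical heuristic, not sqlpkg's actual rules; for example, it does not handle Windows drive paths:

```go
package main

import "strings"

// LocatorKind says where the manager should look for a spec.
type LocatorKind int

const (
	OwnerName  LocatorKind = iota // sqlite/stmt: owner repo, then registry
	GitHubRepo                    // github.com/asg017/sqlite-vss
	RemoteURL                     // https://antonz.org/downloads/stats.json
	LocalPath                     // ./stats.json
)

// classify guesses the locator kind from its shape.
func classify(loc string) LocatorKind {
	switch {
	case strings.HasPrefix(loc, "github.com/"):
		return GitHubRepo
	case strings.HasPrefix(loc, "http://"), strings.HasPrefix(loc, "https://"):
		return RemoteURL
	case strings.HasPrefix(loc, "."), strings.HasPrefix(loc, "/"), strings.HasSuffix(loc, ".json"):
		return LocalPath
	default:
		return OwnerName
	}
}
```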
All this "locator" logic complicates the design quite a bit. So if you are comfortable with requiring package authors to provide the specs, feel free to omit the fallback step and the registry altogether.
Version
What's a package without a version, right? Let's add one:
{
"owner": "asg017",
"name": "vss",
"version": "v0.1.1",
"repository": "https://github.com/asg017/sqlite-vss",
"assets": {
"path": "{repository}/releases/download/{version}",
"files": {
"darwin-amd64": "vss-{version}-macos-x86_64.tar.gz",
"darwin-arm64": "vss-{version}-macos-aarch64.tar.gz",
"linux-amd64": "vss-{version}-linux-x86_64.tar.gz"
}
}
}
We also introduced variables like {repository} and {version} so package authors don't have to repeat themselves.
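Variable expansion can be a plain string substitution. This sketch is in the spirit of the ExpandVars method described in the implementation section, but the Assets type here is a trimmed, hypothetical version:

```go
package main

import "strings"

// Assets is a trimmed, hypothetical version of the spec's assets section.
type Assets struct {
	Path  string
	Files map[string]string
}

// ExpandVars replaces {repository} and {version} in the asset path
// and in every asset file name.
func (a *Assets) ExpandVars(repository, version string) {
	r := strings.NewReplacer("{repository}", repository, "{version}", version)
	a.Path = r.Replace(a.Path)
	for platform, file := range a.Files {
		a.Files[platform] = r.Replace(file)
	}
}
```

The version passed in should be a concrete one; if the spec allows a "latest" placeholder (see the next section), it has to be resolved before expansion.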
When updating a package, the manager must now compare local and remote versions according to the semantic versioning rules:
local spec │ remote spec
│
> update │
┌─────────────┐ │ ┌─────────────┐
│ v0.1.0 │ < │ v0.1.1 │
└─────────────┘ │ └─────────────┘
↓ │
updating... │
┌─────────────┐ │
│ v0.1.1 │ │
└─────────────┘ │
Nice!
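Real semantic versioning also orders pre-release tags such as v1.0.0-beta; a full implementation is available as Compare in golang.org/x/mod/semver. The self-contained sketch below handles only plain vMAJOR.MINOR.PATCH versions (an assumption) and compares components numerically, so v0.10.0 correctly sorts after v0.9.9, which a plain string comparison would get wrong:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses "vMAJOR.MINOR.PATCH" (the "v" is optional).
// Pre-release and build suffixes are not supported in this sketch.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid version %q", v)
		}
		out[i] = n
	}
	return out, nil
}

// compareVersions returns -1, 0 or +1 depending on whether a < b, a == b or a > b.
func compareVersions(a, b string) (int, error) {
	va, err := parseVersion(a)
	if err != nil {
		return 0, err
	}
	vb, err := parseVersion(b)
	if err != nil {
		return 0, err
	}
	for i := range va {
		switch {
		case va[i] < vb[i]:
			return -1, nil
		case va[i] > vb[i]:
			return 1, nil
		}
	}
	return 0, nil
}
```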
Latest version
While not required, it would be nice to support the latest version placeholder and automatically resolve it via the GitHub API for GitHub-hosted packages:
{
"owner": "asg017",
"name": "vss",
"version": "latest",
"repository": "https://github.com/asg017/sqlite-vss",
"assets": {
"path": "{repository}/releases/download/{version}",
"files": {
"darwin-amd64": "vss-{version}-macos-x86_64.tar.gz",
"darwin-arm64": "vss-{version}-macos-aarch64.tar.gz",
"linux-amd64": "vss-{version}-linux-x86_64.tar.gz"
}
}
}
This way, package authors don't have to change the spec when releasing a new version. When installing the package, the manager will fetch the latest version from GitHub:
local spec │ remote spec │ github api
│ │
> update │ │
┌─────────────┐ │ │
│ v0.1.0 │ │ │
└─────────────┘ │ │
↓ │ │
wait a sec... │ │
┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────┐
│ v0.1.0 │ ? │ latest │ → │ v0.1.1 │
└─────────────┘ │ └─────────────┘ │ └─────────────┘
↓ │ │
┌─────────────┐ │ ┌─────────────┐ │
│ v0.1.0 │ < │ v0.1.1 │ │
└─────────────┘ │ └─────────────┘ │
↓ │ │
updating... │ │
┌─────────────┐ │ │
│ v0.1.1 │ │ │
└─────────────┘ │ │
In this scenario, it's important to store the specific version in the local spec, not the "latest" placeholder. Otherwise, the manager won't be able to reason about the currently installed version when the user runs an update command.
Lockfile
Having a .sqlpkg folder with package specs and assets is enough to implement all manager commands. We can install, uninstall, update and list packages based on the .sqlpkg data alone.
.sqlpkg
├── asg017
│ └── vss
│ ├── sqlpkg.json
│ └── vss0.dylib
│
└── sqlite
└── stmt
├── sqlpkg.json
└── stmt.dylib
But what if the user wants to reinstall the packages on another machine or CI server? That's where the lockfile comes in.
The lockfile stores a list of all installed packages with just enough information to reinstall them if needed:
{
"packages": {
"asg017/vss": {
"owner": "asg017",
"name": "vss",
"version": "v0.1.1",
"specfile": "https://github.com/nalgeon/sqlpkg/raw/main/pkg/asg017/vss.json",
"assets": {
// ...
}
},
"sqlite/stmt": {
"owner": "sqlite",
"name": "stmt",
"version": "",
"specfile": "https://github.com/nalgeon/sqlpkg/raw/main/pkg/sqlite/stmt.json",
"assets": {
// ...
}
}
}
}
The only new field here is specfile: a path to the remote spec file, used to fetch the rest of the package information (e.g. description, license, and authors).
Now the user can commit the lockfile along with the rest of the project and run install on another machine to install all the packages listed in the lockfile:
local spec │ lockfile │ remote spec
│ │
> install │ ┌─────────────┐ │ ┌─────────────┐
└─ (empty) → │ asg017/vss │ → │ asg017/vss │
│ │ sqlite/stmt │ │ └─────────────┘
│ └─────────────┘ │ ┌─────────────┐
┌─ ← ← │ sqlite/stmt │
installing... │ │ └─────────────┘
┌─────────────┐ │ │
│ asg017/vss │ │ │
└─────────────┘ │ │
┌─────────────┐ │ │
│ sqlite/stmt │ │ │
└─────────────┘ │ │
So far, so good.
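A sketch of the reinstall side, assuming the lockfile format above (without the // placeholders, which plain JSON does not allow). It decodes the lockfile and returns the entries in a stable order; the manager would then fetch each specfile and install the pinned version. The type and function names are hypothetical:

```go
package main

import (
	"encoding/json"
	"sort"
)

// lockedPackage holds the lockfile fields this sketch needs.
type lockedPackage struct {
	Owner    string `json:"owner"`
	Name     string `json:"name"`
	Version  string `json:"version"`
	Specfile string `json:"specfile"`
}

type lockfile struct {
	Packages map[string]lockedPackage `json:"packages"`
}

// reinstallPlan lists locked packages sorted by "owner/name".
func reinstallPlan(data []byte) ([]lockedPackage, error) {
	var lck lockfile
	if err := json.Unmarshal(data, &lck); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(lck.Packages))
	for name := range lck.Packages {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; keep installs deterministic
	plan := make([]lockedPackage, 0, len(names))
	for _, name := range names {
		plan = append(plan, lck.Packages[name])
	}
	return plan, nil
}
```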
Source of truth
Lockfile sounds like a no-brainer, but in fact it introduces a major problem — we no longer have a single source of truth for any given local package.
Let's consider one of the simpler commands, list, which displays all installed packages. Previously, all it had to do was scan .sqlpkg for spec files:
> list
↓
glob .sqlpkg/*/*/sqlpkg.json
┌─────────────┐
│ asg017/vss │
│ sqlite/stmt │
└─────────────┘
But now we have two sources of package information: the .sqlpkg folder and the lockfile. Imagine that for some reason they are out of sync:
local spec │ lockfile
│
> list │
↓ │
let's see... │
┌─────────────┐ │ ┌──────────────┐
│ asg017/vss │ │ │ asg017/vss │
└─────────────┘ │ │ nalgeon/text │
┌─────────────┐ │ └──────────────┘
│ sqlite/stmt │ │
└─────────────┘ │
↓ │
???
Instead of the simple "just list the .sqlpkg contents", we now have four possible situations for any given package:
➊ The package is listed in both .sqlpkg and the lockfile with the same version.
➋ The package is listed in both .sqlpkg and the lockfile, but the versions are different.
➌ The package is listed in .sqlpkg, but not in the lockfile.
➍ The package is listed in the lockfile, but not in .sqlpkg.
➊ is easy, but what should the manager do with ➋, ➌ and ➍?
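To make the four cases concrete, here is a sketch that classifies each package by comparing two maps from owner/name to version, one built from .sqlpkg and one from the lockfile. The names are hypothetical:

```go
package main

// syncState is one of the four situations for a package.
type syncState int

const (
	inSync        syncState = iota + 1 // ➊ same version in both
	versionDiffer                      // ➋ in both, different versions
	onlyLocal                          // ➌ in .sqlpkg only
	onlyLockfile                       // ➍ in the lockfile only
)

// classifyAll compares versions keyed by "owner/name" from both sources.
func classifyAll(local, locked map[string]string) map[string]syncState {
	states := make(map[string]syncState)
	for name, v := range local {
		lv, ok := locked[name]
		switch {
		case !ok:
			states[name] = onlyLocal
		case lv == v:
			states[name] = inSync
		default:
			states[name] = versionDiffer
		}
	}
	for name := range locked {
		if _, ok := local[name]; !ok {
			states[name] = onlyLockfile
		}
	}
	return states
}
```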
Instead of coming up with clever conflict resolution strategies, let's establish the following ground rule:
There is a single source of truth, and it's the contents of the .sqlpkg folder.
This immediately solves all kinds of lockfile-related problems. For the list command, for example, we now look only in .sqlpkg (as before) and then synchronize the lockfile with it, adding the missing packages if necessary:
local spec │ lockfile
│
> list │
↓ │
glob .sqlpkg/*/*/sqlpkg.json
┌─────────────┐ │ ┌─────────────┐
│ asg017/vss │ │ │ does not │
└─────────────┘ │ │ matter │
┌─────────────┐ │ └─────────────┘
│ sqlite/stmt │ │
└─────────────┘ │
↓ │
sync the lockfile
┌─────────────┐ │ ┌─────────────┐
│ asg017/vss │ → │ asg017/vss │
│ sqlite/stmt │ │ │ sqlite/stmt │
└─────────────┘ │ └─────────────┘
Phew.
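Under this rule, the sync step can be written so that .sqlpkg always wins. The sketch below makes the lockfile mirror the installed packages exactly. The article only says missing packages are added; overwriting stale versions and dropping entries with nothing installed are assumptions of this sketch, as are the type and function names:

```go
package main

// Package keeps just the fields the sync needs (hypothetical, trimmed).
type Package struct {
	Owner, Name, Version string
}

// syncLockfile updates locked (keyed by "owner/name") to match installed,
// the packages found in .sqlpkg, and reports whether anything changed.
func syncLockfile(installed []Package, locked map[string]Package) (changed bool) {
	seen := make(map[string]bool, len(installed))
	for _, pkg := range installed {
		key := pkg.Owner + "/" + pkg.Name
		seen[key] = true
		if cur, ok := locked[key]; !ok || cur != pkg {
			locked[key] = pkg // add missing or overwrite stale entry
			changed = true
		}
	}
	for key := range locked {
		if !seen[key] {
			delete(locked, key) // locked but not installed
			changed = true
		}
	}
	return changed
}
```

Returning changed lets the caller skip rewriting the lockfile when it is already in sync.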
Checksums
So far we've assumed that there will be no problems downloading remote package assets to the user's machine. And indeed, in most cases there will be none. But just to be sure that everything was downloaded correctly, we'd better check the asset's checksum.
Calculating the actual checksum of the downloaded asset is easy — we'll just use the SHA-256 algorithm. But we also need a value to compare it to — the expected asset checksum.
We can specify checksums right in the package spec file:
{
"owner": "asg017",
"name": "vss",
"version": "v0.1.1",
"repository": "https://github.com/asg017/sqlite-vss",
"assets": {
"path": "https://github.com/asg017/sqlite-vss/releases/download/v0.1.1",
"files": {
"darwin-amd64": "vss-macos-x86_64.tar.gz",
"darwin-arm64": "vss-macos-aarch64.tar.gz",
"linux-amd64": "vss-linux-x86_64.tar.gz"
},
"checksums": {
"vss-macos-x86_64.tar.gz": "sha256-a3694a...",
"vss-macos-aarch64.tar.gz": "sha256-04dc3c...",
"vss-linux-x86_64.tar.gz": "sha256-f9cc84..."
}
}
}
But this would require the package author to edit the spec after each release, since the checksums are not known in advance.
It's much better to provide the checksums in a separate file (e.g. checksums.txt) that is auto-generated with each new release. Such a file is hosted along with the other package assets:
https://github.com/asg017/sqlite-vss/releases/download/v0.1.1
├── checksums.txt
├── vss-macos-x86_64.tar.gz
├── vss-macos-aarch64.tar.gz
└── vss-linux-x86_64.tar.gz
When installing the package, the manager fetches checksums.txt, injects its contents into the local spec file, and validates the downloaded asset's checksum against the expected value:
local assets │ local spec │ remote assets
│ │
> install │ │ ┌──────────────────┐
└─ (empty) → (empty) → │ asg017/vss │
│ │ ├──────────────────┤
│ │ │ checksums.txt │
┌─ ← ┌─ ← │ macos-x86.tar.gz │
download asset │ save spec w/checksums │ └──────────────────┘
┌──────────────────┐ │ ┌──────────────────┐ │
│ macos-x86.tar.gz │ │ │ asg017/vss │ │
└──────────────────┘ │ ├──────────────────┤ │
↓ │ │ macos-x86.tar.gz │ │
calculate checksum │ │ sha256-a3694a... │ │
┌──────────────────┐ │ └──────────────────┘ │
│ sha256-a3694a... │ │ │
└──────────────────┘ │ │
↓ │ │
verify checksum │ │
↪ ✗ abort if failed │ asg017/vss │
┌──────────────────┐ │ ┌──────────────────┐ │
│ macos-x86.tar.gz │ │ │ macos-x86.tar.gz │ │
│ sha256-a3694a... │ = │ sha256-a3694a... │ │
└──────────────────┘ │ └──────────────────┘ │
↓ │ │
install asset │ │
┌──────────────────┐ │ │
│ vss0.dylib │ │ │
└──────────────────┘ │ │
✓ done!
If the remote package is missing checksums.txt, the manager can warn the user or even refuse to install such a package.
Package dependencies
Okay, it's time to talk about the elephant in the room — package dependencies.
A package dependency is when package A depends on package B:
┌─────┐ ┌─────┐
│ A │ ──> │ B │
└─────┘ └─────┘
A transitive dependency is when package A depends on B, and B depends on C, so A depends on C:
┌─────┐ ┌─────┐ ┌─────┐
│ A │ ──> │ B │ ──> │ C │
└─────┘ └─────┘ └─────┘
Dependencies, especially transitive ones, are a major headache (read Sam's article if you don't believe me). Fortunately, in the SQLite world, extensions are usually self-contained and don't depend on other extensions. So getting rid of the dependency feature is an obvious choice. It radically simplifies things.
┌─────┐ ┌─────┐ ┌─────┐
│ A │ │ B │ │ C │
└─────┘ └─────┘ └─────┘
Since all packages are independent, the manager can install and update them individually without worrying about version conflicts.
I understand that dropping dependencies altogether may not be something you are ready to accept. But all the other building blocks we've discussed are still relevant regardless of the dependency handling strategy, so let's leave it at that.
Install and update
Now that we've seen all the building blocks, let's look at two of the most complex commands: install and update.
Suppose I tell the manager to install the asg017/vss package:
local spec │ lockfile │ remote spec
│ │
> install asg017/vss │
↓ │ │
read remote spec │ │ ┌─────────────┐
└─ → → → │ asg017/vss │
│ │ │ latest │
│ │ └─────────────┘
│ │ ↓
│ │ resolve version
│ │ ┌─────────────┐
│ │ │ asg017/vss │
┌─ ← ← ← │ v0.1.0 │
download spec │ │ └─────────────┘
┌─────────────┐ │ │
│ asg017/vss │ │ │
│ v0.1.0 │ │ │
└─────────────┘ │ │
↓ │ │
download assets │ │
validate checksums │
↪ ✗ abort if failed │
↓ │ │
install assets │ │
┌─────────────┐ │ │
│ vss0.dylib │ │ │
└─────────────┘ │ │
└─ → add to lockfile │
│ ┌─────────────┐ │
│ │ asg017/vss │ │
│ │ v0.1.0 │ │
│ └─────────────┘ │
✓ done!
Now let's say I heard there was a new release, so I tell the manager to update the package:
local spec │ lockfile │ remote spec
│ │
> update asg017/vss │
↓ │ │
read local spec │ │
↪ abort if failed │
┌─────────────┐ │ ┌─────────────┐ │
│ asg017/vss │ │ │ does not │ │
│ v0.1.0 │ │ │ matter │ │
└─────────────┘ │ └─────────────┘ │
↓ │ │
read remote spec │ │
resolve version │ │ ┌─────────────┐
└─ → → → │ asg017/vss │
┌─ ← ← ← │ v0.1.1 │
has new version? │ │ └─────────────┘
↪ ✗ abort if not │
┌─────────────┐ │ │ ┌─────────────┐
│ v0.1.0 │ < is less than < │ v0.1.1 │
└─────────────┘ │ │ └─────────────┘
↓ │ │
download assets │ │
validate checksums │
↪ ✗ abort if failed │
↓ │ │
install assets │ │
add to lockfile │ │
┌─────────────┐ │ ┌─────────────┐ │
│ asg017/vss │ → │ asg017/vss │ │
│ v0.1.1 │ │ │ v0.1.1 │ │
└─────────────┘ │ └─────────────┘ │
┌─────────────┐ │ │
│ vss0.dylib │ │ │
└─────────────┘ │ │
✓ done!
Not so complicated after all, huh?
Implementation details
I've written the package manager in Go. I believe Go is a great choice: not only is it reasonably fast and compiles to native code, but it's also the simplest of the mainstream languages. So I think you'll be able to easily follow the code even if you don't know Go. Also, porting the code to another language should not be a problem.
Another benefit of using Go is its well-thought-out standard library. It allowed me to implement the whole project with zero dependencies, which is always nice.
spec • assets • checksums • lockfile • cmd • top-level
spec package
The spec package provides data structures and functions related to the spec file.
spec
┌─────────────────────────────────────┐
│ Package{} Read() Dir() │
│ Assets{} ReadLocal() Path() │
│ AssetPath{} ReadRemote() │
└─────────────────────────────────────┘
The spec file and its associated data structures are the heart of the system:
// A Package describes the package spec.
type Package struct {
Owner string
Name string
Version string
Homepage string
Repository string
Specfile string
Authors []string
License string
Description string
Keywords []string
Symbols []string
Assets Assets
}
// Assets are archives of package files, each for a specific platform.
type Assets struct {
Path *AssetPath
Pattern string
Files map[string]string
Checksums map[string]string
}
// An AssetPath describes a local file path or a remote URL.
type AssetPath struct {
Value string
IsRemote bool
}
We've already discussed the most important Package fields in the Design section. The rest (Homepage, Authors, License, and so on) provide additional package metadata.
The Package structure provides the basic spec management methods:
- ExpandVars substitutes variables in Assets with real values.
- ReplaceLatest forces a specific package version instead of the "latest" placeholder.
- AssetPath determines the asset URL for a specific platform (OS + architecture).
- Save writes the package spec file to the specified directory.
Assets.Pattern provides a way to selectively extract files from the archive. It accepts a glob pattern. For example, if the package asset contains many libraries and we only want to extract the text one, Assets.Pattern would be text.*.
The Read family of functions loads the package spec file from the specified path (either local or remote).
Finally, the Dir and Path functions return the directory and spec file path of the installed package.
assets package
The assets package provides functions for managing the actual package assets.
assets
┌──────────────────────┐
│ Asset{} Download() │
│ Copy() │
│ Unpack() │
└──────────────────────┘
An Asset is a binary file or an archive of package files for a particular platform:
type Asset struct {
Name string
Path string
Size int64
Checksum []byte
}
Asset provides a Validate method that checks the asset's checksum against the provided checksum string.
The Download, Copy and Unpack package-level functions perform the corresponding actions on the asset.
The assets and spec packages are independent, but both are used by the higher-level cmd package, which we'll discuss later.
checksums package
The checksums package has one job: it loads asset checksums from a file (checksums.txt) into a map, which can be assigned to the spec.Package.Assets.Checksums field.
checksums
┌──────────┐
│ Exists() │
│ Read() │
└──────────┘
Exists checks if a checksum file exists at the given path. Read loads checksums from a local or remote file into a map, where keys are filenames and values are checksums. Pretty simple stuff.
As with assets, the checksums and spec packages are independent, but both are used by the higher-level cmd package.
lockfile package
Just like the spec package works with the spec file, the lockfile package works with the lockfile.
lockfile
┌──────────────────────────┐
│ Lockfile{} ReadLocal() │
│ Path() │
└──────────────────────────┘
A Lockfile describes a collection of installed packages:
type Lockfile struct {
Packages map[string]*spec.Package
}
It has a bunch of package-related methods:
- Has checks if a package is in the lockfile.
- Add adds a package to the lockfile.
- Remove removes a package from the lockfile.
- Range iterates over packages from the lockfile.
- Save writes the lockfile to the specified directory.
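A sketch of these methods over the map. The Range callback signature (return false to stop), the sorted iteration order and the trimmed Package type are assumptions made for this illustration:

```go
package main

import "sort"

// Package is a trimmed, hypothetical stand-in for spec.Package.
type Package struct {
	Owner, Name, Version string
}

// FullName is the "owner/name" key used in the lockfile.
func (p *Package) FullName() string { return p.Owner + "/" + p.Name }

// Lockfile describes a collection of installed packages.
type Lockfile struct {
	Packages map[string]*Package
}

func NewLockfile() *Lockfile { return &Lockfile{Packages: make(map[string]*Package)} }

func (l *Lockfile) Has(fullName string) bool {
	_, ok := l.Packages[fullName]
	return ok
}

// Add inserts or replaces a package, so an update overwrites the old version.
func (l *Lockfile) Add(pkg *Package) { l.Packages[pkg.FullName()] = pkg }

func (l *Lockfile) Remove(pkg *Package) { delete(l.Packages, pkg.FullName()) }

// Range calls fn for each package in name order; returning false stops it.
func (l *Lockfile) Range(fn func(fullName string, pkg *Package) bool) {
	names := make([]string, 0, len(l.Packages))
	for name := range l.Packages {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if !fn(name, l.Packages[name]) {
			return
		}
	}
}
```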
Since the Lockfile is always local, there is only one read function, ReadLocal. The Path function returns the path to the lockfile.
The lockfile package depends on spec:
┌──────────┐ ┌──────────┐
│ lockfile │ → │ spec │
└──────────┘ └──────────┘
cmd package
The cmd package provides command steps: the basic building blocks for top-level commands like install or update.
cmd
┌─────────────────────────────────────────────────────────────────────────────┐
│ assets spec lockfile version │
├─────────────────────────────────────────────────────────────────────────────┤
│ BuildAssetPath ReadSpec ReadLockfile ResolveVersion │
│ DownloadAsset FindSpec AddToLockfile HasNewVersion │
│ ValidateAsset ReadInstalledSpec RemoveFromLockfile │
│ UnpackAsset ReadChecksums │
│ InstallFiles │
│ DequarantineFiles │
└─────────────────────────────────────────────────────────────────────────────┘
Each step falls into a specific domain category, such as "assets" or "spec".
Steps use the spec, assets and lockfile packages we've discussed earlier. Let's look at the DownloadAsset step, for example (error handling omitted for brevity):
// DownloadAsset downloads the package asset.
func DownloadAsset(pkg *spec.Package, assetPath *spec.AssetPath) *assets.Asset {
logx.Debug("downloading %s", assetPath)
dir := spec.Dir(os.TempDir(), pkg.Owner, pkg.Name)
fileio.CreateDir(dir)
var asset *assets.Asset
if assetPath.IsRemote {
asset = assets.Download(dir, assetPath.Value)
} else {
asset = assets.Copy(dir, assetPath.Value)
}
sizeKb := float64(asset.Size) / 1024
logx.Debug("downloaded %s (%.2f Kb)", asset.Name, sizeKb)
return asset
}
I think it's pretty obvious what's going on here: we create a temporary directory and then download (or copy) the asset file into it.
The logx and fileio packages provide helper functions for logging and working with the file system. There are also httpx for HTTP and github for GitHub API calls.
Let's look at another one, HasNewVersion:
// HasNewVersion checks if the remote package is newer than the installed one.
func HasNewVersion(remotePkg *spec.Package) bool {
installPath := spec.Path(WorkDir, remotePkg.Owner, remotePkg.Name)
if !fileio.Exists(installPath) {
return true
}
installedPkg := spec.ReadLocal(installPath)
logx.Debug("local package version = %s", installedPkg.Version)
if installedPkg.Version == "" {
// not explicitly versioned, always assume there is a later version
return true
}
if installedPkg.Version == remotePkg.Version {
return false
}
return semver.Compare(installedPkg.Version, remotePkg.Version) < 0
}
It's pretty simple, too: we load the locally installed spec file and compare its version with the version from the remote spec. If the package isn't installed yet or has no explicit version, the function assumes an update is available. The semver helper package does the actual comparison.
The cmd package depends on all the packages we've already discussed:
┌────────────────────────────────────────────────┐
│ cmd │
└────────────────────────────────────────────────┘
↓ ↓ ↓ ↓
┌──────────┐ ┌──────┐ ┌────────┐ ┌───────────┐
│ lockfile │ → │ spec │ │ assets │ │ checksums │
└──────────┘ └──────┘ └────────┘ └───────────┘
Command packages
There is a top-level package for each package manager command:
- cmd/install installs packages.
- cmd/update updates installed packages.
- cmd/uninstall removes installed packages.
- cmd/list shows installed packages.
- cmd/info shows package information.
- and so on.
Let's look at one of the most complex commands, update (error handling omitted for brevity):
func Update(args []string) {
fullName := args[0]
installedPkg := cmd.ReadLocal(fullName)
pkg := cmd.ReadSpec(installedPkg.Specfile)
cmd.ResolveVersion(pkg)
if !cmd.HasNewVersion(pkg) {
return
}
cmd.ReadChecksums(pkg)
assetUrl := cmd.BuildAssetPath(pkg)
asset := cmd.DownloadAsset(pkg, assetUrl)
cmd.ValidateAsset(pkg, asset)
cmd.UnpackAsset(pkg, asset)
cmd.InstallFiles(pkg, asset)
cmd.DequarantineFiles(pkg)
lck := cmd.ReadLockfile()
cmd.AddToLockfile(lck, pkg)
}
Thanks to the building blocks in the cmd package, the update logic is straightforward and self-explanatory: just a linear sequence of steps with a single "does it have a new version?" branch.
Here is a complete package diagram (some arrows omitted to make it less noisy):
┌─────────┐ ┌────────┐ ┌───────────┐ ┌──────┐
│ install │ │ update │ │ uninstall │ │ list │ ...
└─────────┘ └────────┘ └───────────┘ └──────┘
↓ ↓ ↓ ↓
┌─────────────────────────────────────────────────┐
│ cmd │
└─────────────────────────────────────────────────┘
↓ ↓ ↓ ↓
┌──────────┐ ┌──────┐ ┌────────┐ ┌───────────┐
│ lockfile │ → │ spec │ │ assets │ │ checksums │
└──────────┘ └──────┘ └────────┘ └───────────┘
┌────────┐ ┌────────┐ ┌───────┐ ┌──────┐ ┌────────┐
│ fileio │ │ github │ │ httpx │ │ logx │ │ semver │
└────────┘ └────────┘ └───────┘ └──────┘ └────────┘
And that's it!
Summary
We've explored design choices for a simple general-purpose package manager:
- A package spec file describing a package.
- A hierarchical owner/name folder structure for installed packages.
- Project and global scope for installed packages.
- Spec file locator with fallback to the package registry.
- Versioning and latest versions.
- The lockfile and single source of truth.
- Asset checksums.
- Package dependencies (or lack thereof).
We've also explored implementation details in Go:
- spec package with data structures and functions related to the spec file.
- assets package for managing the package assets.
- checksums package for loading asset checksums from a file.
- lockfile package for working with the lockfile.
- cmd package with basic building blocks for top-level commands.
- Top-level packages for individual commands.
Thanks for reading! I hope you'll find this article useful if you ever need to implement a package manager (or parts of it).