
A Guide to the Go Garbage Collector

Introduction

This guide is intended to aid advanced Go users in better understanding their application costs by providing insights into the Go garbage collector. It also provides guidance on how Go users may use these insights to improve their applications' resource utilization. It does not assume any knowledge of garbage collection, but does assume familiarity with the Go programming language.

The Go language takes responsibility for arranging the storage of Go values; in most cases, a Go developer need not care about where these values are stored, or why, if at all. In practice, however, these values often need to be stored in computer physical memory and physical memory is a finite resource. Because it is finite, memory must be managed carefully and recycled in order to avoid running out of it while executing a Go program. It's the job of a Go implementation to allocate and recycle memory as needed.

Another term for automatically recycling memory is garbage collection. At a high level, a garbage collector (or GC, for short) is a system that recycles memory on behalf of the application by identifying which parts of memory are no longer needed. The Go standard toolchain provides a runtime library that ships with every application, and this runtime library includes a garbage collector.

Note that the existence of a garbage collector as described by this guide is not guaranteed by the Go specification, only that the underlying storage for Go values is managed by the language itself. This omission is intentional and enables the use of radically different memory management techniques.

Therefore, this guide is about a specific implementation of the Go programming language and may not apply to other implementations. Specifically, the following guide applies to the standard toolchain (the gc Go compiler and tools). Gccgo and Gollvm both use a very similar GC implementation so many of the same concepts apply, but details may vary.

Furthermore, this is a living document and will change over time to best reflect the latest release of Go. This document currently describes the garbage collector as of Go 1.19.

Where Go Values Live

Before we dive into the GC, let's first discuss the memory that doesn't need to be managed by the GC.

For instance, non-pointer Go values stored in local variables will likely not be managed by the Go GC at all, and Go will instead arrange for memory to be allocated that's tied to the lexical scope in which it's created. In general, this is more efficient than relying on the GC, because the Go compiler is able to predetermine when that memory may be freed and emit machine instructions that clean up. Typically, we refer to allocating memory for Go values this way as "stack allocation," because the space is stored on the goroutine stack.

Go values whose memory cannot be allocated this way, because the Go compiler cannot determine their lifetimes, are said to escape to the heap. "The heap" can be thought of as a catch-all for memory allocation, for when Go values need to be placed somewhere. The act of allocating memory on the heap is typically referred to as "dynamic memory allocation" because both the compiler and the runtime can make very few assumptions as to how this memory is used and when it can be cleaned up. That's where a GC comes in: it's a system that specifically identifies and cleans up dynamic memory allocations.

There are many reasons why a Go value might need to escape to the heap. One reason could be that its size is dynamically determined. Consider for instance the backing array of a slice whose initial size is determined by a variable, rather than a constant. Note that escaping to the heap must also be transitive: if a reference to a Go value is written into another Go value that has already been determined to escape, that value must also escape.
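
The sketch below is a minimal illustration of the dynamically-sized case described above; the function names are made up, and the exact decisions depend on the compiler version (go build -gcflags=-m reports them for real code).

package escape

// fixed sizes its slice with a constant, so the backing array's lifetime is
// known at compile time and it can typically be allocated on the stack.
func fixed() int {
    buf := make([]int, 64)
    return buf[0]
}

// dynamic sizes its slice from a parameter, so the size of the backing array
// is unknown at compile time and it escapes to the heap.
func dynamic(n int) int {
    buf := make([]int, n)
    return buf[0]
}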

Whether a Go value escapes or not is a function of the context in which it is used and the Go compiler's escape analysis algorithm. It would be fragile and difficult to try to enumerate precisely when values escape: the algorithm itself is fairly sophisticated and changes between Go releases. For more details on how to identify which values escape and which do not, see the section on eliminating heap allocations.

Tracing Garbage Collection

Garbage collection may refer to many different methods of automatically recycling memory; for example, reference counting. In the context of this document, garbage collection refers to tracing garbage collection, which identifies in-use, so-called live, objects by following pointers transitively.

Let's define these terms more rigorously.

Together, objects and pointers to other objects form the object graph. To identify live memory, the GC walks the object graph starting at the program's roots, pointers that identify objects that are definitely in-use by the program. Two examples of roots are local variables and global variables. The process of walking the object graph is referred to as scanning.
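
As a small, hypothetical illustration (not from the original guide), both the package-level variable and the local variable below act as roots; everything reachable from them by following pointers is considered live.

package roots

type node struct {
    next *node
}

// globalList is a root: any object reachable from it stays live.
var globalList *node

func build() {
    // local is a root while build is running; the two nodes it references
    // remain live at least until nothing points to them anymore.
    local := &node{next: &node{}}
    globalList = local
}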

This basic algorithm is common to all tracing GCs. Where tracing GCs differ is what they do once they discover memory is live. Go's GC uses the mark-sweep technique, which means that in order to keep track of its progress, the GC also marks the values it encounters as live. Once tracing is complete, the GC then walks over all memory in the heap and makes all memory that is not marked available for allocation. This process is called sweeping.

One alternative technique you may be familiar with is to actually move the objects to a new part of memory and leave behind a forwarding pointer that is later used to update all the application's pointers. We call a GC that moves objects in this way a moving GC; Go has a non-moving GC.

The GC cycle

Because the Go GC is a mark-sweep GC, it broadly operates in two phases: the mark phase, and the sweep phase. While this statement might seem tautological, it contains an important insight: it's not possible to release memory back to be allocated until all memory has been traced, because there may still be an un-scanned pointer keeping an object alive. As a result, the act of sweeping must be entirely separated from the act of marking. Furthermore, the GC may also not be active at all, when there's no GC-related work to do. The GC continuously rotates through these three phases of sweeping, off, and marking in what's known as the GC cycle. For the purposes of this document, consider the GC cycle starting with sweeping, turning off, then marking.

The next few sections will focus on building intuition for the costs of the GC to aid users in tweaking GC parameters for their own benefit.

Understanding costs

The GC is inherently a complex piece of software built on even more complex systems. It's easy to become mired in detail when trying to understand the GC and tweak its behavior. This section is intended to provide a framework for reasoning about the cost of the Go GC and tuning parameters.

To begin with, consider this model of GC cost based on three simple axioms.

  1. The GC involves only two resources: CPU time, and physical memory.

  2. The GC's memory costs consist of live heap memory, new heap memory allocated before the mark phase, and space for metadata that, even if proportional to the previous costs, are small in comparison.

    Note: live heap memory is memory that was determined to be live by the previous GC cycle, while new heap memory is any memory allocated in the current cycle, which may or may not be live by the end.

  3. The GC's CPU costs are modeled as a fixed cost per cycle, and a marginal cost that scales proportionally with the size of the live heap.

    Note: Asymptotically speaking, sweeping scales worse than marking and scanning, as it must perform work proportional to the size of the whole heap, including memory that is determined to be not live (i.e. "dead"). However, in the current implementation sweeping is so much faster than marking and scanning that its associated costs can be ignored in this discussion.

This model is simple but effective: it accurately categorizes the dominant costs of the GC. However, this model says nothing about the magnitude of these costs, nor how they interact. To model that, consider the following situation, referred to from here on as the steady-state.

Note: the steady-state may seem contrived, but it's representative of the behavior of an application under some constant workload. Naturally, workloads can change even while an application is executing, but typically application behavior looks like a bunch of these steady-states strung together with some transient behavior in between.

Note: the steady-state makes no assumptions about the live heap. It may be growing with each subsequent GC cycle, it may shrink, or it may stay the same. However, trying to encompass all of these situations in the explanations to follow is tedious and not very illustrative, so the guide will focus on examples where the live heap remains constant. The GOGC section explores the non-constant live heap scenario in some more detail.

In the steady-state while the live heap size is constant, every GC cycle is going to look identical in the cost model as long as the GC executes after the same amount of time has passed. That's because in that fixed amount of time, with a fixed rate of allocation by the application, a fixed amount of new heap memory will be allocated. So with the live heap size constant, and that new heap memory constant, memory use is always going to be the same. And because the live heap is the same size, the marginal GC CPU costs will be the same, and the fixed costs will be incurred at some regular interval.

Now consider if the GC were to shift the point at which it runs later in time. Then, more memory would be allocated but each GC cycle would still incur the same CPU cost. However, over some other fixed window of time, fewer GC cycles would finish, resulting in a lower overall CPU cost. The opposite would be true if the GC decided to start earlier in time: less memory would be allocated and CPU costs would be incurred more often.

This situation represents the fundamental trade-off between CPU time and memory that a GC can make, controlled by how often the GC actually executes. In other words, the trade-off is entirely defined by GC frequency.

One more detail remains to be defined, and that's when the GC should decide to start. Note that this directly sets the GC frequency in any particular steady-state, defining the trade-off. In Go, deciding when the GC should start is the main parameter which the user has control over.

GOGC

At a high level, GOGC determines the trade-off between GC CPU and memory.

It works by determining the target heap size after each GC cycle, a target value for the total heap size in the next cycle. The GC's goal is to finish a collection cycle before the total heap size exceeds the target heap size. Total heap size is defined as the live heap size at the end of the previous cycle, plus any new heap memory allocated by the application since the previous cycle. Meanwhile, target heap memory is defined as:

Target heap memory = Live heap + (Live heap + GC roots) * GOGC / 100

As an example, consider a Go program with a live heap size of 8 MiB, 1 MiB of goroutine stacks, and 1 MiB of pointers in global variables. Then, with a GOGC value of 100, the amount of new memory that will be allocated before the next GC runs will be 10 MiB, or 100% of the 10 MiB of work, for a total heap footprint of 18 MiB. With a GOGC value of 50, then it'll be 50%, or 5 MiB. With a GOGC value of 200, it'll be 200%, or 20 MiB.
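
Plugging the example's numbers into the formula above gives the same figures (a worked check, with 2 MiB of GC roots from the goroutine stacks and global variables):

Target heap memory = 8 MiB + (8 MiB + 2 MiB) * 100 / 100 = 18 MiB (GOGC=100, 10 MiB of new heap memory)
Target heap memory = 8 MiB + (8 MiB + 2 MiB) * 50 / 100 = 13 MiB (GOGC=50, 5 MiB of new heap memory)
Target heap memory = 8 MiB + (8 MiB + 2 MiB) * 200 / 100 = 28 MiB (GOGC=200, 20 MiB of new heap memory)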

Note: GOGC includes the root set only as of Go 1.18. Previously, it would only count the live heap. Often, the amount of memory in goroutine stacks is quite small and the live heap size dominates all other sources of GC work, but in cases where programs had hundreds of thousands of goroutines, the GC was making poor judgements.

The heap target controls GC frequency: the bigger the target, the longer the GC can wait to start another mark phase and vice versa. While the precise formula is useful for making estimates, it's best to think of GOGC in terms of its fundamental purpose: a parameter that picks a point in the GC CPU and memory trade-off. The key takeaway is that doubling GOGC will double heap memory overheads and roughly halve GC CPU cost, and vice versa. (To see a full explanation as to why, see the appendix.)

Note: the target heap size is just a target, and there are several reasons why the GC cycle might not finish right at that target. For one, a large enough heap allocation can simply exceed the target. However, other reasons appear in GC implementations that go beyond the GC model this guide has been using thus far. For some more detail, see the latency section, but the complete details may be found in the additional resources.

GOGC may be configured through either the GOGC environment variable (which all Go programs recognize), or through the SetGCPercent API in the runtime/debug package.

Note that GOGC may also be used to turn off the GC entirely (provided the memory limit does not apply) by setting GOGC=off or calling SetGCPercent(-1). Conceptually, this setting is equivalent to setting GOGC to a value of infinity, as the amount of new memory before a GC is triggered is unbounded.
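
As a minimal sketch of the programmatic route (the value 50 is chosen purely for illustration), GOGC can be adjusted or disabled at run time through runtime/debug:

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Trade more GC CPU time for a smaller heap; returns the previous setting.
    old := debug.SetGCPercent(50)
    fmt.Println("previous GOGC:", old)

    // debug.SetGCPercent(-1) would disable the GC entirely, just like
    // GOGC=off, provided no memory limit is set.
}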

To better understand everything we've discussed so far, try out the interactive visualization below that is built on the GC cost model discussed earlier. This visualization depicts the execution of some program whose non-GC work takes 10 seconds of CPU time to complete. In the first second it performs some initialization step (growing its live heap) before settling into a steady-state. The application allocates 200 MiB in total, with 20 MiB live at a time. It assumes that the only relevant GC work to complete comes from the live heap, and that (unrealistically) the application uses no additional memory.

Use the slider to adjust the value of GOGC to see how the application responds in terms of total duration and GC overhead. Each GC cycle ends while the new heap drops to zero. The time taken while the new heap drops to zero is the combined time for the mark phase for cycle N, and the sweep phase for the cycle N+1. Note that this visualization (and all the visualizations in this guide) assume the application is paused while the GC executes, so GC CPU costs are fully represented by the time it takes for new heap memory to drop to zero. This is only to make visualization simpler; the same intuition still applies. The X axis shifts to always show the full CPU-time duration of the program. Notice that additional CPU time used by the GC increases the overall duration.

[Interactive visualization: Live Heap and New Heap (MiB) over CPU time (s), with a GOGC slider. At GOGC=178: Total = 10.39 s, GC CPU = 3.8%, Peak Mem = 55.6 MiB (Peak Live Mem = 20.0 MiB).]

Notice that the GC always incurs some CPU and peak memory overhead. As GOGC increases, CPU overhead decreases, but peak memory increases proportionally to the live heap size. As GOGC decreases, the peak memory requirement decreases at the expense of additional CPU overhead.

Note: the graph displays CPU time, not wall-clock time to complete the program. If the program runs on 1 CPU and fully utilizes its resources, then these are equivalent. A real-world program likely runs on a multi-core system and does not 100% utilize the CPUs at all times. In these cases the wall-time impact of the GC will be lower.

Note: the Go GC has a minimum total heap size of 4 MiB, so if the GOGC-set target is ever below that, it gets rounded up. The visualization reflects this detail.

Here's another example that's a little bit more dynamic and realistic. Once again, the application takes 10 CPU-seconds to complete without the GC, but the steady-state allocation rate increases dramatically half-way through, and the live heap size shifts around a bit in the first phase. This example demonstrates how the steady-state might look when the live heap size is actually changing, and how a higher allocation rate leads to more frequent GC cycles.

[Interactive visualization: Live Heap and New Heap (MiB) over CPU time (s), with a GOGC slider. At GOGC=100: Total = 13.39 s, GC CPU = 25.3%, Peak Mem = 40.0 MiB (Peak Live Mem = 20.0 MiB).]

Memory limit

Until Go 1.19, GOGC was the sole parameter that could be used to modify the GC's behavior. While it works great as a way to set a trade-off, it doesn't take into account that available memory is finite. Consider what happens when there's a transient spike in the live heap size: because the GC will pick a total heap size proportional to that live heap size, GOGC must be configured for the peak live heap size, even if in the usual case a higher GOGC value provides a better trade-off.

The visualization below demonstrates this transient heap spike situation.

[Interactive visualization: Live Heap and New Heap (MiB) over CPU time (s), with a GOGC slider. At GOGC=100: Total = 10.67 s, GC CPU = 6.3%, Peak Mem = 60.0 MiB (Peak Live Mem = 30.0 MiB).]

If the example workload is running in a container with a bit over 60 MiB of memory available, then GOGC can't be increased beyond 100, even though the rest of the GC cycles have the available memory to make use of that extra memory. Furthermore, in some applications, these transient peaks can be rare and hard to predict, leading to occasional, unavoidable, and potentially costly out-of-memory conditions.

That's why in the 1.19 release, Go added support for setting a runtime memory limit. The memory limit may be configured either via the GOMEMLIMIT environment variable which all Go programs recognize, or through the SetMemoryLimit function available in the runtime/debug package.
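
For example, here is a minimal sketch of setting the limit programmatically; the 512 MiB container size and the headroom chosen are assumptions for illustration only:

package main

import "runtime/debug"

func main() {
    // Equivalent to GOMEMLIMIT=450MiB: leave roughly 60 MiB of headroom in a
    // container with 512 MiB available. Returns the previous limit.
    debug.SetMemoryLimit(450 << 20)
}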

This memory limit sets a maximum on the total amount of memory that the Go runtime can use. The specific set of memory included is defined in terms of runtime.MemStats as the expression

Sys - HeapReleased

or equivalently in terms of the runtime/metrics package,

/memory/classes/total:bytes - /memory/classes/heap/released:bytes
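
The same quantity can be observed from inside a running program. The sketch below reads the two metrics named above via the runtime/metrics package; the print format is arbitrary.

package main

import (
    "fmt"
    "runtime/metrics"
)

func main() {
    samples := []metrics.Sample{
        {Name: "/memory/classes/total:bytes"},
        {Name: "/memory/classes/heap/released:bytes"},
    }
    metrics.Read(samples)

    total := samples[0].Value.Uint64()
    released := samples[1].Value.Uint64()
    fmt.Printf("memory counted against the limit: %d bytes\n", total-released)
}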

Because the Go GC has explicit control over how much heap memory it uses, it sets the total heap size based on this memory limit and how much other memory the Go runtime uses.

The visualization below depicts the same single-phase steady-state workload from the GOGC section, but this time with an extra 10 MiB of overhead from the Go runtime and with an adjustable memory limit. Try shifting around both GOGC and the memory limit and see what happens.

[Interactive visualization: Other Mem., Live Heap, and New Heap (MiB) over CPU time (s), with GOGC and Memory Limit sliders. At GOGC=100 and a 100.0 MiB memory limit: Total = 10.68 s, GC CPU = 6.4%, Peak Mem = 50.0 MiB (Peak Live Mem = 20.0 MiB, Other Mem = 10.0 MiB).]

Notice that when the memory limit is lowered below the peak memory that's determined by GOGC (42 MiB for a GOGC of 100), the GC runs more frequently to keep the peak memory within the limit.

Returning to our previous example of the transient heap spike, by setting a memory limit and turning up GOGC, we can get the best of both worlds: no memory limit breach, and better resource economy. Try out the interactive visualization below.

[Interactive visualization: Live Heap and New Heap (MiB) over CPU time (s), with GOGC and Memory Limit sliders. At GOGC=100 and a 100.0 MiB memory limit: Total = 10.67 s, GC CPU = 6.3%, Peak Mem = 60.0 MiB (Peak Live Mem = 30.0 MiB).]

Notice that with some values of GOGC and the memory limit, peak memory use stops at whatever the memory limit is, but that the rest of the program's execution still obeys the total heap size rule set by GOGC.

This observation leads to another interesting detail: even when GOGC is set to off, the memory limit is still respected! In fact, this particular configuration represents a maximization of resource economy because it sets the minimum GC frequency required to maintain some memory limit. In this case, all of the program's execution has the heap size rise to meet the memory limit.

Now, while the memory limit is clearly a powerful tool, the use of a memory limit does not come without a cost, and certainly doesn't invalidate the utility of GOGC.

Consider what happens when the live heap grows large enough to bring total memory use close to the memory limit. In the steady-state visualization above, try turning GOGC off and then slowly lowering the memory limit further and further to see what happens. Notice that the total time the application takes will start to grow in an unbounded manner as the GC is constantly executing to maintain an impossible memory limit.

This situation, where the program fails to make reasonable progress due to constant GC cycles, is called thrashing. It's particularly dangerous because it effectively stalls the program. Even worse, it can happen for exactly the same situation we were trying to avoid with GOGC: a large enough transient heap spike can cause a program to stall indefinitely! Try reducing the memory limit (around 30 MiB or lower) in the transient heap spike visualization and notice how the worst behavior specifically starts with the heap spike.

In many cases, an indefinite stall is worse than an out-of-memory condition, which tends to result in a much faster failure.

For this reason, the memory limit is defined to be soft. The Go runtime makes no guarantees that it will maintain this memory limit under all circumstances; it only promises some reasonable amount of effort. This relaxation of the memory limit is critical to avoiding thrashing behavior, because it gives the GC a way out: let memory use surpass the limit to avoid spending too much time in the GC.

How this works internally is the GC sets an upper limit on the amount of CPU time it can use over some time window (with some hysteresis for very short transient spikes in CPU use). This limit is currently set at roughly 50%, with a 2 * GOMAXPROCS CPU-second window. The consequence of limiting GC CPU time is that the GC's work is delayed, meanwhile the Go program may continue allocating new heap memory, even beyond the memory limit.
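
As a rough worked example of that limit (the GOMAXPROCS value is illustrative only): with GOMAXPROCS=8, the window is 2 * 8 = 16 CPU-seconds, so the GC may consume at most about 8 CPU-seconds of it; any further GC work is deferred, and in the meantime the program may allocate past the memory limit.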

The intuition behind the 50% GC CPU limit is based on the worst-case impact on a program with ample available memory. In the case of a misconfiguration of the memory limit, where it is set too low mistakenly, the program will slow down at most by 2x, because the GC can't take more than 50% of its CPU time away.

Note: the visualizations on this page do not simulate the GC CPU limit.

Suggested uses

While the memory limit is a powerful tool, and the Go runtime takes steps to mitigate the worst behaviors from misuse, it's still important to use it thoughtfully. Below is a collection of tidbits of advice about where the memory limit is most useful and applicable, and where it might cause more harm than good.

Latency

The visualizations in this document have modeled the application as paused while the GC is executing. GC implementations do exist that behave this way, and they're referred to as "stop-the-world" GCs.

The Go GC, however, is not fully stop-the-world and does most of its work concurrently with the application. This is primarily to reduce application latencies, specifically the end-to-end duration of a single unit of computation (e.g. a web request). Thus far, this document has mainly considered application throughput (e.g. web requests handled per second). Note that each example in the GC cycle section focused on the total CPU duration of an executing program. However, such a duration is far less meaningful for, say, a web service. While throughput is still important for a web service (i.e. queries per second), often the latency of each individual request matters even more.

In terms of latency, a stop-the-world GC may require a considerable length of time to execute both its mark and sweep phases, during which the application, and in the context of a web service, any in-flight request, is unable to make further progress. Instead, the Go GC avoids making the length of any global application pauses proportional to the size of the heap, and the core tracing algorithm is performed while the application is actively executing. (The pauses are more strongly proportional to GOMAXPROCS algorithmically, but most commonly are dominated by the time it takes to stop running goroutines.) Collecting concurrently is not without cost: in practice it often leads to a design with lower throughput than an equivalent stop-the-world garbage collector. However, it's important to note that lower latency does not inherently mean lower throughput, and the performance of the Go garbage collector has steadily improved over time, in both latency and throughput.

The concurrent nature of Go's current GC does not invalidate anything discussed in this document so far: none of the statements relied on this design choice. GC frequency is still the primary way the GC trades off between CPU time and memory for throughput, and in fact, it also takes on this role for latency. This is because most of the costs for the GC are incurred while the mark phase is active.

The key takeaway then, is that reducing GC frequency may also lead to latency improvements. This applies not only to reductions in GC frequency from modifying tuning parameters, like increasing GOGC and/or the memory limit, but also applies to the optimizations described in the optimization guide.

However, latency is often more complex to understand than throughput, because it is a product of the moment-to-moment execution of the program and not just an aggregation of costs. As a result, the connection between latency and GC frequency is less direct. Below is a list of possible sources of latency for those inclined to dig deeper.

  1. Brief stop-the-world pauses when the GC transitions between the mark and sweep phases,
  2. Scheduling delays because the GC takes 25% of CPU resources when in the mark phase,
  3. User goroutines assisting the GC in response to a high allocation rate,
  4. Pointer writes requiring additional work while the GC is in the mark phase, and
  5. Running goroutines must be suspended for their roots to be scanned.

These latency sources are visible in execution traces, except for pointer writes requiring additional work.

Additional resources

While the information presented above is accurate, it lacks the detail to fully understand costs and trade-offs in the Go GC's design. For more information, see the following additional resources.

A note about virtual memory

This guide has largely focused on the physical memory use of the GC, but a question that comes up regularly is what exactly that means and how it compares to virtual memory (typically presented in programs like top as "VSS").

Physical memory is memory housed in the actual physical RAM chip in most computers. Virtual memory is an abstraction over physical memory provided by the operating system to isolate programs from one another. It's also typically acceptable for programs to reserve virtual address space that doesn't map to any physical addresses at all.

Because virtual memory is just a mapping maintained by the operating system, it is typically very cheap to make large virtual memory reservations that don't map to physical memory.

The Go runtime generally relies upon this view of the cost of virtual memory in a few ways:

As a result, virtual memory metrics such as "VSS" in top are typically not very useful in understanding a Go program's memory footprint. Instead, focus on "RSS" and similar measurements, which more directly reflect physical memory usage.

Optimization guide

Identifying costs

Before trying to optimize how your Go application interacts with the GC, it's important to first identify that the GC is a major cost in the first place.

The Go ecosystem provides a number of tools for identifying costs and optimizing Go applications. For a brief overview of these tools, see the guide on diagnostics. Here, we'll focus on a subset of these tools and a reasonable order in which to apply them to understand GC impact and behavior.

  1. CPU profiles

    A good place to start is with CPU profiling. CPU profiling provides an overview of where CPU time is spent, though to the untrained eye it may be difficult to identify the magnitude of the role the GC plays in a particular application. Luckily, understanding how the GC fits in mostly boils down to knowing what different functions in the `runtime` package mean. Below is a useful subset of these functions for interpreting CPU profiles.

    Note: the functions listed below are not leaf functions, so they may not show up in the default output the pprof tool provides with the top command. Instead, use the top -cum command or use the list command on these functions directly and focus on the cumulative percent column (see the example after this list).

    • runtime.gcBgMarkWorker: Entrypoint to the background mark worker goroutines. Time spent here scales with GC frequency and the complexity and size of the object graph. It represents a baseline for how much time the application spends marking and scanning.

      Note: Within these goroutines, you will find calls to runtime.gcDrainMarkWorkerDedicated, runtime.gcDrainMarkWorkerFractional, and runtime.gcDrainMarkWorkerIdle, which indicate worker type. In a largely idle Go application, the Go GC is going to use up additional (idle) CPU resources to get its job done faster, which is indicated with the runtime.gcDrainMarkWorkerIdle symbol. As a result, time here may represent a large fraction of CPU samples, which the Go GC believes are free. If the application becomes more active, CPU time in idle workers will drop. One common reason this can happen is if an application runs entirely in one goroutine but GOMAXPROCS is >1.

    • runtime.mallocgc: Entrypoint to the memory allocator for heap memory. A large amount of cumulative time spent here (>15%) typically indicates a lot of memory being allocated.

    • runtime.gcAssistAlloc: Function goroutines enter to yield some of their time to assist the GC with scanning and marking. A large amount of cumulative time spent here (>5%) indicates that the application is likely out-pacing the GC with respect to how fast it's allocating. It indicates a particularly high degree of impact from the GC, and also represents time the application spends marking and scanning. Note that this is included in the runtime.mallocgc call tree, so it will inflate that as well.

  2. Execution traces

    While CPU profiles are great for identifying where time is spent in aggregate, they're less useful for indicating performance costs that are more subtle, rare, or related to latency specifically. Execution traces on the other hand provide a rich and deep view into a short window of a Go program's execution. They contain a variety of events related to the Go GC and specific execution paths can be directly observed, along with how the application might interact with the Go GC. All the GC events tracked are conveniently labeled as such in the trace viewer.

    See the documentation for the runtime/trace package for how to get started with execution traces.

  3. GC traces

    When all else fails, the Go GC provides a few different specific traces that provide much deeper insights into GC behavior. These traces are always printed directly to STDERR, one line per GC cycle, and are configured through the GODEBUG environment variable that all Go programs recognize. They're mostly useful for debugging the Go GC itself since they require some familiarity with the specifics of the GC's implementation, but nonetheless can occasionally be useful to gain a better understanding of GC behavior.

    The core GC trace is enabled by setting GODEBUG=gctrace=1. The output produced by this trace is documented in the environment variables section in the documentation for the runtime package.

    A supplementary GC trace called the "pacer trace" provides even deeper insights and is enabled by setting GODEBUG=gcpacertrace=1. Interpreting this output requires an understanding of the GC's "pacer" (see additional resources), which is outside the scope of this guide.
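
Tying the CPU profiling advice above back to concrete commands (the file name cpu.pprof is hypothetical), the cumulative view and a per-function listing mentioned in the CPU profiles note can be obtained like this:

$ go tool pprof cpu.pprof
(pprof) top -cum
(pprof) list runtime.gcBgMarkWorker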

Eliminating heap allocations

One way to reduce costs from the GC is to have the GC manage fewer values to begin with. The techniques described below can produce some of the largest improvements in performance, because as the GOGC section demonstrated, the allocation rate of a Go program is a major factor in GC frequency, the key cost metric used by this guide.

Heap profiling

After identifying that the GC is a source of significant costs, the next step in eliminating heap allocations is to find out where most of them are coming from. For this purpose, memory profiles (really, heap memory profiles) are very useful. Check out the documentation for how to get started with them.

Memory profiles describe where in the program heap allocations come from, identifying them by the stack trace at the point they were allocated. Each memory profile can break down memory in four ways.

  • inuse_space: bytes of heap memory allocated and not yet freed.
  • inuse_objects: number of heap objects allocated and not yet freed.
  • alloc_space: total bytes of heap memory allocated since the program started, including memory that has since been freed.
  • alloc_objects: total number of heap objects allocated since the program started, including objects that have since been freed.

Switching between these different views of heap memory may be done with either the -sample_index flag to the pprof tool, or via the sample_index option when the tool is used interactively.

Note: memory profiles by default only sample a subset of heap objects so they will not contain information about every single heap allocation. However, this is sufficient to find hot-spots. To change the sampling rate, see runtime.MemProfileRate.

For the purposes of reducing GC costs, alloc_space is typically the most useful view as it directly corresponds to the allocation rate. This view will indicate allocation hot spots that would provide the most benefit.
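
For example, assuming a heap profile saved to a hypothetical file mem.pprof, the allocation view can be selected directly from the command line:

$ go tool pprof -sample_index=alloc_space mem.pprof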

Escape analysis

Once candidate heap allocation sites have been identified with the help of heap profiles, how can they be eliminated? The key is to leverage the Go compiler's escape analysis to have the Go compiler find alternative, and more efficient storage for this memory, for example in the goroutine stack. Luckily, the Go compiler has the ability to describe why it decides to escape a Go value to the heap. With that knowledge, it becomes a matter of reorganizing your source code to change the outcome of the analysis (which is often the hardest part, but outside the scope of this guide).

As for how to access the information from the Go compiler's escape analysis, the simplest way is through a debug flag supported by the Go compiler that describes all optimizations it applied or did not apply to some package in a text format. This includes whether or not values escape. Try the following command, where [package] is some Go package path.

$ go build -gcflags=-m=3 [package]

This information can also be visualized as an overlay in VS Code. This overlay is configured and enabled in the VS Code Go plugin settings.

  1. Set the ui.codelenses setting to include gc_details.
  2. Enable the overlay for escape analysis by setting ui.diagnostic.annotations to include escape.

Finally, the Go compiler provides this information in a machine-readable (JSON) format that may be used to build additional custom tooling. For more information on that, see the documentation in the source Go code.

Implementation-specific optimizations

The Go GC is sensitive to the demographics of live memory, because a complex graph of objects and pointers both limits parallelism and generates more work for the GC. As a result, the GC contains a few optimizations for specific common structures. The most directly useful ones for performance optimization are listed below.

Note: Applying the optimizations below may reduce the readability of your code by obscuring intent, and may fail to hold up across Go releases. Prefer to apply these optimizations only in the places they matter most. Such places may be identified by using the tools listed in the section on identifying costs.

Furthermore, the GC must interact with nearly every pointer it sees, so using indices into a slice, for example, instead of pointers, can aid in reducing GC costs.
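
Below is a hedged sketch of that index-based alternative (the type and field names are made up for illustration): the pointer-free form gives the GC no pointers to follow inside the slice's elements.

package intrusive

// Pointer-based: the GC must trace every next pointer during marking.
type nodePtr struct {
    value int
    next  *nodePtr
}

// Index-based: nodes live in a single slice and refer to one another by
// index, so the elements contain no pointers for the GC to scan.
type nodeIdx struct {
    value int
    next  int // index into list.nodes; -1 means "no next node"
}

type list struct {
    nodes []nodeIdx
}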

Linux transparent huge pages (THP)

When a program accesses memory, the CPU needs to translate the virtual memory addresses it uses into physical memory addresses that refer to the data it was trying to access. To do this, the CPU consults the "page table," a data structure that represents the mapping from virtual to physical memory, managed by the operating system. Each entry in the page table represents an indivisible block of physical memory called a page, hence the name.

Transparent huge pages (THP) is a Linux feature that transparently replaces pages of physical memory backing contiguous virtual memory regions with bigger blocks of memory called huge pages. By using bigger blocks, fewer page table entries are needed to represent the same memory region, improving page table lookup times. However, bigger blocks mean more waste if only a small part of the huge page is used by the system.

When running Go programs in production, enabling transparent huge pages on Linux can improve throughput and latency at the cost of additional memory use. Applications with small heaps tend not to benefit from THP and may end up using a substantial amount of additional memory (as high as 50%). However, applications with big heaps (1 GiB or more) tend to benefit quite a bit (up to 10% throughput) without very much additional memory overhead (1-2% or less). Being aware of your THP settings in either case can be helpful, and experimentation is always recommended.

One can enable or disable transparent huge pages in a Linux environment by modifying /sys/kernel/mm/transparent_hugepage/enabled. See the official Linux admin guide for more details. If you choose to have your Linux production environment enable transparent huge pages, we recommend the following additional settings for Go programs.

Appendix

Additional notes on GOGC

The GOGC section claimed that doubling GOGC doubles heap memory overheads and halves GC CPU costs. To see why, let's break it down mathematically.

Firstly, the heap target sets a target for the total heap size. This target, however, mainly influences the new heap memory, because the live heap is fundamental to the application.

Target heap memory = Live heap + (Live heap + GC roots) * GOGC / 100

Total heap memory = Live heap + New heap memory

New heap memory = (Live heap + GC roots) * GOGC / 100

From this we can see that doubling GOGC would also double the amount of new heap memory that application will allocate each cycle, which captures heap memory overheads. Note that Live heap + GC roots is an approximation of the amount of memory the GC needs to scan.

Next, let's look at GC CPU cost. Total cost can be broken down as the cost per cycle, times GC frequency over some time period T.

Total GC CPU cost = (GC CPU cost per cycle) * (GC frequency) * T

GC CPU cost per cycle can be derived from the GC model:

GC CPU cost per cycle = (Live heap + GC roots) * (Cost per byte) + Fixed cost

Note that sweep phase costs are ignored here as mark and scan costs dominate.

The steady-state is defined by a constant allocation rate and a constant cost per byte, so in the steady-state we can derive a GC frequency from this new heap memory:

GC frequency = (Allocation rate) / (New heap memory) = (Allocation rate) / ((Live heap + GC roots) * GOGC / 100)

Putting this together, we get the full equation for the total cost:

Total GC CPU cost = (Allocation rate) / ((Live heap + GC roots) * GOGC / 100) * ((Live heap + GC roots) * (Cost per byte) + Fixed cost) * T

For a sufficiently large heap (which represents most cases), the marginal costs of a GC cycle dominate the fixed costs. This allows for a significant simplification of the total GC CPU cost formula.

Total GC CPU cost = (Allocation rate) / (GOGC / 100) * (Cost per byte) * T

From this simplified formula, we can see that if we double GOGC, we halve total GC CPU cost. (Note that the visualizations in this guide do simulate fixed costs, so the GC CPU overheads reported by them will not exactly halve when GOGC doubles.) Furthermore, GC CPU costs are largely determined by allocation rate and the cost per byte to scan memory. For more information on how to reduce these costs specifically, see the optimization guide.

Note: there exists a discrepancy between the size of the live heap, and the amount of that memory the GC actually needs to scan: the same size live heap but with a different structure will result in a different CPU cost, but the same memory cost, resulting in a different trade-off. This is why the structure of the heap is part of the definition of the steady-state. The heap target should arguably only include the scannable live heap as a closer approximation of memory the GC needs to scan, but this leads to degenerate behavior when there's a very small amount of scannable live heap but the live heap is otherwise large.