hoakley April 25, 2022 Macs, Technology

How macOS manages M1 CPU cores

In conventional multi-core processors, like the Intel CPUs used in previous Mac models, all cores are the same. Allocating threads to cores is therefore a matter of balancing their load, in what’s termed symmetric multiprocessing (SMP).

In Activity Monitor’s CPU History window, core load (as CPU %) is shown against time, with the oldest values at the left. Odd-numbered cores in the left half are real, and show the eight cores in the 8-Core Intel Xeon W under heavy load. Even-numbered cores in the right half are the virtual cores of Hyper-threading, engaged to cope with the heaviest load.

CPUs in Apple Silicon chips are different, as they contain two different core types, one designed for high performance (Performance, P or Firestorm cores), the other for energy efficiency (Efficiency, E or Icestorm cores). For these to work well, threads need to be allocated by core type, a task which can be left to apps and processes, as it is in Asahi Linux, or managed by the operating system, as it is in macOS. This article explains how macOS manages core allocation in all Apple’s M1 series chips, in what it terms asymmetric multiprocessing (AMP, although others prefer to call this heterogeneous computing).

Architecture

There are two types of CPU core in M1 series chips:

E cores contain roughly half the internal processing units of P cores, and have a maximum frequency of 2064 MHz.
P cores have a higher maximum frequency, of either 3204 MHz in the original M1, or 3228 MHz in M1 Pro/Max/Ultra.

There are three configurations of CPU cores available in M1 series chips:

the original M1, with 4 E and 4 P cores, in the MacBook Air, MacBook Pro 13-inch, iMac and Mac mini;
M1 Pro and Max, with 2 E and 8 P cores, in the MacBook Pro 14- and 16-inch, and Mac Studio Max;
M1 Ultra, with 4 E and 16 P cores, in the Mac Studio Ultra.

Some MacBook Pro 14-inch notebooks have a reduced M1 Pro chip with only 6 P cores instead of 8.

To simplify the management of cores, macOS divides them functionally into clusters of 2-4 cores of the same type. Unfortunately, cores are numbered differently at system level, as shown by tools such as powermetrics, from how they are numbered in Activity Monitor. For consistency with the latter, I here follow its core numbering, but number clusters in accordance with the system. The three chips have the following functional clusters as of macOS Monterey 12.3.1:

the original M1 has one cluster of each type of core, E0 and P0, each containing 4 cores of the same type;
M1 Pro and Max have one cluster of 2 E cores (E0), and two clusters each containing 4 P cores (P0, P1);
M1 Ultra has one cluster of 4 E cores (E0), and four clusters each containing 4 P cores (P0, P1, P2, P3).
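
The cluster layout listed above can be written down as a small table, which makes it easy to check against the core counts given earlier. This is only a sketch in Python: the names `CLUSTERS` and `core_counts` are my own for illustration, not anything provided by macOS, and the reduced M1 Pro with 6 P cores is left out because its cluster arrangement isn't described here.

```python
# Functional clusters per chip, as listed above for macOS Monterey 12.3.1.
# Each entry is (cluster name, core type, number of cores).
# CLUSTERS and core_counts are illustrative names, not a macOS API.
CLUSTERS = {
    "M1": [("E0", "E", 4), ("P0", "P", 4)],
    "M1 Pro/Max": [("E0", "E", 2), ("P0", "P", 4), ("P1", "P", 4)],
    "M1 Ultra": [("E0", "E", 4), ("P0", "P", 4), ("P1", "P", 4),
                 ("P2", "P", 4), ("P3", "P", 4)],
}


def core_counts(chip):
    """Return (E cores, P cores) for a chip by summing over its clusters."""
    e_cores = sum(n for _, kind, n in CLUSTERS[chip] if kind == "E")
    p_cores = sum(n for _, kind, n in CLUSTERS[chip] if kind == "P")
    return e_cores, p_cores


if __name__ == "__main__":
    for chip in CLUSTERS:
        print(chip, core_counts(chip))
```

Summing the clusters gives 4 E and 4 P cores for the original M1, 2 E and 8 P for the M1 Pro/Max, and 4 E and 16 P for the M1 Ultra, matching the configurations above.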

All cores within any given cluster are run at the same frequency, and generally (but not always) have their load balanced within the cluster. There are occasions when load is distributed more unevenly, and in exceptional cases, certain threads may be allocated to only one core within a cluster.

Thread control

Unlike Asahi Linux, macOS doesn’t provide direct access to cores, core types, or clusters, at least not in public APIs. Instead, these are normally managed through Grand Central Dispatch using Quality of Service (QoS) settings, which macOS then uses to determine thread management policies.

Threads with the lowest QoS will only be run on the E cluster, while those with higher QoS can be assigned to either E or P clusters. The latter behaviour can be modified dynamically by the taskpolicy command tool, or by the setpriority() function in code. Those can constrain higher QoS threads to execution only on E cores, or on either E or P cores. However, they cannot alter the rule that lowest QoS threads are only executed on the E cluster.
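
Those rules are simple enough to express as a short function. The following Python is a model of the policy as described in this article, not of any macOS API: `eligible_core_types` and its `constrained_to_e` flag are hypothetical names, the flag standing for a process that has been constrained by taskpolicy or setpriority(). The numeric values are those of the standard macOS QoS classes, of which the lowest, background, is 9.

```python
# Numeric values of the standard macOS QoS classes; 9 (background) is lowest.
QOS = {
    "background": 9,
    "utility": 17,
    "default": 21,
    "userInitiated": 25,
    "userInteractive": 33,
}


def eligible_core_types(qos, constrained_to_e=False):
    """Core types a thread may be run on, following the rules in the text.

    constrained_to_e is a hypothetical flag modelling a process that has
    been constrained to E cores using taskpolicy or setpriority().
    """
    if qos <= QOS["background"]:
        # Lowest QoS threads only ever run on the E cluster,
        # and nothing can alter that.
        return {"E"}
    if constrained_to_e:
        # Higher QoS threads can be constrained to E cores.
        return {"E"}
    # Otherwise higher QoS threads can be assigned to E or P clusters.
    return {"E", "P"}


if __name__ == "__main__":
    for name, value in QOS.items():
        print(name, sorted(eligible_core_types(value)))
```

Note that the function has no way to return P cores for a background thread, whatever the flag: that reflects the rule that lowest QoS threads can't be promoted.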

macOS itself adopts a strategy where most, if not all, of its background tasks are run at lowest QoS. These include automatic Time Machine backups and Spotlight index maintenance. This also applies to compression and decompression performed by Archive Utility: for example, if you download a copy of Xcode in xip format, decompressing that takes a long time as much of the code is constrained to the E cores, and there’s no way to change that.

Background threads

Lowest QoS threads are loaded and run differently in original M1 and M1 Pro/Max chips, as they have different E cluster sizes.

In the original M1 chip, with 4 E cores, threads at the lowest QoS (QoS 9, background) are run with the core frequency set at about 1000 MHz (1 GHz). What happens in the M1 Pro/Max with its 2 E cores is different: if there’s only one thread, it’s run on the cluster at a frequency of about 1000 MHz, but if there are two or more threads, the frequency is increased to 2064 MHz. This ensures that the E cluster in the M1 Pro/Max delivers at least the same performance for background tasks as that in the original M1, at similar power consumption, despite the difference in size of the clusters.

Common exceptions to this are lowest QoS threads of processes such as backupd, which also undergo I/O throttling, and are run at a frequency of about 1000 MHz on the M1 Pro/Max.
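
The frequency behaviour described in this section can be summarised as a small decision function. This is a sketch in Python under the assumptions stated in the text: frequencies are the approximate figures given above, `background_e_frequency` and its `io_throttled` flag are illustrative names of my own, and the M1 Ultra is deliberately not covered as its background behaviour isn't described here.

```python
E_MAX_MHZ = 2064  # maximum E core frequency
E_LOW_MHZ = 1000  # "about 1000 MHz" in the text, so approximate


def background_e_frequency(chip, n_threads, io_throttled=False):
    """Approximate E cluster frequency (MHz) when running lowest QoS threads.

    io_throttled is a hypothetical flag for processes such as backupd,
    whose lowest QoS threads also undergo I/O throttling.
    """
    if n_threads < 1:
        raise ValueError("need at least one background thread")
    if chip == "M1":
        # 4 E cores, run at about 1000 MHz.
        return E_LOW_MHZ
    if chip == "M1 Pro/Max":
        # 2 E cores: a single thread, or an I/O-throttled process,
        # stays at about 1000 MHz; two or more threads run at maximum.
        if io_throttled or n_threads == 1:
            return E_LOW_MHZ
        return E_MAX_MHZ
    raise ValueError("behaviour not described for " + chip)


if __name__ == "__main__":
    print(background_e_frequency("M1", 4))
    print(background_e_frequency("M1 Pro/Max", 1))
    print(background_e_frequency("M1 Pro/Max", 2))
    print(background_e_frequency("M1 Pro/Max", 2, io_throttled=True))
```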

User threads

All threads with a QoS higher than 9 are handled similarly, with differences resulting from the priority given to their queues.

As high QoS threads are eligible to be run on either of the core types and any core cluster, their management differs between M1 and M1 Pro/Max variants. On the original M1, with its single P cluster, batches of up to 8 threads can be distributed to the two available clusters, with 4 thread slots available on each. When there are 4 or fewer threads, they will be run on the P cluster whenever possible, and the E cluster is only recruited when there are more high QoS threads in the queue. P cores are run at a frequency of about 3 GHz, and E cores at about 2 GHz, twice the frequency normally used for QoS 9 threads.

M1 Pro and Max chips have a total of three clusters, two of 4 P cores each, plus the half-size 2-core E cluster. With up to 4 threads in the queue, they will be allocated to the first P cluster (P0); threads 5-8 will go to the second P cluster (P1), which would otherwise remain unloaded and inactive for economy. If there are a further 2 threads in the queue, they will be run on the E cores. Frequencies are set to the maximum for the core type: 3228 MHz on P0 and P1, and 2064 MHz on E0.

M1 Ultra chips have a total of five clusters, each with 4 cores. They follow the same policy as M1 Pro/Max chips, but with all 4 P clusters being loaded before E0 is used.
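
The order in which clusters are recruited for high QoS threads can be illustrated with a simple fill-in-order model. This Python sketch assumes CPU-intensive threads each occupying one core slot, as in the tests described here; the real scheduler is dynamic and far more subtle. `FILL_ORDER` and `allocate_high_qos` are names invented for this example.

```python
# Order in which clusters are loaded with high QoS threads, with the
# number of thread slots in each, as described in the text.
# FILL_ORDER and allocate_high_qos are illustrative names only.
FILL_ORDER = {
    "M1": [("P0", 4), ("E0", 4)],
    "M1 Pro/Max": [("P0", 4), ("P1", 4), ("E0", 2)],
    "M1 Ultra": [("P0", 4), ("P1", 4), ("P2", 4), ("P3", 4), ("E0", 4)],
}


def allocate_high_qos(chip, n_threads):
    """Distribute n_threads over clusters, P clusters first, then E0.

    Returns (placed, waiting): a dict of threads per cluster, and the
    number of threads left in the queue once every slot is taken.
    """
    placed = {}
    remaining = n_threads
    for cluster, slots in FILL_ORDER[chip]:
        take = min(remaining, slots)
        if take:
            placed[cluster] = take
        remaining -= take
    return placed, remaining


if __name__ == "__main__":
    print(allocate_high_qos("M1", 5))
    print(allocate_high_qos("M1 Pro/Max", 10))
    print(allocate_high_qos("M1 Ultra", 17))
```

With 5 threads on the original M1, for example, this places 4 on P0 and 1 on E0; with 10 threads on an M1 Pro/Max, 4 go to each P cluster and the remaining 2 to E0.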

There are two situations in which code appears to run exclusively on a single core, though. The first is during the boot process: before the kernel initialises and runs the other cores, code runs on just a single active E core. The other is when ‘preparing’ a downloaded macOS update before starting the installation process. On M1 Pro/Max chips, the 5 threads are given one core-worth of active residency, indicated as 100% CPU, but are confined to a single P core, the first in the first of the 2 P clusters (P0, labelled below as Core 3).

This unusual distribution of active residency is sustained throughout the 30 minutes of preparation to install the update.

Patterns under load

The effects of macOS policies are shown in the following more typical examples taken from the CPU History window of Activity Monitor.

This original M1 chip is here being subjected to a series of loads from increasing numbers of CPU-intensive threads. Its 2 clusters, E0 and P0, are distinguished by the blue boxes. With 1-4 threads at high QoS (from the left), the load is borne entirely in the P0 cluster, then with 5-8 threads the E0 cluster takes its share.

This M1 Pro chip is under heavy and changing load from many threads, some of which are at background QoS, while others are at higher QoS. While much of the load is borne by the 2 cores in the E0 cluster, P0 is also loaded for much of the time, and P1 is recruited to take some of the peak.

I have rearranged the cores shown in this example from an M1 Ultra to separate them into their clusters, with E0 at the top, and P0 to P3 in two columns below. Loads shown here are typical of those during the first few minutes after login, with heavy load on E0 and P0, which spills over to P1-3 during the early peak.

One important piece of information about M1 cores not (yet) provided by Activity Monitor is cluster frequency. A cluster running at 100% CPU (equivalent to active residency) with a frequency of less than 1000 MHz is completing instructions at less than half the rate of the same cluster at 100% CPU and a frequency of 2064 MHz. Unfortunately, the only accessible means of obtaining frequency information at present is the command tool powermetrics.
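
To see why frequency matters so much when reading CPU %, here is the arithmetic as a short Python sketch. It assumes that the rate of completing instructions scales directly with active residency and frequency, which ignores memory stalls and other effects, so treat it as a rough guide. `relative_rate` is a name of my own.

```python
E_MAX_MHZ = 2064  # maximum E core frequency, used as the reference


def relative_rate(residency_pct, freq_mhz, ref_freq_mhz=E_MAX_MHZ):
    """Instruction rate relative to 100% residency at the reference frequency.

    Assumes throughput is proportional to residency x frequency.
    """
    return (residency_pct / 100.0) * (freq_mhz / ref_freq_mhz)


if __name__ == "__main__":
    # 100% CPU at 1000 MHz compared with 100% CPU at 2064 MHz
    print(round(relative_rate(100, 1000), 2))  # 0.48
```

So a cluster shown at 100% in Activity Monitor while running at 1000 MHz is doing just under half the work it would at 2064 MHz, and any lower frequency gives less still.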

A summary of macOS management of CPU cores in the original M1, M1 Pro and Max chips is given in the diagram below. As I complete information about the M1 Ultra, I will incorporate that in the next revision. If you have an M1 Ultra, are familiar with powermetrics, and would like to help, I’d be delighted to work with you.

With Apple expected to announce the successor to its M1 series at the next WWDC in early June, it will be interesting to see its core architecture and the strategies offered by macOS for managing it.

I’m very grateful to Walt for providing information about and the screenshot of the Ultra under load.

25 Comments


1

Walt on April 25, 2022 at 8:16 am

Howard, another nicely written explainer article.

- 2
  
  hoakley on April 25, 2022 at 12:25 pm
  
  Thank you.
  I reiterate my gratitude to you – I’ll explain more by email, but I would be delighted if we can continue to collaborate with the frequency measurements in a few days when I have more time.
  Howard.
  
3

Graham Lee on April 25, 2022 at 10:17 am

The XNU source code has [good documentation on the scheduler](https://github.com/apple-oss-distributions/xnu/blob/xnu-8019.80.24/doc/sched_clutch_edge.md) and the rationale for its design.

- 4
  
  hoakley on April 25, 2022 at 12:29 pm
  
  Thank you – that makes fascinating reading, if a little abstract in this context.
  Howard.
  
5

Paolo on April 25, 2022 at 11:00 am

It seems strange to me that in an M1 Ultra all 4 efficiency cores are in the same cluster since they are physically separated

- 6
  
  hoakley on April 25, 2022 at 12:31 pm
  
  Thank you.
  I wasn’t sure whether Apple would opt for one or two clusters. As this is a matter of how macOS allocates threads and controls frequency, I don’t think that there’s any problem with the cluster spanning both chiplets in this way. However, as there was no previous way to manage two E clusters, that would have posed new problems, so the single cluster makes sense.
  Howard.
  
7

Daniel on April 25, 2022 at 11:25 am

One interesting thing I noticed is that code running on P-cores that yields to the scheduler (e.g. using `pthread_yield_np()` or `std::this_thread::yield()`) is demoted to the E-cores. This can result in unexpected performance, for example when measuring the performance of OpenMP barriers (there, setting KMP_USE_YIELD=0 for Intel OpenMP run-time used by LLVM helps).

- 8
  
  hoakley on April 25, 2022 at 12:32 pm
  
  Thank you. That seems to be a bug, I would have thought. Does that thread have an explicit QoS set for it?
  Howard.
  
9

Liam on April 25, 2022 at 1:10 pm

Interesting looking at the CPU History of my i9 under Catalina. It shows very clearly that it runs work biased to the full cores, and minimizes putting work on the hyperthreads. It also looks like it prefers COre and over other cores. I can send you a snapshot of what I am seeing if you would like.

- 10
  
  hoakley on April 25, 2022 at 2:01 pm
  
  Thank you. Hyper-threading will only be engaged when the ‘real’ cores are heavily loaded, as shown. Scheduling normally aims to produce roughly equal load, but sometimes it does seem to favour certain cores. As they’re all the same, it doesn’t make any difference, unlike in an M1 with E and P cores.
  Howard.
  
11

Liam on April 25, 2022 at 1:11 pm

Apologies – should have said “prefers Core 1 over other cores”

12

hstriepe on April 25, 2022 at 6:05 pm

Thank you for your continued efforts. You produce more in retirement than many full time authors.
Watching cores on my M1 Ultra is interesting. Unless the load gets going with Xcode or Logic, core 8 through 10 show little load and the second set of 8 Performance cores shows NO load. As you point out this is very different from Intel Mac behavior.

- 13
  
  hoakley on April 25, 2022 at 10:13 pm
  
  Thank you.
  Yes, I think it’s really wonderful to see a whole cluster just idling at 600 MHz, consuming almost no power, and generating no heat. This is sensible computing.
  Howard.
  
14

Warren Nagourney on April 25, 2022 at 7:29 pm

Thank you for an interesting article on core usage. Do you know whether Apple will support symmetric multiprocessing under the control of the program? For example, in a ray tracing program one can use the Neon engine to obtain a 4-fold speed up with single precision floating point. Can one obtain another 8-fold improvement using the 8 P cores on an M1 Pro in parallel?

Thanks!

- 15
  
  hoakley on April 25, 2022 at 10:15 pm
  
  Thank you.
  You can do that already: limit the number of threads to no more than 8, and give them any QoS other than the lowest. They’ll then be run on the P cores in parallel.
  Howard.
  
  - 16
    
    Warren Nagourney on April 25, 2022 at 10:39 pm
    
    Thank you.
    
17

Michael Tsai - Blog - How macOS Manages M1 CPU Cores on April 25, 2022 at 9:28 pm

[…] Howard Oakley: […]

18

B on April 27, 2022 at 3:49 pm

Slightly off topic, but what software do you use to make your flowcharts? Thanks for another great article.

- 19
  
  hoakley on April 27, 2022 at 3:51 pm
  
  Scapple. It’s cheap, simple to use and quick.
  Howard
  
  - 20
    
    B on April 27, 2022 at 4:17 pm
    
    Thanks for the quick reply, you’re the best!
    
    - 21
      
      hoakley on April 27, 2022 at 7:10 pm
      
      Thank you.
      Howard.
      
22

Dino A. Navarroli on June 4, 2022 at 1:32 am

Looking for fellow tech bloggers with an in-depth writing style and came across this post. Very excellently written up! It’s nice to be able to read further into how the M1 chip is designed. The efficiency and power is incredible, and you explained it in the same manner. I’m happy to follow you! Looking forward to more of these types of articles.

- 23
  
  hoakley on June 4, 2022 at 6:15 am
  
  Thank you.
  Howard.
  
24

Kilrah on July 27, 2022 at 3:13 pm

A friend has one of the “MacBook Pro 14-inch notebooks have a reduced M1 Pro chip with only 6 P cores instead of 8”, and his CPU history graph doesn’t show the type, suggests they instead nixed the E-cores instead of 2 P ones?

- 25
  
  hoakley on July 27, 2022 at 9:06 pm
  
  The two variants available are 6P+2E, or 8P+2E. They’re essentially the same chip, I believe, but in the cheaper version, two of the eight P cores didn’t pass test, so are disabled. That should give a total of 8 cores. It’s possible that Activity Monitor has become baffled by this, but you can easily check in System Information, or using the powermetrics command tool.
  Howard.
  