The user manual
Bartosz Taudul wolf@nereid.pl
July 16, 2024

Quick overview

Hello and welcome to the Tracy Profiler user manual! Here you will find all the information you need to start using the profiler. This manual has the following layout:
  • Chapter 1, A quick look at Tracy Profiler, gives a short description of what Tracy is and how it works.
  • Chapter 2, First steps, shows how you can integrate the profiler into your application and how to build the graphical user interface (section 2.3). At this point, you will be able to establish a connection from the profiler to your application.
  • Chapter 3, Client markup, provides information on how to instrument your application in order to retrieve useful profiling data. This includes a description of the C API (section 3.13), which enables usage of Tracy in any programming language.
  • Chapter 4, Capturing the data, goes into more detail on how the profiling information can be captured and stored on disk.
  • Chapter 5, Analyzing captured data, guides you through the graphical user interface of the profiler.
  • Chapter 6, Exporting zone statistics to CSV, explains how to export some zone timing statistics into a CSV format.
  • Chapter 7, Importing external profiling data, documents how to import data from other profilers.
  • Chapter 8, Configuration files, gives information on the profiler settings.

Quick-start guide

For Tracy to profile your application, you need to integrate the profiler into it and run a separate executable that acts both as the server your application communicates with and as the profiling viewer. The most basic integration looks like this:
  • Add the Tracy repository to your project directory. The Tracy source files are then located in the project/tracy/public directory.
  • Add TracyClient.cpp as a source file.
  • Add tracy/Tracy.hpp as an include file.
  • Include Tracy.hpp in every file you are interested in profiling.
  • Define TRACY_ENABLE for the WHOLE project.
  • Add the macro FrameMark at the end of each frame loop.
  • Add the macro ZoneScoped as the first line of each function body you want to include in the profile.
  • Compile and run both your application and the profiler server.
  • Hit Connect on the profiler server.
  • Tada! You're profiling your program!
There's much more Tracy can do, which you can explore by carefully reading this manual. If any problems surface, refer to section 2.1 to ensure you've correctly included Tracy in your project. Additionally, refer to chapter 3 to make sure you are using FrameMark, ZoneScoped, and any other Tracy constructs correctly.
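The markup steps from the list above can be sketched as follows. This is only an illustration under stated assumptions: update_world and run_frames are hypothetical names, and the fallback definitions exist solely so the sketch still compiles when the Tracy headers are not on the include path. Only the tracy/Tracy.hpp header and the ZoneScoped and FrameMark macros come from the manual.

```cpp
#include <cassert>

// Pull in Tracy when it is available on the include path.
#if defined(__has_include)
#  if __has_include("tracy/Tracy.hpp")
#    include "tracy/Tracy.hpp"
#  endif
#endif

// Assumption for this sketch only: empty fallbacks so the file compiles
// without Tracy. A real project would include tracy/Tracy.hpp directly.
#ifndef ZoneScoped
#  define ZoneScoped
#endif
#ifndef FrameMark
#  define FrameMark
#endif

// Hypothetical per-frame work. ZoneScoped is the first line of the function
// body, so the whole function shows up as a zone in the profile.
int update_world(int state)
{
    ZoneScoped;
    return state + 1;
}

// Hypothetical frame loop. FrameMark sits at the end of each iteration.
int run_frames(int frame_count)
{
    assert(frame_count >= 0);
    int state = 0;
    for (int i = 0; i < frame_count; ++i) {
        state = update_world(state);
        FrameMark;
    }
    return state;
}
```

To collect data, the build would also need TRACY_ENABLE defined for every source file and TracyClient.cpp compiled into the project, as the list describes.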

Contents

1 A quick look at Tracy Profiler
1.1 Real-time
1.2 Nanosecond resolution
1.2.1 Timer accuracy
1.3 Frame profiler
1.4 Sampling profiler
1.5 Remote or embedded telemetry
1.6 Why Tracy?
1.7 Performance impact
1.7.1 Assembly analysis
1.8 Examples
1.9 On the web
1.9.1 Binary distribution
2 First steps
2.1 Initial client setup
2.1.1 Static library
2.1.2 CMake integration
2.1.3 Meson integration
2.1.4 Short-lived applications
2.1.5 On-demand profiling
2.1.6 Client discovery
2.1.7 Client network interface
2.1.8 Setup for multi-DLL projects
2.1.9 Problematic platforms
2.1.9.1 Microsoft Visual Studio
2.1.9.2 Universal Windows Platform
2.1.9.3 Apple woes
2.1.9.4 Android lunacy
2.1.9.5 Virtual machines
2.1.9.6 Docker on Linux
2.1.10 Changing network port
2.1.11 Limitations
2.2 Check your environment
2.2.1 Operating system
2.2.2 CPU design
2.2.2.1 Superscalar out-of-order speculative execution
2.2.2.2 Simultaneous multithreading
2.2.2.3 Turbo mode frequency scaling
2.2.2.4 Power saving
2.2.2.5 AVX offset and power licenses
2.2.2.6 Summing it up
2.3 Building the server
2.3.1 Required libraries
2.3.1.1 Windows
2.3.1.2 Unix
2.3.1.3 Linux
2.3.2 Using an IDE
2.3.3 Embedding the server in profiled application
2.3.4 DPI scaling
2.4 Naming threads
2.4.1 Source location data customization
2.5 Crash handling
2.6 Feature support matrix
3 Client markup
3.1 Handling text strings
3.1.1 Program data lifetime
3.1.2 Unique pointers
3.2 Specifying colors
3.3 Marking frames
3.3.1 Secondary frame sets
3.3.2 Discontinuous frames
3.3.3 Frame images
3.3.3.1 OpenGL screen capture code example
3.4 Marking zones
3.4.1 Manual management of zone scope
3.4.2 Multiple zones in one scope
3.4.3 Filtering zones
3.4.4 Transient zones
3.4.5 Variable shadowing
3.4.6 Exiting program from within a zone
3.5 Marking locks
3.5.1 Custom locks
3.6 Plotting data
3.7 Message log
3.7.1 Application information
3.8 Memory profiling
3.8.1 Memory pools
3.9 GPU profiling
3.9.1 OpenGL
3.9.2 Vulkan
3.9.3 Direct3D 11
3.9.4 Direct3D 12
3.9.5 OpenCL
3.9.6 Multiple zones in one scope
3.9.7 Transient GPU zones
3.10 Fibers
3.11 Collecting call stacks
3.11.1 Debugging symbols
3.11.1.1 External libraries
3.11.1.2 Using the dbghelp library on Windows
3.11.1.3 Disabling resolution of inline frames
3.11.1.4 Offline symbol resolution
3.12 Lua support
3.12.1 Call stacks
3.12.2 Instrumentation cleanup
3.13 C API
3.13.1 Setting thread names
3.13.2 Frame markup
3.13.3 Zone markup
3.13.3.1 Zone context data structure
3.13.3.2 Zone validation
3.13.3.3 Transient zones in C API
3.13.4 Lock markup
3.13.5 Memory profiling
3.13.6 Plots and messages
3.13.7 GPU zones
3.13.8 Fibers
3.13.9 Connection Status
3.13.10 Call stacks
3.13.11 Using the C API to implement bindings
3.14 Python API
3.14.1 Bindings
3.14.2 Building the Python package
3.15 Automated data collection
3.15.1 Privilege elevation
3.15.2 CPU usage
3.15.3 Context switches
3.15.4 CPU topology
3.15.5 Call stack sampling
3.15.5.1 Wait stacks
3.15.6 Hardware sampling
3.15.7 Executable code retrieval
3.15.8 Vertical synchronization
3.16 Trace parameters
3.17 Source contents callback
3.18 Connection status
4 Capturing the data
4.1 Command line
4.2 Interactive profiling
4.2.1 Connection information pop-up
4.2.2 Automatic loading or connecting
4.3 Connection speed
4.4 Memory usage
4.5 Trace versioning
4.5.1 Archival mode
4.5.2 Compression streams
4.5.3 Frame images dictionary
4.5.4 Data removal
4.6 Source file cache scan
4.7 Instrumentation failures
5 Analyzing captured data
5.1 Time display
5.2 Main profiler window
5.2.1 Control menu
5.2.1.1 Notification area
5.2.2 Frame time graph
5.2.3 Timeline view
5.2.3.1 Time scale
5.2.3.2 Frame sets
5.2.3.3 Zones, locks and plots display
5.2.4 Navigating the view
5.3 Time ranges
5.3.1 Annotating the trace
5.4 Options menu
5.5 Messages window
5.6 Statistics window
5.6.1 Instrumentation mode
5.6.2 Sampling mode
5.6.3 GPU zones mode
5.7 Find zone window
5.7.1 Timeline interaction
5.7.2 Frame time graph interaction
5.7.3 Limiting zone time range
5.7.4 Zone samples
5.8 Compare traces window
5.8.1 Source files diff
5.9 Memory window
5.9.1 Allocations
5.9.2 Active allocations
5.9.3 Memory map
5.9.4 Bottom-up call stack tree
5.9.5 Top-down call stack tree
5.9.6 Looking back at the memory history
5.10 Allocations list window
5.11 Memory allocation information window
5.12 Trace information window
5.13 Zone information window
5.14 Call stack window
5.14.1 Reading call stacks
5.15 Sample entry call stacks window
5.16 Source view window
5.16.1 Source file view
5.16.2 Symbol view
5.16.2.1 Source mode
5.16.2.2 Assembly mode
5.16.2.3 Combined mode
5.16.2.4 Instruction pointer cost statistics
5.16.2.5 Inspecting hardware samples
5.17 Wait stacks window
5.18 Lock information window
5.19 Frame image playback window
5.20 CPU data window
5.21 Annotation settings window
5.22 Annotation list window
5.23 Time range limits
6 Exporting zone statistics to CSV
7 Importing external profiling data
8 Configuration files
8.1 Root directory
8.2 Trace specific settings
A License
B Inventory of external libraries

1 A quick look at Tracy Profiler

Tracy is a real-time, nanosecond resolution hybrid frame and sampling profiler that you can use for remote or embedded telemetry of games and other applications. It can profile CPU, GPU, memory allocations, locks, and context switches, automatically attribute screenshots to captured frames, and much more.
While Tracy can perform statistical analysis of sampled call stack data, just like other statistical profilers (such as VTune, perf, or Very Sleepy), it mainly focuses on manual markup of the source code. Such markup allows frame-by-frame inspection of the program execution. For example, you will be able to see exactly which functions are called, how much time they require, and how they interact with each other in a multi-threaded environment. In contrast, statistical analysis may show you the hot spots in your code, but it cannot accurately pinpoint the underlying cause of semi-random frame stutter that may occur every couple of seconds.
Even though Tracy targets frame profiling, with the emphasis on analysis of frame time in real-time applications such as games, it also works with programs that do not employ the concept of a frame. Nothing prevents you from profiling, for example, a compression tool or an event-driven UI application.
You may think of Tracy as RAD Telemetry plus Intel VTune, on overdrive.

1.1 Real-time

The concept of Tracy being a real-time profiler may be explained in a couple of different ways:
  1. The profiled application is not slowed down by profiling. The act of recording a profiling event has virtually zero cost: it only takes a few nanoseconds. Even on low-power mobile devices, there is no noticeable impact on execution speed.
  2. The profiler itself works in real-time, without the need to process collected data in a complex way. Actually, it is pretty inefficient in how it works because it recalculates the data it presents each frame anew. And yet, it can run at 60 frames per second.
  3. The profiler has full functionality while the profiled application is running and data is still being collected. You may interact with your application and immediately switch to the profiler when a performance drop occurs.

1.2 Nanosecond resolution

It is hard to imagine how long a nanosecond is. One good analogy is to compare it with a measure of length. Let's say that one second is one meter (the average doorknob is at the height of one meter).
One millisecond (1/1000 of a second) would then be the length of a millimeter. The average size of a red ant or the width of a pencil is 5 or 6 mm. A modern game running at 60 frames per second has only 16 ms to update the game world and render the entire scene.
One microsecond (1/1000 of a millisecond) in our comparison equals one micron. The diameter of a typical bacterium ranges from 1 to 10 microns. The diameter of a red blood cell or the width of a strand of spider web silk is also a few microns.
And finally, one nanosecond (1/1000 of a microsecond) would be one nanometer. The modern microprocessor transistor gate, the width of the DNA helix, or the thickness of a cell membrane are in the range of 5 nm. In one nanosecond, light can travel only about 30 cm.
Tracy can achieve single-digit nanosecond measurement resolution due to usage of hardware timing mechanisms on the x86 and ARM architectures. Other profilers may rely on the timers provided by the operating system, which have significantly lower resolution. That loss of resolution is enough to hide the subtle impact of cache access optimization and similar changes.
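Two of the figures in the analogy above are easy to check with a little arithmetic. The sketch below uses hypothetical helper names (frame_budget_ms, light_travel_cm); the only external fact it relies on is the defined speed of light in a vacuum, 299,792,458 m/s.

```cpp
#include <cassert>

// Time available to produce one frame, in milliseconds.
double frame_budget_ms(double frames_per_second)
{
    assert(frames_per_second > 0.0);
    return 1000.0 / frames_per_second;
}

// Distance light covers in a vacuum during the given number of
// nanoseconds, expressed in centimeters.
double light_travel_cm(double nanoseconds)
{
    const double speed_of_light_m_per_s = 299792458.0;
    const double meters = speed_of_light_m_per_s * nanoseconds * 1e-9;
    return meters * 100.0;
}
```

At 60 frames per second the budget works out to roughly 16.7 ms, which the text rounds down to 16 ms, and light covers just under 30 cm in one nanosecond.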

1.2.1 Timer accuracy

You may wonder why it is vital to have a genuinely high resolution timer. After all, you only want to profile functions with long execution times and not some short-lived procedures that have no impact on the application's run time.
It is wrong to think so. Optimizing a function to execute in 430 ns, instead of 535 ns (note that there is only about a 100 ns difference), results in 14 ms savings if the function is executed 18000 times. It may not seem like a big number, but this is how much time there is to render a complete frame in a 60 FPS game. Imagine that this is your particle processing loop.
You also need to understand how timer precision is reflected in measurement errors. Take a look at figure 1. There you can see three discrete timer tick events, each of which increases the value reported by the timer by 300 ns. You can also see four readings of time ranges: two very short ones, a medium one, and a slightly longer one.
Figure 1: Low precision (300 ns) timer. Discrete timer ticks are marked in the figure.
Now let's take a look at the timer readings.
  • The two short ranges both take a very short amount of time (10 ns), but one of them is reported as 300 ns, and the other is reported as 0 ns.
  • The medium range takes a considerable amount of time (590 ns), but according to the timer readings, it took the same time (300 ns) as the short-lived range that happened to span a tick.
  • The longest range (610 ns) is only 20 ns longer than the medium range, but it is reported as 900 ns, a 600 ns difference!
Here, you can see why using a high-precision timer is essential. While there is no escape from the measurement errors, a profiler can reduce their impact by increasing the timer accuracy.
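The effect described for figure 1 can be reproduced with a small simulation. This is a sketch, not Tracy code: coarse_now and measured_ns are hypothetical helpers, and the start and end positions of the ranges are assumed values chosen so that the results match the numbers quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Value reported by a timer that only advances in whole steps of tick_ns.
std::int64_t coarse_now(std::int64_t true_time_ns, std::int64_t tick_ns)
{
    assert(tick_ns > 0 && true_time_ns >= 0);
    return (true_time_ns / tick_ns) * tick_ns;
}

// Duration of the range [start_ns, end_ns] as seen through the coarse timer.
std::int64_t measured_ns(std::int64_t start_ns, std::int64_t end_ns,
                         std::int64_t tick_ns)
{
    assert(end_ns >= start_ns);
    return coarse_now(end_ns, tick_ns) - coarse_now(start_ns, tick_ns);
}
```

With a 300 ns tick, a 10 ns range from 295 to 305 is reported as 300 ns, while a 10 ns range from 100 to 110 is reported as 0 ns. A 590 ns range from 305 to 895 is reported as 300 ns, and a 610 ns range from 295 to 905 is reported as 900 ns. The reported value depends on how many ticks a range happens to cross, not on how long it really is.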

1.3 Frame profiler

Tracy aims to give you an understanding of the inner workings of a tight loop of a game (or any other kind of interactive application). That's why it slices the execution time of a program using the frame as a basic work-unit. The most interesting frames are the ones that took longer than the allocated time, producing visible hitches in the on-screen animation. Tracy allows inspection of such misbehavior.

1.4 Sampling profiler

Tracy can periodically sample what the profiled application is doing, which provides detailed performance information at the source line/assembly instruction level. This can give you a deep understanding of how the processor executes the program. Using this information, you can get a coarse view of the call stacks, fine-tune your algorithms, or even 'steal' an optimization performed by one compiler and make it available for the others.
On some platforms, it is possible to sample the hardware performance counters, which tells you not only where your program is running slowly, but also why.

1.5 Remote or embedded telemetry

Tracy uses the client-server model to enable a wide range of use-cases (see figure 2). For example, you may profile a game on a mobile phone over a wireless connection, with the profiler running on a desktop computer. Or you can run the client and server on the same machine, using a localhost connection. It is also possible to embed the visualization front-end in the profiled application, making the profiling self-contained.
Figure 2: Client-server model.
In Tracy terminology, the profiled application is a client, and the profiler itself is a server. It was named this way because the client is a thin layer that just collects events and sends them for processing and long-term storage on the server. The fact that the server needs to connect to the client to begin the profiling session may be a bit confusing at first.

1.6 Why Tracy?

You may wonder why you should use Tracy when so many other profilers are available. Here are some arguments:

- Tracy is free and open-source (BSD license), while RAD Telemetry carries a yearly license cost.

- Tracy provides out-of-the-box Lua bindings. It has been successfully integrated with other native and interpreted languages (Rust, Arma scripting language) using the C API (see section 3.13 for reference).

- Tracy has a wide variety of profiling options. For example, you can profile CPU, GPU, locks, memory allocations, context switches, and more.

- Tracy is feature-rich. For example, statistical information about zones, trace comparisons, or inclusion of inline function frames in call stacks (even in statistics of sampled stacks) are features unique to Tracy.

- Tracy focuses on performance. It uses many tricks to reduce memory requirements and network bandwidth. As a result, the impact on the client execution speed is minimal, while other profilers perform heavy data processing within the profiled application (and then claim to be lightweight).

- Tracy uses low-level kernel APIs, or even raw assembly, where other profilers rely on layers of abstraction.

- Tracy has been multi-platform from the very beginning, on both the client and the server side. Other profilers tend to have Windows-specific graphical interfaces.

- Tracy can handle millions of frames, zones, memory events, and so on, while other profilers tend to target very short captures.

- Tracy doesn't require manual markup of interesting areas in your code to start profiling. Instead, you may rely on automated call stack sampling and add instrumentation later when you know where it's needed.

- Tracy provides a mapping of source code to the assembly, with detailed information about the cost of executing each instruction on the CPU.

1.7 Performance impact

Let's profile an example application to check how much slowdown is introduced by using Tracy. For this purpose we have used etcpak. The input data was a test image, and the pixel block compression function was selected to be instrumented. The image was compressed on 12 parallel threads, and the timing data represents a mean compression time of a single image.
The results are presented in table 1. Dividing the average of run time differences (37.7 ms) by the count of captured zones per single image shows us that the impact of profiling is only 2.25 ns per zone (this includes two events: start and end of a zone).

Mode   Zones (total)   Zones (single image)   Clean run   Profiling run   Difference
ETC1                                          110.9 ms    148.2 ms        +37.3 ms
ETC2                                          212.4 ms    250.5 ms        +38.1 ms

Table 1: Zone capture time cost.
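The per-zone figure can be cross-checked from the numbers that are stated here. The sketch below uses hypothetical helper names. Because the zone counts are not visible in this copy of table 1, it works backwards from the quoted 37.7 ms and 2.25 ns instead; the resulting zone count is therefore an inference, not a value taken from the table.

```cpp
#include <cassert>

// Mean of the two run time differences from table 1, in milliseconds.
double mean_difference_ms(double etc1_diff_ms, double etc2_diff_ms)
{
    return (etc1_diff_ms + etc2_diff_ms) / 2.0;
}

// Cost of a single zone in nanoseconds, given the total slowdown per image.
double per_zone_cost_ns(double total_diff_ms, double zones_per_image)
{
    assert(zones_per_image > 0.0);
    return total_diff_ms * 1e6 / zones_per_image;
}

// Inverse of the above: how many zones per image are implied by a total
// slowdown and a per-zone cost.
double implied_zone_count(double total_diff_ms, double per_zone_ns)
{
    assert(per_zone_ns > 0.0);
    return total_diff_ms * 1e6 / per_zone_ns;
}
```

The two differences, +37.3 ms and +38.1 ms, average to 37.7 ms. A cost of 2.25 ns per zone then implies roughly 16.8 million zones per image.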

1.7.1 Assembly analysis

To see how Tracy achieves such small overhead (only 2.25 ns), let's take a look at the assembly. The following x64 code is responsible for logging the start of a zone. Do note that it is generated by compiling fully portable C++.

                                          ; store zone activity information
                                          ; TLS
                                          ; queue address
                                          ; data address
                                          ; buffer counter
                                          ; 128 item buffer
                                          ; check if current buffer is usable
                                          ; reclaim/alloc next buffer
                                          ; buffer items are 32 bytes
                                          ; calculate queue item address
                                          ; queue item type
                                          ; retrieve time
    or rax,rdx                            ; construct 64 bit timestamp
    mov qword ptr [rbx+1],rax             ; write timestamp
    lea rax,[__tracy_source_location]     ; static struct address
    mov qword ptr [rbx+9],rax             ; write source location data
    lea rax,[rbp+1]                       ; increment buffer counter
    mov qword ptr [rdi+28h],rax           ; write buffer counter

The second code block, responsible for ending a zone, is similar but smaller, as it can reuse some variables retrieved in the above code.
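The steps named in the listing's comments can be paraphrased in portable C++. This is a loose sketch and not Tracy's actual source: EventBuffer, SourceLocation, make_timestamp, and write_zone_begin are hypothetical names, and the item type value is arbitrary. Only the 128-item buffer, the 32-byte item size, the timestamp built from two halves, and the field offsets 1 and 9 are taken from the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kItemsPerBuffer = 128; // "128 item buffer"
constexpr std::size_t kItemSize = 32;        // "buffer items are 32 bytes"

// Hypothetical stand-in for the static source location struct.
struct SourceLocation {
    const char* function;
    const char* file;
    std::uint32_t line;
};

// Hypothetical stand-in for one block of the per-thread event queue.
struct EventBuffer {
    unsigned char data[kItemsPerBuffer * kItemSize];
    std::uint64_t counter;
};

// "construct 64 bit timestamp": join the two 32-bit halves of a reading.
std::uint64_t make_timestamp(std::uint32_t high, std::uint32_t low)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Write one zone-begin item and return a pointer to it.
unsigned char* write_zone_begin(EventBuffer& buf, std::uint8_t item_type,
                                std::uint64_t timestamp,
                                const SourceLocation* location)
{
    static_assert(1 + sizeof(timestamp) + sizeof(location) <= kItemSize,
                  "item fields must fit in one slot");
    assert(location != nullptr);
    // Slot within the buffer. The listing reclaims or allocates the next
    // buffer when the current one is used up; this sketch simply wraps.
    const std::size_t slot = buf.counter % kItemsPerBuffer;
    unsigned char* item = buf.data + slot * kItemSize; // queue item address
    item[0] = item_type;                                  // queue item type
    std::memcpy(item + 1, &timestamp, sizeof(timestamp)); // write timestamp
    std::memcpy(item + 9, &location, sizeof(location));   // source location
    buf.counter += 1;                                     // buffer counter
    return item;
}
```

The point of the sketch is how little work is involved: a few address calculations, three small stores, and a counter increment, with no locking and no formatting on the hot path.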

1.8 Examples

To see how to integrate Tracy into your application, you may look at example programs in the examples directory. Looking at the commit history might be the best way to do that.

1.9 On the web

Tracy is hosted on GitHub at https://github.com/wolfpld/tracy.

1.9.1 Binary distribution

The version releases of the profiler are provided as precompiled Windows binaries for download at https://github.com/wolfpld/tracy/releases, along with the user manual. You will need to install the latest Visual C++ redistributable package to use them.
Development builds of the Windows binaries and the user manual are available as artifacts created by the automated Continuous Integration system on GitHub.
Note that these binary releases require AVX2 instruction set support on the processor. If you have an older CPU, you will need to set a proper instruction set architecture in the project properties and build the executables yourself.

2 First steps

Tracy Profiler supports MSVC, GCC, and clang. You will need to use a reasonably recent version of the compiler due to the C++11 requirement. The following platforms are confirmed to be working (this is not a complete list):
  • Windows (x86, x64)
  • Linux (x86, x64, ARM, ARM64)
  • Android (ARM, ARM64, x86)
  • FreeBSD (x64)
  • WSL (x64)
  • iOS (ARM, ARM64)
  • QNX (x64)
Moreover, the following platforms are not supported due to how secretive their owners are, but were reported to be working after extending the system integration layer:
  • PlayStation 4
  • Xbox One
  • Nintendo Switch
  • Google Stadia
You may also try your luck with MinGW, but don't get your hopes too high. This platform was usable some time ago, but nobody is actively working on resolving any issues you might encounter with it.

2.1 Initial client setup
2.1 客户端初始设置

The recommended way to integrate Tracy into an application is to create a git submodule in the repository (assuming that you use git for version control). This way, it is straightforward to update Tracy to newly released versions. If that's not an option, all the files required to integrate your application with Tracy are contained in the public directory.

What revision should I use?

You have two options when deciding on the Tracy Profiler version you want to use. Take into consideration the following pros and cons:
  • Using the last-version-tagged revision will give you a stable platform to work with. You won't experience any breakages, major UI overhauls, or network protocol changes. Unfortunately, you also won't be getting any bug fixes.
  • Working with the bleeding edge master development branch will give you access to all the new improvements and features added to the profiler. While it is generally expected that master should always be usable, there are no guarantees that it will be so.
Do note that all bug fixes and pull requests are made against the master branch.
With the source code included in your project, add the public/TracyClient.cpp source file to the IDE project or makefile. You're done. Tracy is now integrated into the application.
In the default configuration, Tracy is disabled. This way, you don't have to worry that the production builds will collect profiling data. To enable profiling, you will probably want to create a separate build configuration, with the TRACY_ENABLE define.
Important
  • Double-check that the define name is entered correctly (as TRACY_ENABLE); don't make the mistake of adding an additional D at the end. Make sure that this macro is defined for all files across your project (e.g. it should be specified in the CFLAGS variable, which is always passed to the compiler, or in an equivalent way), and not as a #define in just some of the source files.
  • Tracy does not consider the value of the definition, only whether the macro is defined or not (unless specified otherwise). Be careful not to make the mistake of assigning numeric values to Tracy defines, which could leave you puzzled as to why constructs such as TRACY_ENABLE=0 don't work as you expect them to.
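The point about macro values can be illustrated with a minimal, self-contained sketch. DEMO_FEATURE and demo_feature_enabled are hypothetical names used only for this illustration (they are not part of Tracy); the sketch assumes the feature check is an #ifdef-style test, which is what "only whether the macro is defined" suggests.

```cpp
#include <cassert>

// Hypothetical stand-in for a Tracy feature macro. It is deliberately given
// the value 0 to show that the value plays no role in an #ifdef check.
#define DEMO_FEATURE 0

// Reports whether the macro is defined, regardless of its value.
constexpr bool demo_feature_enabled()
{
#ifdef DEMO_FEATURE
    return true;
#else
    return false;
#endif
}

// A macro defined as 0 is still defined, so the feature counts as enabled.
static_assert( demo_feature_enabled(), "a macro defined as 0 is still defined" );
```

In other words, passing something like -DDEMO_FEATURE=0 to the compiler would switch such a feature on, not off. To disable it, the define has to be left out entirely.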
You should compile the application you want to profile with all the usual optimization options enabled (i.e. make a release build). Profiling debug builds makes little sense, as the unoptimized code and additional checks (asserts, etc.) completely change how the program behaves. In addition, you should enable usage of the native architecture of your CPU (e.g. -march=native) to leverage the expanded instruction sets, which may not be available in the default baseline target configuration.
Finally, on Unix, make sure that the application is linked with libraries libpthread and libdl. BSD systems will also need to be linked with libexecinfo.

2.1.1 Static library

If you are compiling Tracy as a static library to link with your application, you may encounter some unexpected problems.
When you link a library into your executable, the linker checks if the library provides symbols needed by the program. The library is only used if this is the case. This can be an issue because one of the use cases of Tracy is to simply add it to the application, without any manual instrumentation, and let it profile the execution by sampling. If you use any kind of Tracy macros in your program, this won't be a problem.
However, if you find yourself in a situation where this is a consideration, you can simply add the TracyNoop macro somewhere in your code, for example in the main function. The macro doesn't do anything useful, but it inserts a reference that is satisfied by the static library, which results in the Tracy code being linked in and the profiler being able to work as intended.
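A minimal sketch of where TracyNoop could go is shown here. It assumes that Tracy's public directory is on the include path, so that the header is reachable as tracy/Tracy.hpp; when the header is not found, the sketch falls back to an empty macro purely so that it still compiles. The run_application function is a hypothetical stand-in for your program's entry point.

```cpp
#include <cassert>

// Assumption: Tracy's public directory is on the include path. If the header
// cannot be found, an empty fallback keeps this sketch compilable.
#if defined(__has_include)
#  if __has_include("tracy/Tracy.hpp")
#    include "tracy/Tracy.hpp"
#  endif
#endif
#ifndef TracyNoop
#  define TracyNoop
#endif

// Hypothetical application entry point, meant to be called from main().
int run_application()
{
    TracyNoop;  // the only Tracy reference in a program without instrumentation

    // ... the rest of the program, profiled purely by sampling ...
    return 0;
}
```

Placing the macro near the start of the program is only a convention; any translation unit that is certain to be linked into the executable should serve the same purpose.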

2.1.2 CMake integration

You can integrate Tracy with CMake by adding the git submodule folder as a subdirectory.
# set options before add_subdirectory
# available options: TRACY_ENABLE, TRACY_ON_DEMAND, TRACY_NO_BROADCAST,
#     TRACY_NO_CODE_TRANSFER, ...
option(TRACY_ENABLE "" ON)
option(TRACY_ON_DEMAND "" ON)
add_subdirectory(3rdparty/tracy) # target: TracyClient or alias Tracy::TracyClient
Link Tracy::TracyClient to any target where you use Tracy for profiling:
target_link_libraries(<TARGET> PUBLIC Tracy::TracyClient)

CMake FetchContent

When using CMake 3.11 or newer, you can use Tracy via CMake FetchContent (note that the FetchContent_MakeAvailable command used here was only added in CMake 3.14). In this case, you do not need to add a git submodule for Tracy manually. Add this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
    tracy
    GIT_REPOSITORY https://github.com/wolfpld/tracy.git
    GIT_TAG master
    GIT_SHALLOW TRUE
    GIT_PROGRESS TRUE
)
FetchContent_MakeAvailable(tracy)
Then link Tracy::TracyClient to any target where you use Tracy for profiling, using the same target_link_libraries call as in the add_subdirectory setup above.

2.1.3 Meson integration

If you are using the Meson build system, you can add Tracy using the Wrap dependency system. To do this, place the tracy.wrap file in the subprojects directory of your project, with the following content. The head value of the revision field tracks Tracy's master branch. If you want to lock to a specific version of Tracy instead, you can just set the revision field to an appropriate git tag.
[wrap-git]
url = https://github.com/wolfpld/tracy.git
revision = head
depth = 1
Then, add the following option entry to the meson.options file. Use the name tracy_enable as shown, because the Tracy subproject options inherit it.
option('tracy_enable', type: 'boolean', value: false, description: 'Enable profiling')
Next, add the Tracy dependency to the meson.build project definition file. Don't forget to include this dependency in the appropriate executable or library definitions. This dependency will set all the appropriate definitions (such as TRACY_ENABLE) in your program, so you don't have to do it manually.
tracy = dependency('tracy', static: true)
Finally, let's check if the debugoptimized build type is enabled, and print a little reminder message if it is not. For profiling, we want the debug annotations to be present, but we also want the code to be optimized.
if get_option('tracy_enable') and get_option('buildtype') != 'debugoptimized'
    warning('Profiling builds should set --buildtype=debugoptimized')
endif
Here's a sample command to set up a build directory with profiling enabled. The last option, tracy:on_demand, is used to demonstrate how to set options in the Tracy subproject.
meson setup build --buildtype=debugoptimized -Dtracy_enable=true -Dtracy:on_demand=true

2.1.4 Short-lived applications

In case you want to profile a short-lived program (for example, a compression utility that finishes its work in one second), set the TRACY_NO_EXIT environment variable to 1. With this option enabled, Tracy will not exit until an incoming connection is made, even if the application has already finished executing. If your platform doesn't support an easy setup of environment variables, you may also add the TRACY_NO_EXIT define to your build configuration, which has the same effect.

2.1.5 On-demand profiling

By default, Tracy will begin profiling even before the program enters the main function. However, suppose you don't want to perform a full capture of the application lifetime. In that case, you may define the TRACY_ON_DEMAND macro, which will enable profiling only when there's an established connection with the server.
You should note that if on-demand profiling is disabled (which is the default), then the recorded events will be stored in the system memory until a server connection is made and the data can be uploaded. Depending on how much is being profiled, the requirements for event storage can quickly grow to a couple of gigabytes. Furthermore, since this data is no longer available after the initial connection, you won't be able to perform a second connection to a client unless the on-demand mode is used.

Caveats

The client with on-demand profiling enabled needs to perform additional bookkeeping to present a coherent application state to the profiler. This incurs additional time costs for each profiling event.

2.1.6 Client discovery

By default, the Tracy client will announce its presence to the local network. If you want to disable this feature, define the TRACY_NO_BROADCAST macro.
The program name that is sent out in the broadcast messages can be customized by using the TracySetProgramName(name) macro.

2.1.7 Client network interface

By default, the Tracy client will listen on all network interfaces. If you want to restrict it to only listening on the localhost interface, define the TRACY_ONLY_LOCALHOST macro at compile-time, or set the TRACY_ONLY_LOCALHOST environment variable to 1 at runtime.
If you need to use a specific Tracy client address, as QNX requires, define the TRACY_CLIENT_ADDRESS macro at compile-time as the desired string address.
By default, the Tracy client will listen on IPv6 interfaces, falling back to IPv4 only if IPv6 is unavailable. If you want to restrict it to only listening on IPv4 interfaces, define the TRACY_ONLY_IPV4 macro at compile-time, or set the TRACY_ONLY_IPV4 environment variable to 1 at runtime.
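The "macro at compile-time, or environment variable at runtime" pattern used by options such as TRACY_ONLY_LOCALHOST and TRACY_ONLY_IPV4 can be sketched as follows. This is an illustration of the pattern only: the DEMO_ONLY_LOCALHOST name and both helper functions are hypothetical, and the exact way Tracy parses the variable (here: the value must start with the character 1) is an assumption.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: true when the named environment variable is set and
// its value starts with '1'. The parsing rule is an assumption.
bool demo_env_flag( const char* name )
{
    const char* value = std::getenv( name );
    return value != nullptr && value[0] == '1';
}

// Hypothetical option lookup combining both mechanisms.
bool demo_only_localhost()
{
#ifdef DEMO_ONLY_LOCALHOST
    return true;                                    // fixed at compile time
#else
    return demo_env_flag( "DEMO_ONLY_LOCALHOST" );  // decided at run time
#endif
}
```

The compile-time define wins unconditionally in this sketch, while the environment variable lets the same binary be restricted or opened up without a rebuild.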

2.1.8 Setup for multi-DLL projects

Things are a bit different in projects that consist of multiple DLLs/shared objects. Compiling TracyClient.cpp into every DLL is not an option because this would result in several instances of Tracy objects lying around in the process. We instead need to pass their instances to the different DLLs to be reused there.
For that, you need a profiler DLL to which your executable and the other DLLs link. If that doesn't exist, you have to create one explicitly for Tracy. This library should contain the public/TracyClient.cpp source file. Link the executable and all DLLs you want to profile to this DLL.
If you are targeting Windows with Microsoft Visual Studio or MinGW, add the TRACY_IMPORTS define to your application.
If you are experiencing crashes or freezes when manually loading/unloading a separate DLL with Tracy integration, you might want to try defining both TRACY_DELAYED_INIT and TRACY_MANUAL_LIFETIME macros.
TRACY_DELAYED_INIT enables a path where profiler data is gathered into one structure and initialized on the first request, rather than statically at DLL load time. The cost is an atomic load on each request to the profiler data. The TRACY_MANUAL_LIFETIME flag augments this behavior by providing manual StartupProfiler and ShutdownProfiler functions that allow you to create and destroy the profiler data manually. This manual management removes the need to do an atomic load on each call and lets you define an appropriate place to free the resources.
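The trade-off between the two modes can be sketched with a generic, self-contained example. This is not Tracy's implementation: ProfilerData, GetLazy, StartupDemo, ShutdownDemo, and GetManual are hypothetical names that only mirror the idea of "initialize on first request, pay an atomic load per access" versus "create and destroy explicitly, access with a plain load".

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the profiler's internal state.
struct ProfilerData { int eventCount = 0; };

// Delayed initialization: the data is created on the first request and every
// access pays for an atomic load. Nothing ever frees the instance.
ProfilerData& GetLazy()
{
    static std::atomic<ProfilerData*> instance{ nullptr };
    ProfilerData* ptr = instance.load( std::memory_order_acquire );
    if( !ptr )
    {
        ProfilerData* fresh = new ProfilerData;
        if( instance.compare_exchange_strong( ptr, fresh, std::memory_order_acq_rel, std::memory_order_acquire ) )
            ptr = fresh;   // this thread installed the instance
        else
            delete fresh;  // another thread won the race; ptr now holds its instance
    }
    return *ptr;
}

// Manual lifetime: the owner decides when the data is created and destroyed,
// so each access is a plain pointer read.
static ProfilerData* s_manual = nullptr;

void StartupDemo()  { assert( !s_manual ); s_manual = new ProfilerData; }
void ShutdownDemo() { delete s_manual; s_manual = nullptr; }

ProfilerData& GetManual()
{
    assert( s_manual && "StartupDemo() must be called first" );
    return *s_manual;
}
```

With the manual variant, the calls to the startup and shutdown functions have to bracket every use of the data, for example right after a DLL is loaded and just before it is unloaded.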

Keep everything consistent

When working with multiple libraries, it is easy to make a mistake and use different sets of feature macros between any two compilation jobs. If you do so, Tracy will not be able to work correctly, and there will be no error or warning messages about the problem. Therefore, you must make sure that each shared object you want to link with or load uses the same set of macro definitions.
Please note that using a prebuilt shared Tracy library, as provided by some package manager or system distribution, also qualifies as using multiple libraries.

2.1.9 Problematic platforms

In the case of some programming environments, you may need to take extra steps to ensure Tracy can work correctly.

2.1.9.1 Microsoft Visual Studio

If you are using MSVC, you will need to disable the Edit And Continue feature, as it makes the compiler non-conformant to some aspects of the C++11 standard. In order to do so, open the project properties, go to C/C++ → General → Debug Information Format, and make sure Program Database for Edit And Continue (/ZI) is not selected.
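The manual does not say which part of the standard is affected. A commonly reported symptom is that, with Edit And Continue enabled, MSVC no longer treats __LINE__ as a constant expression, which breaks compile-time source location records. The sketch below shows the kind of construct that depends on this; DemoSourceLocation and demo_line_of_record are hypothetical and much simpler than anything Tracy actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified source location record (not Tracy's actual type).
struct DemoSourceLocation
{
    const char* file;
    std::uint32_t line;
};

// Builds a compile-time constant record for this spot in the source code.
// This only compiles when __LINE__ is usable in a constant expression.
std::uint32_t demo_line_of_record()
{
    static constexpr DemoSourceLocation loc{ __FILE__, static_cast<std::uint32_t>( __LINE__ ) };
    static_assert( loc.line > 0, "__LINE__ must be a constant expression" );
    return loc.line;
}
```

If a construct like this fails to compile in your MSVC project, checking the Debug Information Format setting is a reasonable first step.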

2.1.9.2 Universal Windows Platform

Due to restricted access to Win32 APIs and other sandboxing issues (like network isolation), several limitations apply to using Tracy in a UWP application compared to Windows Desktop:
  • Call stack sampling is not available.
  • System profiling is not available.
  • To be able to connect from another machine on the local network, the app needs the privateNetworkClientServer capability. To connect from localhost, an active inbound loopback exemption is also necessary.

2.1.9.3 Apple woes

Because Apple has to think different, there are some problems with using Tracy on OSX and iOS. First, the performance hit due to profiling is higher than on other platforms. Second, some critical features are missing and won't be possible to achieve:

  • There's no support for the TRACY_NO_EXIT mode.
  • Profiling is interrupted when the application exits. This will result in missing zones, memory allocations, or even source location names.
  • OpenGL can't be profiled.

2.1.9.4 Android lunacy

Starting with Android 8.0, you are no longer allowed to use the /proc file system. One of the consequences of this change is the inability to check system CPU usage.
This is apparently a security enhancement. Unfortunately, in its infinite wisdom, Google has decided not to give you an option to bypass this restriction.
To work around this limitation, you will need to have a rooted device. Execute the following commands using a root shell:
setenforce 0
mount -o remount,hidepid=0 /proc
echo -1 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
The first command will allow access to system CPU statistics. The second one will enable inspection of foreign processes (required for context switch capture). The third one will lower restrictions on access to performance counters. The last one will allow retrieval of kernel symbol pointers. Be sure that you are fully aware of the consequences of making these changes.

2.1.9.5 Virtual machines

The best way to run Tracy is on bare metal. Avoid profiling applications in virtualized environments, including services provided in the cloud. Virtualization interferes with the critical facilities needed for the profiler to work, influencing the results you get. Possible problems may vary, depending on the configuration of the VM, and include:
  • Reduced precision of time stamps.
  • Inability to obtain precise timestamps, resulting in error messages such as CPU doesn't support RDTSC instruction, or CPU doesn't support invariant TSC. On Windows, you can work around this by rebuilding the profiled application with the TRACY_TIMER_QPC define, which severely lowers the resolution of time readings.
  • Frequency of call stack sampling may be reduced.
  • Call stack sampling might lack time stamps. While you can use such a reduced data set to perform statistical analysis, you won't be able to limit the time range or see the sampling zones on the timeline.

2.1.9.6 Docker on Linux

Although the basic features will work without them, you'll have to grant elevated access rights to the container running your client to use the CPU sampling features. Here is a sample configuration that may enable them:
  • --privileged
  • --mount "type=bind,source=/sys/kernel/debug,target=/sys/kernel/debug,readonly"
  • --user
  • --pid=host
Tested on Ubuntu 22.04.3, docker 24.0.4

2.1.10 Changing network port

By default, the client and server communicate on the network using port 8086. The profiling session utilizes the TCP protocol, and the client sends presence announcement broadcasts over UDP.
Suppose for some reason you want to use another port. In that case, you can change it using the TRACY_DATA_PORT macro for the data connection and the TRACY_BROADCAST_PORT macro for client broadcasts. Alternatively, you may change both ports at the same time by declaring the TRACY_PORT macro (the specific macros listed before have higher priority). You may also change the data connection port without recompiling the client application by setting the TRACY_PORT environment variable.
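The priority rule between the generic and the specific port macros can be sketched with plain preprocessor logic. The DEMO_ names below are hypothetical stand-ins for TRACY_PORT, TRACY_DATA_PORT, and TRACY_BROADCAST_PORT; the sketch shows one way to get the described precedence and is not taken from Tracy's sources.

```cpp
#include <cassert>

// Hypothetical configuration, as if passed on the compiler command line.
#define DEMO_PORT 9000            // stands in for TRACY_PORT
#define DEMO_BROADCAST_PORT 9100  // stands in for TRACY_BROADCAST_PORT

// The generic port applies to both channels unless a specific macro is set.
#ifdef DEMO_PORT
#  ifndef DEMO_DATA_PORT
#    define DEMO_DATA_PORT DEMO_PORT
#  endif
#  ifndef DEMO_BROADCAST_PORT
#    define DEMO_BROADCAST_PORT DEMO_PORT
#  endif
#endif

// Fall back to the documented default when nothing was specified.
#ifndef DEMO_DATA_PORT
#  define DEMO_DATA_PORT 8086
#endif
#ifndef DEMO_BROADCAST_PORT
#  define DEMO_BROADCAST_PORT 8086
#endif

constexpr int demo_data_port() { return DEMO_DATA_PORT; }
constexpr int demo_broadcast_port() { return DEMO_BROADCAST_PORT; }

// The data port follows the generic macro, the broadcast port keeps its own value.
static_assert( demo_data_port() == 9000, "generic port used for data" );
static_assert( demo_broadcast_port() == 9100, "specific macro has priority" );
```

Removing the two defines at the top would make both functions return the default port 8086.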
If a custom port is not specified and the default listening port is already occupied, the profiler will automatically try to listen on a number of other ports.

Important

To enable network communication, Tracy needs to open a listening port. Make sure it is not blocked by an overzealous firewall or anti-virus program.

2.1.11 Limitations

When using Tracy Profiler, keep in mind the following requirements:
  • The application may use each lock in no more than 64 unique threads.
  • There can be no more than 65534 unique source locations. This number is further split in half between native code source locations and dynamic source locations (for example, when Lua instrumentation is used).
  • If there are recursive zones at any point in a zone stack, each unique zone source location should not appear more than 255 times.
  • A profiling session cannot be longer than 1.6 days (2^47 ns). This also includes on-demand sessions.
  • No more than 4 billion memory free events may be recorded.
  • No more than 16 million unique call stacks can be captured.
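The numbers in this list can be cross-checked with a little arithmetic. The sketch below assumes that the session limit comes from nanosecond timestamps stored in 47 bits, and that the "4 billion" and "16 million" figures correspond to 32-bit and 24-bit counters; these bit widths are assumptions made for illustration, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: timestamps are nanoseconds stored in 47 bits.
constexpr std::uint64_t kMaxSessionNs = std::uint64_t( 1 ) << 47;
constexpr double kNsPerDay = 24.0 * 60.0 * 60.0 * 1e9;

constexpr double demo_max_session_days()
{
    return static_cast<double>( kMaxSessionNs ) / kNsPerDay;
}

// 65534 source locations, split in half between native and dynamic ones.
constexpr std::uint32_t demo_locations_per_kind()
{
    return 65534u / 2u;
}

// Assumption: these limits are the ranges of 32-bit and 24-bit counters.
constexpr std::uint64_t kAssumedFreeEvents = std::uint64_t( 1 ) << 32;  // 4,294,967,296
constexpr std::uint64_t kAssumedCallStacks = std::uint64_t( 1 ) << 24;  // 16,777,216

static_assert( demo_max_session_days() > 1.62 && demo_max_session_days() < 1.63, "about 1.6 days" );
static_assert( demo_locations_per_kind() == 32767, "half of 65534" );
```

Under these assumptions, 2^47 ns is roughly 140,737 seconds, which matches the 1.6 days quoted in the list.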
The following conditions also need to apply but don't trouble yourself with them too much. You would probably already know if you'd be breaking any.
  • Only little-endian CPUs are supported.
  • Virtual address space must be limited to 48 bits.
  • The Tracy server requires a CPU that can handle misaligned memory accesses.

2.2 Check your environment

It is not an easy task to reliably measure the performance of an application on modern machines. There are many factors affecting program execution characteristics, some of which you will be able to minimize and others you will have to live with. It is critically important that you understand how these variables impact profiling results, as it is key to understanding the data you get.

2.2.1 Operating system

In a multitasking operating system, applications compete for system resources with each other. This has a visible effect on the measurements performed by the profiler, which you may or may not accept.
To get the most accurate profiling results, you should minimize interference caused by other programs running on the same machine. Before starting a profile session, close all web browsers, music players, instant messengers, and all other non-essential applications like Steam, Uplay, etc. Make sure you don't have the debugger hooked into the profiled program, as it also impacts the timing results.
Interference caused by other programs can be seen in the profiler if context switch capture (section 3.15.3) is enabled.

Debugger in Visual Studio

In MSVC, you would typically run your program using the Start Debugging menu option, which is conveniently available as the F5 shortcut. You should instead use the Start Without Debugging option, available as the Ctrl+F5 shortcut.

2.2.2 CPU design

Where to even begin here? Modern processors are such complex beasts that it's almost impossible to say anything certain about how they will behave. Cache configuration, prefetcher logic, memory timings, branch predictors, and execution unit counts are all drivers of the instructions-per-cycle uplift nowadays, after the megahertz race has hit the wall. Not only is this challenging to reason about, but you also need to take into account how the CPU topology affects things, which is described in more detail in section 3.15.4.
Nevertheless, let's look at how we can try to stabilize the profiling data.

2.2.2.1 Superscalar out-of-order speculative execution

Also known as: the spectre thing we have to deal with now.
You must be aware that most processors available on the market do not execute machine code linearly, as laid out in the source code. This can lead to counterintuitive timing results reported by Tracy. Trying to get more 'reliable' readings would require a change in the behavior of the code, and this is not a thing a profiler should do. So instead, Tracy shows you what the hardware is really doing.
This is a complex subject, and the details vary from one CPU to another. You can read a brief rundown of the topic at the following address: https://travisdowns.github.io/blog/2019/06/11/speed-limits.html.

2.2.2.2 Simultaneous multithreading

Also known as: Hyper-threading. Typically present on Intel and AMD processors.
To get the most reliable results, you should have all the CPU core resources dedicated to a single thread of your program. Otherwise, you're no longer measuring the behavior of your code but rather how it keeps up when its computing resources are randomly taken away by some other thing running on another pipeline within the same physical core.
Note that you might want to observe this behavior if you plan to deploy your application on a machine with simultaneous multithreading enabled. This would require careful examination of what else is running on the machine, or even how the operating system schedules the threads of your own program, as various combinations of competing workloads (e.g., integer / floating-point operations) will be impacted differently.

2.2.2.3 Turbo mode frequency scaling

Also known as: Turbo Boost (Intel), Precision Boost (AMD).
While the CPU is more or less designed to always be able to work at the advertised base frequency, there is usually some headroom left, which allows usage of the built-in automatic overclocking. There are no guarantees that the CPU can attain the turbo frequencies or how long it will sustain them, as there are many things to take into consideration:
  • How many cores are in use? Just one, or all 8? All 16?
  • What type of work is being performed? Integer? Floating-point? 128-wide SIMD? 256-wide SIMD? 512-wide SIMD?
  • Were you lucky in the silicon lottery? Some dies are just better made and can achieve higher frequencies.
  • Are you running on the best-rated core or at the worst-rated core? Some cores may be unable to match the performance of other cores in the same processor.
  • What kind of cooling solution are you using? The cheap one bundled with the CPU or a hefty chunk of metal that has no problem with heat dissipation?
  • Do you have complete control over the power profile? Spoiler alert: no. The operating system may run anything at any time on any of the other cores, which will impact the turbo frequency you're able to achieve.
As you can see, this feature basically screams 'unreliable results!' Best keep it disabled and run at the base frequency. Otherwise, your timings won't make much sense. A true example: a branchless compression function executed multiple times with the same input data was measured running at four different speeds.
Keep in mind that even at the base frequency, you may hit the thermal limits of the silicon and be throttled down.

2.2.2.4 Power saving

This is, in essence, the same as turbo mode, but in reverse. While unused, processor cores are kept at lower frequencies (or even wholly disabled) to reduce power usage. When your code starts running, the core frequency needs to ramp up, which may be visible in the measurements.
Even worse, if your code doesn't do a lot of work (for example, because it is waiting for the GPU to finish rendering the frame), the CPU might not ramp the core frequency all the way up, which will skew the results. Again, to get the best results, keep this feature disabled.

2.2.2.5 AVX offset and power licenses

Intel CPUs are unable to run at their advertised frequencies when they perform wide SIMD operations due to increased power requirements. Therefore, depending on the width and type of operations executed, the core operating frequency will be reduced, in some cases quite drastically. To make things even better, some parts of the workload will execute within the available power license, at half the processing rate. After that, the CPU may be stopped for some time so that the wide parts of the execution units can be powered up. Then the work will continue at the full processing rate but at a reduced frequency.
Be very careful when using AVX2 or AVX512.
More information can be found at https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html and https://en.wikichip.org/wiki/intel/frequency_behavior.

2.2.2.6 Summing it up

Power management schemes employed in various CPUs make it hard to reason about the true performance of the code. For example, figure 3 contains a histogram of function execution times (as described in chapter 5.7), as measured on an AMD Ryzen CPU. The results span a wide range of execution times (extreme outliers were not included on the graph, which limits the longest displayed time).
Figure 3: Example function execution times on a Ryzen CPU
We can immediately see that there are two distinct peaks. A reasonable assumption would be that there are two paths in the code, one that can omit some work and a second one that must do some additional work. But here's the catch: the measured code is actually branchless and always executes the same way. The two peaks represent two turbo frequencies between which the CPU was aggressively switching.
We can also see that the graph gradually falls off to the right (representing longer times), with a slight bump near the end. Again, this can be attributed to running in power-saving mode, with different reaction times to the required operating frequency boost to full power.

2.3 Building the server

Tracy uses the CMake build system. Unlike in most other programs, the root-level CMakeLists.txt file is only used to provide client integration. The build definition files used to create profiler executables are stored in directories specific to each utility.
The easiest way to get going is to build the data analyzer, available in the profiler directory. Then, you can connect to localhost or remote clients and view the collected data right away with it.
If you prefer to inspect the data only after a trace has been performed, you may use the command-line utility in the capture directory. It will save a data dump that you may later open in the graphical viewer application.
Ideally, it would be best to use the same version of the Tracy profiler on both client and server. The network protocol may change in-between releases, in which case you won't be able to make a connection.
See section 4 for more information about performing captures.

How to use CMake

The CMakeLists.txt file only contains the general definition of how the program should be built. To be able to actually compile the program, you must first create a build directory that takes into account the specific compiler you have on your system, the set of available libraries, the build options you specify, and so on. You can do this by issuing the following command, in this case for the profiler utility:
cmake -B profiler/build -S profiler -DCMAKE_BUILD_TYPE=Release
Now that you have a build directory, you can actually compile the program. For example, you could run the following command:
cmake --build profiler/build --config Release --parallel
The build directory can be reused if you want to compile the program in the future, for example if there have been some updates to the source code, and usually does not need to be regenerated. Note that all build artifacts are contained in the build directory.

Important

Due to the memory requirements for data storage, the Tracy server is only supposed to run on 64-bit platforms. While nothing prevents the program from building and executing in a 32-bit environment, doing so is not supported.

2.3.1 Required libraries

The core libraries necessary for the building of Tracy utilities are:
  • capstone
  • glfw
  • freetype
The capstone library will always be downloaded from GitHub when the CMake build directory is created, unless you have it installed on your system and set the specific build option. You must have git installed for this download to work. Using the capstone library provided by package managers is not recommended, as these packages are typically slow to provide up-to-date versions of the library, and the API may be incompatible.
It is recommended that you install the glfw and freetype libraries on your system so that Tracy can find them with pkg-config. However, if these libraries are not available, they will be downloaded from GitHub.

2.3.1.1 Windows

There is no need to install external libraries (e.g. with vcpkg). All libraries are downloaded automatically by CMake. You still need git, though.

2.3.1.2 Unix

On Unix systems (including Linux), you will need to install the pkg-config utility to provide information about libraries.
Due to some questionable design decisions by the compiler developers, you will most likely also need the tbb library. If not found, this library is downloaded automatically.
Installation of the libraries on OSX can be facilitated using the brew package manager.

2.3.1.3 Linux

There are some Linux-specific libraries that you need to have installed on your system. These won't be downloaded automatically.
For XDG Portal support in the file selector, you need to install the dbus library. If you're one of those weird people who doesn't like modern things, you can install gtk3 instead and force the GTK file selector with a build option.
Linux builds of Tracy use the Wayland protocol by default, which allows proper support for Hi-DPI scaling and high-precision input devices such as touchpads. As such, the glfw library is no longer needed, but you will need to install libxkbcommon, wayland, wayland-protocols, libglvnd (or libegl on some distributions).
If you want to use X11 instead, you can enable the LEGACY option in CMake build settings.

Linux distributions

Some Linux distributions require you to add a lib prefix and a -dev or -devel postfix to library names. You may also need to add a seemingly random number to the library name (for example: freetype2, or freetype6).
Some Linux distributions ship outdated versions of libraries that are too old for Tracy to build, and do not provide new versions by design. Please reconsider your choice of distribution in this case, as the only function of a Linux distribution is to provide packages, and the one you have chosen is clearly failing at this task.

Window decorations

Please don't ask about window decorations in Gnome. The current behavior is the intended behavior. Gnome does not want windows to have decorations, and Tracy respects that choice. If you find this problematic, use a desktop environment that actually listens to its users.

2.3.2 Using an IDE

The recommended development environment is Visual Studio Code. This is a cross-platform solution, so you always get the same experience, no matter what OS you are using.
VS Code is highly modular, and unlike some other IDEs, it does not come with a compiler. You will need to have one, such as gcc or clang, already installed on your system. On Windows, you should have MSVC 2022 installed in order to have access to its build tools.
When you open the Tracy directory in VS Code, it will prompt you to install some recommended extensions: clangd, CodeLLDB, and CMake Tools. You should do this if you don't already have them.
The CMake build configuration will begin immediately. It is likely that you will be prompted to select a development kit to use; for example, you may have a preference as to whether you want to use gcc or clang, and CMake will need to be told about it.
After the build configuration phase is over, you may want to make some further adjustments to what is being built. The primary place to do this is in the Project Status section of the CMake side panel. The two key settings there are also available in the status bar at the bottom of the window:
  • The Folder setting allows you to choose which Tracy utility you want to work with. Select "profiler" for the profiler's GUI.
  • The Build variant setting is used to toggle between the debug and release build configurations.
With all this taken care of, you can now start the program with the F5 key, set breakpoints, get code completion and navigation, and so on.

2.3.3 Embedding the server in profiled application

While not officially supported, it is possible to embed the server in your application, the same one running the client part of Tracy. How to make this work is left for you to figure out.
Note that most libraries bundled with Tracy are modified in some way and contained in the tracy namespace. The one exception is Dear ImGui, which can be freely replaced.
Be aware that while the Tracy client uses its own separate memory allocator, the server part of Tracy will use global memory allocation facilities shared with the rest of your application. This will affect both the memory usage statistics and Tracy memory profiling.
The following defines may be of interest:
  • TRACY_NO_FILESELECTOR - controls whether a system load/save dialog is compiled in. If this macro is defined, the saved traces will be named trace.tracy.
  • TRACY_NO_STATISTICS - Tracy will perform statistical data collection on the fly if this macro is not defined. This allows extended trace analysis (for example, you can perform a live search for matching zones) at a small CPU processing cost and a considerable memory usage increase (at least 8 bytes per zone).
  • TRACY_NO_ROOT_WINDOW - the main profiler view won't occupy the whole window if this macro is defined. Additional setup is required for this to work. If you want to embed the server into your application, you probably should enable this option.

2.3.4 DPI scaling

The graphical server application will adapt to the system DPI scaling. If, for some reason, this doesn't work in your case, you may try setting the TRACY_DPI_SCALE environment variable to a scale fraction, where a value of 1 indicates no scaling.

2.4 Naming threads

Remember to set thread names for proper identification of threads. You should do so by using the function tracy::SetThreadName(name) exposed in the public/common/TracySystem.hpp header, as the system facilities typically have limited functionality.
Tracy will try to capture thread names through operating system data if context switch capture is active. However, this is only a fallback mechanism, and it shouldn't be relied upon.
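As a rough illustration of where the naming call belongs, the sketch below names a thread as the first thing its entry function does. NamedThreadEntry is a hypothetical helper, not part of Tracy. The no-op stand-in for tracy::SetThreadName exists only so the sketch compiles when the Tracy headers are not on the include path, and the sketch assumes that including tracy/Tracy.hpp also makes tracy::SetThreadName visible (otherwise include the TracySystem.hpp header directly).

```cpp
#include <cassert>

#if __has_include(<tracy/Tracy.hpp>)   // C++17 (or compiler extension)
#  include <tracy/Tracy.hpp>           // assumed to also declare tracy::SetThreadName
#else
// Stand-in so the sketch compiles without Tracy; it does nothing.
namespace tracy { inline void SetThreadName(const char*) {} }
#endif

// Hypothetical helper: use it as the body of a thread so that the thread is
// named before it performs any work that the profiler might record.
template <typename Fn>
void NamedThreadEntry(const char* name, Fn work)
{
    assert(name != nullptr);
    tracy::SetThreadName(name);   // name the calling thread first
    work();                       // then run the actual thread body
}
```

A thread could then be started with something like std::thread([]{ NamedThreadEntry("Physics", RunPhysics); }), where RunPhysics stands for your own function.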

2.4.1 Source location data customization

Some source location data, such as the function name, file path, or line number, can be overridden by defining TracyFunction, TracyFile, or TracyLine before including the public/tracy/Tracy.hpp header file.
#if defined(__clang__) || defined(__GNUC__)
# define TracyFunction __PRETTY_FUNCTION__
#elif defined(_MSC_VER)
# define TracyFunction __FUNCSIG__
#endif
#include <tracy/Tracy.hpp>
...
void Graphics::Render()
{
    ZoneScoped;
    ...
}

2.5 Crash handling

On selected platforms (see section 2.6), Tracy will intercept application crashes. This serves two purposes. First, the client application will be able to send the remaining profiling data to the server. Second, the server will receive a crash report with the crash reason, call stack at the time of the crash, etc.
This is an automatic process, and it doesn't require user interaction. If you are experiencing issues with crash handling, you may want to try defining the TRACY_NO_CRASH_HANDLER macro to disable the built-in crash handling.

Caveats

  • On MSVC, the debugger has priority over the application in handling exceptions. If you want to finish the profiler data collection with the debugger hooked up, select the continue option in the debugger pop-up dialog.
  • On Linux, crashes are handled with signals. Tracy needs to have SIGPWR available, which is rather rarely used. Still, the program you are profiling may expect to employ it for its own purposes, which would cause a conflict (for example, Mono may use it to trigger garbage collection). To work around such cases, you may set the TRACY_CRASH_SIGNAL macro value to some other signal (see man 7 signal for a list of signals). Ensure that you avoid conflicts by selecting a signal that the application wouldn't usually receive or emit.

2.6 Feature support matrix

Some features of the profiler are only available on selected platforms. Please refer to table 2 for details.

3 Client markup

With the steps mentioned above, you will be able to connect to the profiled program, but there probably won't be any data collection performed. Unless you're able to perform automatic call stack sampling (see chapter 3.15.5), you will have to instrument the application manually. The entire user-facing interface is contained in the public/tracy/Tracy.hpp header file.
Manual instrumentation is best started with adding markup to the application's main loop, along with a few functions that the loop calls. Such an approach will give you a rough outline of the function's time cost, which you may then further refine by instrumenting functions deeper in the call stack. Alternatively, automated sampling might guide you more quickly to places of interest.

3.1 Handling text strings

When dealing with Tracy macros, you will encounter two ways of providing string data to the profiler. In both cases, you should pass const char* pointers, but there are differences in the expected lifetime of the pointed data.
Table 2: Feature support matrix
Platforms (columns): Windows, Linux, Android, OSX, iOS, BSD, QNX
Features (rows): Profiling program init, CPU zones, Locks, Plots, Messages, Memory, GPU zones (OpenGL), GPU zones (Vulkan), Call stacks, Symbol resolution, Crash handling, CPU usage probing, Context switches, Wait stacks, CPU topology information, Call stack sampling, Hardware sampling, VSync capture
  1. When a macro only accepts a pointer (for example, TracyMessageL(text)), the provided string data must be accessible at any time in program execution (this also includes the time after exiting the main function). The string also cannot be changed. This basically means that the only option is to use a string literal (e.g., TracyMessageL("Hello")).
  2. If there's a string pointer with a size parameter (for example, TracyMessage(text, size)), the profiler will allocate a temporary internal buffer to store the data. The size count should not include the terminating null character, so using strlen(text) is fine. The pointed-to data is not used afterward. Remember that allocating and copying memory involved in this operation has a small time cost.
Be aware that no single text string passed to the profiler can be larger than 64 KB.
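The two ways of passing strings can be sketched side by side as follows. This is only an illustration: LogStartup, LogDynamic, ClampForTracy and kMaxTracyString are hypothetical names, the no-op macro stand-ins exist only so the sketch compiles without the Tracy headers, and the exact cap is an assumption chosen to stay safely under the 64 KB limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

#if __has_include(<tracy/Tracy.hpp>)   // C++17 (or compiler extension)
#  include <tracy/Tracy.hpp>
#else
// No-op stand-ins so the sketch compiles without Tracy.
#  define TracyMessageL(text) ((void)(text))
#  define TracyMessage(text, size) ((void)(text), (void)(size))
#endif

// Assumed conservative cap, slightly below 64 KB. The exact boundary is an
// assumption of this sketch; adjust it if your Tracy version differs.
constexpr std::size_t kMaxTracyString = 65534;

inline std::size_t ClampForTracy(std::size_t length)
{
    return std::min(length, kMaxTracyString);
}

inline void LogStartup()
{
    // Pointer-only macro: the string literal lives for the whole program run.
    TracyMessageL("Startup complete");
}

inline std::size_t LogDynamic(const std::string& text)
{
    // Pointer + size macro: the data is copied by the profiler, so a
    // temporary std::string is fine. The size excludes the terminating null.
    const std::size_t size = ClampForTracy(text.size());
    assert(size <= kMaxTracyString);
    TracyMessage(text.c_str(), size);
    return size;   // number of characters actually submitted
}
```

Truncating oversized strings is just one possible policy; splitting them into several messages would work as well.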

3.1.1 Program data lifetime

Take extra care to consider the lifetime of program code (which includes string literals) in your application. For example, if you dynamically add and remove modules (i.e., DLLs, shared objects) during the runtime, text data will only be present when the module is loaded. Additionally, when a module is unloaded, the operating system can place another one in its space in the process memory map, resulting in the aliasing of text strings. This leads to all sorts of confusion and potential crashes.
Note that string literals are the only option in many parts of the Tracy API. For example, look at how frame or plot names are specified. You cannot unload modules that contain string literals that you passed to the profiler.

3.1.2 Unique pointers

In some cases marked in the manual, Tracy expects every occurrence of the same string literal to be passed as the same unique pointer. The following listing illustrates the problem:
FrameMarkStart("Audio processing");

FrameMarkEnd("Audio processing");
Here, we pass two string literals with identical contents to two different macros. It is entirely up to the compiler to decide if it will pool these two strings into one pointer or if there will be two instances present in the executable image. For example, on MSVC, this is controlled by the Configuration Properties → C/C++ → Code Generation → Enable String Pooling option in the project properties (optimized builds enable it automatically). Note that even if string pooling is used on the compilation unit level, it is still up to the linker to implement pooling across object files.
As you can see, making sure that string literals are properly pooled can be surprisingly tricky. To work around this problem, you may employ the following technique. In one source file create the unique pointer for a string literal, for example:
const char* const sl_AudioProcessing = "Audio processing";
Then in each file where you want to use the literal, use the variable name instead. Notice that if you'd like to change a name passed to Tracy, you'd need to do it only in one place with such an approach.
extern const char* const sl_AudioProcessing;
FrameMarkStart(sl_AudioProcessing);
FrameMarkEnd(sl_AudioProcessing);
In some cases, you may want to have semi-dynamic strings. For example, you may want to enumerate workers but don't know how many will be used. You can handle this by allocating a never-freed char buffer, which you can then propagate where it's needed. For example:
char* workerId = new char[16];
snprintf(workerId, 16, "Worker %i", id);
...
FrameMarkStart(workerId);
You have to make sure it's initialized only once, before passing it to any Tracy API, that it is not overwritten by new data, etc. In the end, this is just a pointer to character-string data. It doesn't matter if the memory was loaded from the program image or allocated on the heap.
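One way to satisfy these requirements is to hand out the names from a single place. The sketch below is a minimal illustration, not something Tracy provides: WorkerFrameName is a hypothetical helper that creates the buffer for a given worker id once, never frees or rewrites it, and returns the same pointer on every later call.

```cpp
#include <cassert>
#include <cstdio>
#include <mutex>
#include <unordered_map>

// Hypothetical helper: returns a stable, never-freed name for a worker id.
// Repeated calls with the same id return the same pointer.
inline const char* WorkerFrameName(int id)
{
    static std::mutex lock;
    static std::unordered_map<int, const char*> names;

    std::lock_guard<std::mutex> guard(lock);
    const auto it = names.find(id);
    if (it != names.end()) return it->second;

    char* buffer = new char[32];                  // intentionally never deleted
    std::snprintf(buffer, 32, "Worker %i", id);   // written exactly once
    names.emplace(id, buffer);
    return buffer;
}
```

The returned pointer can then be used wherever a name is required, for example FrameMarkStart(WorkerFrameName(id)) paired with FrameMarkEnd(WorkerFrameName(id)). The mutex only protects the lookup table, so that two threads asking for the same id cannot create two buffers.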

3.2 Specifying colors

In some cases, you will want to provide your own colors to be displayed by the profiler. You should use a hexadecimal 0xRRGGBB notation in all such places.
Alternatively, you may use named colors predefined in common/TracyColor.hpp (included by Tracy.hpp). Visual reference: https://en.wikipedia.org/wiki/X11_color_names.
Do not use 0x000000 if you want to specify black color, as zero is a special value indicating that no color was set. Instead, use a value close to zero, e.g. 0x000001.
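The color rules above can be wrapped in a small helper. The following is a sketch only: PackRgb and MakeTracyColor are hypothetical names, not part of Tracy. The helper packs three 8-bit channels into the 0xRRGGBB layout and nudges pure black to the nearest non-zero value, so that it is not mistaken for "no color set".

```cpp
#include <cassert>
#include <cstdint>

// Packs 8-bit channels into the 0xRRGGBB layout.
constexpr std::uint32_t PackRgb(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return (std::uint32_t(r) << 16) | (std::uint32_t(g) << 8) | std::uint32_t(b);
}

// Hypothetical helper: same as PackRgb, but never returns zero, because zero
// means that no color was set. Pure black becomes the almost-black 0x000001.
constexpr std::uint32_t MakeTracyColor(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return PackRgb(r, g, b) == 0 ? 0x000001u : PackRgb(r, g, b);
}

static_assert(MakeTracyColor(0xFF, 0x80, 0x00) == 0xFF8000u, "orange packs as 0xRRGGBB");
static_assert(MakeTracyColor(0, 0, 0) == 0x000001u, "black is remapped away from zero");
```

Since the helper is constexpr, its result can be used in places that need a compile-time constant.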

3.3 Marking frames

To slice the program's execution recording into frame-sized chunks, put the FrameMark macro after you have completed rendering the frame. Ideally, that would be right after the swap buffers command.

Do I need this?

This step is optional, as some applications do not use the concept of a frame.
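For applications that do have a frame loop, the placement of the marker can be sketched as below. UpdateAndRender, PresentFrame and RunMainLoop are hypothetical placeholders for your own code (PresentFrame stands for whatever swap buffers or present call your windowing library offers), and the empty FrameMark stand-in exists only so the sketch compiles without the Tracy headers.

```cpp
#include <cassert>

#if __has_include(<tracy/Tracy.hpp>)   // C++17 (or compiler extension)
#  include <tracy/Tracy.hpp>
#else
#  define FrameMark   // no-op stand-in so the sketch compiles without Tracy
#endif

// Hypothetical placeholders for the application's own per-frame work.
inline void UpdateAndRender(int& framesRendered) { ++framesRendered; }
inline void PresentFrame() {}   // e.g. a wrapper around the swap buffers call

// Runs a fixed number of frames and returns how many were rendered.
inline int RunMainLoop(int frameCount)
{
    assert(frameCount >= 0);
    int framesRendered = 0;
    for (int i = 0; i < frameCount; ++i)
    {
        UpdateAndRender(framesRendered);
        PresentFrame();
        FrameMark;   // frame boundary, placed right after the swap
    }
    return framesRendered;
}
```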

3.3.1 Secondary frame sets

In some cases, you may want to track more than one set of frames in your program. To do so, you may use the FrameMarkNamed(name) macro, which will create a new set of frames for each unique name you provide. Make sure you are correctly pooling the passed string literal, as described in section 3.1.2.

3.3.2 Discontinuous frames

Some types of frames are discontinuous by their nature: they are executed periodically, with a pause between each run. Examples of such frames are a physics processing step in a game loop or an audio callback running on a separate thread. Tracy can also track this kind of frame.
To mark the beginning of a discontinuous frame, use the FrameMarkStart(name) macro. After the work is finished, use the FrameMarkEnd(name) macro.
Important
  • Frame types must not be mixed. For each frame set, identified by a unique name, use either continuous or discontinuous frames only!
  • You must issue the FrameMarkStart and FrameMarkEnd macros in proper order. Be extra careful, especially if multi-threading is involved.
  • String literals passed as frame names must be properly pooled, as described in section 3.1.2.
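One possible way to keep the start and end markers paired is a small scope guard. This is a sketch under stated assumptions, not a Tracy facility: DiscontinuousFrame, StepPhysics and sl_Physics are hypothetical names, and the no-op macro stand-ins exist only so the sketch compiles without the Tracy headers.

```cpp
#include <cassert>

#if __has_include(<tracy/Tracy.hpp>)   // C++17 (or compiler extension)
#  include <tracy/Tracy.hpp>
#else
// No-op stand-ins so the sketch compiles without Tracy.
#  define FrameMarkStart(name) ((void)(name))
#  define FrameMarkEnd(name) ((void)(name))
#endif

// A single pointer for the frame name. In a multi-file project, define it in
// one source file and declare it extern elsewhere (see section 3.1.2).
static const char* const sl_Physics = "Physics";

// Hypothetical scope guard: emits the start marker on construction and the
// end marker on destruction, so the two always come in the proper order.
class DiscontinuousFrame
{
public:
    explicit DiscontinuousFrame(const char* name) : m_name(name)
    {
        assert(m_name != nullptr);
        FrameMarkStart(m_name);
    }
    ~DiscontinuousFrame() { FrameMarkEnd(m_name); }

    DiscontinuousFrame(const DiscontinuousFrame&) = delete;
    DiscontinuousFrame& operator=(const DiscontinuousFrame&) = delete;

private:
    const char* m_name;
};

// Example periodic step: the frame spans exactly the body of this function.
inline int StepPhysics(int state)
{
    DiscontinuousFrame frame(sl_Physics);
    return state + 1;   // placeholder for the real simulation work
}
```

The guard also emits the end marker on early returns and exceptions, which is easy to forget when the macros are written out by hand.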

3.3.3 Frame images

It is possible to attach a screen capture of your application to any frame in the main frame set. This can help you see the context of what's happening in various places in the trace. You need to implement retrieval of the image data from GPU by yourself.
Images are sent using the FrameImage(image, width, height, offset, flip) macro, where image is a pointer to RGBA pixel data, width and height are the image dimensions, which must be divisible by 4, offset specifies how much frame lag there was for the current image (see chapter 3.3.3.1), and flip should be set if the graphics API stores images upside-down. The profiler copies the image data, so you don't need to retain it.
Handling image data requires a lot of memory and bandwidth. To achieve sane memory usage, you should scale down taken screenshots to a suitable size, e.g., 320×180.
To further reduce image data size, frame images are internally compressed using the DXT1 Texture Compression technique, which significantly reduces data size at a slight quality decrease. The compression algorithm is high-speed and can be made even faster by enabling SIMD processing, as indicated in table 3.
Implementation    Required define    Time
x86 Reference
x86 SSE4.1
x86 AVX2
ARM Reference                        1.04 ms
ARM32 NEON        __ARM_NEON
ARM64 NEON        __ARM_NEON
a) VEX encoding; b) ARM32 NEON code compiled for ARM64
Table 3: Client compression time of an image. x86: Ryzen 9 3900X (MSVC); ARM: ODROID-C2 (gcc).

Caveats

  • Frame images are compressed on a second client profiler thread to reduce the memory usage of queued images (a small part of the compression task is offloaded to the server). This might have an impact on the performance of the profiled application.
  • This second thread will be periodically woken up, even if there are no frame images to compress (this way of doing things is required to prevent a deadlock in specific circumstances). If you are not using the frame image capture functionality and you don't wish this thread to be running, you can define the TRACY_NO_FRAME_IMAGE macro.
  • Due to implementation details of the network buffer, a single frame image cannot be greater than 256 KB after compression. Note that images scaled down as suggested above fit well within this limit.
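To get a feeling for the size limit, the compressed size can be estimated up front. The sketch below relies on the DXT1 format storing each 4×4 pixel block in 8 bytes, and it assumes that the 256 KB limit applies to exactly this payload. All names are hypothetical, and the resolutions in the checks are only examples.

```cpp
#include <cassert>
#include <cstddef>

// Limit quoted in the caveat above.
constexpr std::size_t kMaxFrameImageBytes = 256 * 1024;

// Frame image dimensions must be non-zero and divisible by 4.
constexpr bool IsValidFrameImageSize(std::size_t width, std::size_t height)
{
    return width > 0 && height > 0 && width % 4 == 0 && height % 4 == 0;
}

// DXT1 stores every 4x4 pixel block in 8 bytes, i.e. half a byte per pixel.
constexpr std::size_t Dxt1Bytes(std::size_t width, std::size_t height)
{
    return (width / 4) * (height / 4) * 8;
}

// Assumption: the limit is compared against the raw DXT1 payload size.
constexpr bool FitsFrameImageLimit(std::size_t width, std::size_t height)
{
    return IsValidFrameImageSize(width, height) &&
           Dxt1Bytes(width, height) <= kMaxFrameImageBytes;
}

static_assert(Dxt1Bytes(320, 180) == 28800, "320x180 compresses to about 28 KB");
static_assert(FitsFrameImageLimit(960, 540), "259200 bytes is below the limit");
static_assert(!FitsFrameImageLimit(1920, 1080), "a full HD image is too large");
```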

3.3.3.1 OpenGL screen capture code example

There are many pitfalls associated with efficiently retrieving screen content. For example, using glReadPixels and then resizing the image using some library is terrible for performance, as it forces synchronization of the GPU to CPU and performs the downscaling in software. To do things properly, we need to scale the image using the graphics hardware and transfer data asynchronously, which allows the GPU to run independently of the CPU.
The following example shows how this can be achieved using OpenGL 3.2. Of course, more recent OpenGL versions allow doing things even better (for example, using persistent buffer mapping), but this manual won't cover it here.
Let's begin by defining the required objects. First, we need a texture to store the resized image, a framebuffer object to be able to write to the texture, a pixel buffer object to store the image data for access by the CPU, and a fence to know when the data is ready for retrieval. We need everything in at least three copies (we'll use four) because the rendering, as seen in the program, can run ahead of the GPU by a couple of frames. Next, we need an index to access the appropriate data set in a ring-buffer manner. And finally, we need a queue to store indices to data sets that we are still waiting for.
GLuint m_fiTexture[4];
GLuint m_fiFramebuffer[4];
GLuint m_fiPbo[4];
GLsync m_fiFence[4];
int m_fiIdx = 0;
std::vector<int> m_fiQueue;
Everything needs to be correctly initialized (the cleanup is left for the reader to figure out).
glGenTextures(4, m_fiTexture);
glGenFramebuffers(4, m_fiFramebuffer);
glGenBuffers(4, m_fiPbo);
for(int i=0; i<4; i++)
{
    glBindTexture(GL_TEXTURE_2D, m_fiTexture[i]);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 320, 180, 0, GL_RGBA, GL_UNSIGNED_BYTE,
        nullptr);
    glBindFramebuffer(GL_FRAMEBUFFER, m_fiFramebuffer[i]);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D,
        m_fiTexture[i], 0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, m_fiPbo[i]);
    glBufferData(GL_PIXEL_PACK_BUFFER, 320*180*4, nullptr, GL_STREAM_READ);
}
We will now set up a screen capture, which will downscale the screen contents to 320×180 pixels and copy the resulting image to a buffer accessible by the CPU when the operation is done. This should be placed right before the swap buffers or present call.
assert(m_fiQueue.empty() || m_fiQueue.front() != m_fiIdx); // check for buffer overrun
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, m_fiFramebuffer[m_fiIdx]);
glBlitFramebuffer(0, 0, res.x, res.y, 0, 0, 320, 180, GL_COLOR_BUFFER_BIT, GL_LINEAR);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
glBindFramebuffer(GL_READ_FRAMEBUFFER, m_fiFramebuffer[m_fiIdx]);
glBindBuffer(GL_PIXEL_PACK_BUFFER, m_fiPbo[m_fiIdx]);
glReadPixels(0, 0, 320, 180, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
m_fiFence[m_fiIdx] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
m_fiQueue.emplace_back(m_fiIdx);
m_fiIdx = (m_fiIdx + 1) % 4;
And lastly, just before the capture setup code that was just added, we need to have the image retrieval code. We are checking if the capture operation has finished. If it has, we map the pixel buffer object to memory, inform the profiler that there are image data to be handled, unmap the buffer and go to check the next queue item. If capture is still pending, we break out of the loop. We will have to wait until the next frame to check if the GPU has finished performing the capture.
while(!m_fiQueue.empty())
{
    const auto fiIdx = m_fiQueue.front();
    if(glClientWaitSync(m_fiFence[fiIdx], 0, 0) == GL_TIMEOUT_EXPIRED) break;
    glDeleteSync(m_fiFence[fiIdx]);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, m_fiPbo[fiIdx]);
    auto ptr = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, 320*180*4, GL_MAP_READ_BIT);
    FrameImage(ptr, 320, 180, m_fiQueue.size(), true);
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    m_fiQueue.erase(m_fiQueue.begin());
}
Notice that in the call to FrameImage we are passing the remaining queue size as the offset parameter. Queue size represents how many frames ahead our program is relative to the GPU. Since we are sending past frame images, we need to specify how many frames behind the images are. Of course, if this were a synchronous capture (without the use of fences and with retrieval code after the capture setup), we would set offset to zero, as there would be no frame lag.
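The relation between the queue size and the offset can be checked with a small CPU-only model. This is a sketch under stated assumptions, not part of Tracy: the CaptureSim type and its submit and retrieve functions are hypothetical, no OpenGL is involved, and GPU completion is modeled by a plain counter.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// CPU-only model of the asynchronous capture queue (hypothetical names).
struct CaptureSim
{
    std::deque<int> queue;           // frame numbers whose capture is still pending
    std::vector<int> reportedFrame;  // frame each retrieved image was attributed to

    // Retrieval step, run at the start of 'currentFrame'. 'completed' is the
    // number of pending captures the modeled GPU has finished so far.
    void retrieve(int currentFrame, std::size_t completed)
    {
        while(!queue.empty() && completed > 0)
        {
            // As in the loop above: the offset is the queue size before erasing.
            const int offset = static_cast<int>(queue.size());
            reportedFrame.push_back(currentFrame - offset);
            assert(reportedFrame.back() == queue.front());
            queue.pop_front();
            --completed;
        }
    }

    // Capture step, issued right before the swap of 'currentFrame'.
    void submit(int currentFrame) { queue.push_back(currentFrame); }
};
```

Because one capture is submitted per frame and items are retrieved in order, the oldest pending capture is always exactly queue-size frames behind the frame currently being processed.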
High quality capture

The code above uses the glBlitFramebuffer function, which can only use nearest neighbor filtering. The use of such filtering can result in low-quality screenshots, as shown in figure 4. However, with a bit more work, it is possible to obtain nicer-looking screenshots, as presented in figure 5. Unfortunately, you will need to set up a complete rendering pipeline for this to work.
First, you need to allocate an additional set of intermediate frame buffers and textures, sized the same as the screen. These new textures should have a minification filter set to GL_LINEAR_MIPMAP_LINEAR. You will also need to set up everything needed to render a full-screen quad: a simple texturing shader and vertex buffer with appropriate data. Since you will use this vertex buffer to render to the scaled-down frame buffer, you may prepare its contents beforehand and update it only when the aspect ratio changes.
With all this done, you can perform the screen capture as follows:
  • Set up vertex buffer configuration for the full-screen quad buffer (you only need position and uv coordinates).
  • Blit the screen contents to the full-sized frame buffer.
  • Bind the texture backing the full-sized frame buffer.
  • Generate mipmaps using glGenerateMipmap.
  • Set viewport to represent the scaled-down image size.
  • Bind vertex buffer data and shader, and set up the required uniforms.
  • Draw full-screen quad to the scaled-down frame buffer.
  • Retrieve frame buffer contents, as in the code above.
  • Restore viewport, vertex buffer configuration, bound textures, etc.
While this approach is much more complex than the previously discussed one, the resulting image quality increase makes it worthwhile.
Figure 4: Low-quality screen shot
Figure 5: High-quality screen shot
You can see the performance results you may expect in a simple application in table 4. The naïve capture performs synchronous retrieval of the full-screen image and resizes it using stb_image_resize. The proper and high-quality captures do things as described in this chapter.
Resolution    Naïve capture    Proper capture    High quality
(missing)     80 FPS           4200 FPS          2800 FPS
(missing)     23 FPS           3300 FPS          1600 FPS
Table 4: Frame capture efficiency

3.4 Marking zones

To record a zone's execution time, add the ZoneScoped macro at the beginning of the scope you want to measure. This will automatically record the function name, source file name, and location. Optionally you may use the ZoneScopedC(color) macro to set a custom color for the zone. Note that the color value will be constant in the recording (don't try to parametrize it). You may also set a custom name for the zone, using the ZoneScopedN(name) macro. Color and name may be combined by using the ZoneScopedNC(name, color) macro.
Use the ZoneText(text, size) macro to add a custom text string that the profiler will display along with the zone information (for example, name of the file you are opening). Multiple text strings can be attached to any single zone. The dynamic color of a zone can be specified with the ZoneColor(uint32_t) macro to override the source location color. If you want to send a numeric value and don't want to pay the cost of converting it to a string, you may use the ZoneValue(uint64_t) macro. Finally, you can check if the current zone is active with the ZoneIsActive macro.
If you want to set the zone name on a per-call basis, you may do so using the ZoneName(text, size) macro. However, this name won't be used in the process of grouping the zones for statistical purposes (sections 5.6 and 5.7).
To use printf-like formatting, you can use the ZoneTextF(fmt, ...) and ZoneNameF(fmt, ...) macros.
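As an illustration of how these macros may be combined, here is a minimal sketch. The LoadAsset function and its checksum are hypothetical, the include path is an assumption about how Tracy is installed, and the no-op stand-in macros exist only so that the sketch also builds when the Tracy headers are not available.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

#if __has_include(<tracy/Tracy.hpp>)
#  include <tracy/Tracy.hpp>
#else
// Assumption: no-op stand-ins so the sketch builds without Tracy installed.
#  define ZoneScopedN(name)
#  define ZoneText(text, size)
#  define ZoneValue(value)
#endif

// Hypothetical loader: sums the bytes of 'contents' inside a named zone.
std::uint64_t LoadAsset(const std::string& path, const std::string& contents)
{
    ZoneScopedN("LoadAsset");             // static zone name, used for statistics
    ZoneText(path.c_str(), path.size());  // per-call text shown with the zone

    std::uint64_t checksum = 0;
    for(unsigned char c : contents) checksum += c;

    ZoneValue(checksum);                  // numeric value, no string conversion
    return checksum;
}
```

The zone ends automatically when LoadAsset returns, so no explicit end call is needed.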

Important

Zones are identified using static data structures embedded in program code. Therefore, you need to consider the lifetime of code in your application, as discussed in section 3.1.1, to make sure that the profiler can access this data at any time during the program lifetime.
If you can't fulfill this requirement, you must use transient zones, described in section 3.4.4.

3.4.1 Manual management of zone scope

The zone markup macros automatically report when they end, through the RAII mechanism. This is very helpful, but sometimes you may want to mark the zone start and end points yourself, for example, if you want to have a zone that crosses the function's boundary. You can achieve this by using the C API, which is described in section 3.13.

3.4.2 Multiple zones in one scope

Using the ZoneScoped family of macros creates a stack variable named ___tracy_scoped_zone. If you want to measure more than one zone in the same scope, you will need to use the ZoneNamed macros, which require that you provide a name for the created variable. For example, instead of ZoneScopedN("Zone name"), you would use ZoneNamedN(variableName, "Zone name", true).
The ZoneText, ZoneColor, ZoneValue, ZoneIsActive, and ZoneName macros apply to the zones created using the ZoneScoped macros. For zones created using the ZoneNamed macros, you can use the ZoneTextV(variableName, text, size), ZoneColorV(variableName, uint32_t), ZoneValueV(variableName, uint64_t), ZoneIsActiveV(variableName), or ZoneNameV(variableName, text, size) macros, or invoke the methods Text, Color, Value, IsActive, or Name directly on the variable you have created.
Zone objects can't be moved or copied.
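The following sketch shows two zones sharing one scope. The UpdateAndRender function and the variable names frameZone and updateZone are hypothetical, and the stand-in macros are an assumption that only lets the code build without the Tracy headers.

```cpp
#include <cassert>
#include <cstring>

#if __has_include(<tracy/Tracy.hpp>)
#  include <tracy/Tracy.hpp>
#else
// Assumption: no-op stand-ins so the sketch builds without Tracy installed.
#  define ZoneNamedN(varname, name, active)
#  define ZoneTextV(varname, text, size)
#  define ZoneValueV(varname, value)
#endif

// Hypothetical per-frame function with two zones in the same scope.
int UpdateAndRender(const char* levelName, int entityCount)
{
    ZoneNamedN(frameZone, "Frame", true);
    // Annotate frameZone now, before another zone is opened on top of it.
    ZoneTextV(frameZone, levelName, std::strlen(levelName));

    ZoneNamedN(updateZone, "Update", true);  // second zone, distinct variable name
    ZoneValueV(updateZone, entityCount);

    return entityCount * 2;                  // both zones end here, updateZone first
}
```

Each zone gets its own variable name, so the two declarations do not collide the way two ZoneScoped uses in one scope would.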

Zone stack

The ZoneScoped macros impose the creation and usage of an implicit zone stack. You must also follow the rules of this stack when using the named macros, even though they give you some more leeway in doing things. For example, you can only set the text for the zone which is on top of the stack, just as you could with the ZoneText macro. It doesn't matter that you can call the Text method of a non-top zone which is accessible through a variable. Take a look at the following code:
{
    ZoneNamed(Zone1, true);
    // (a)
    {
        ZoneNamed(Zone2, true);
        // (b)
    }
    // (c)
}
It is valid to set the Zone1 text or name only in places (a) or (c). After Zone2 is created at (b) you can no longer perform operations on Zone1, until Zone2 is destroyed.

3.4.3 Filtering zones

Zone logging can be disabled on a per-zone basis by making use of the ZoneNamed macros. Each of the macros takes an active argument ('true' in the example in section 3.4.2), which will determine whether the zone should be logged.
Note that this parameter may be a run-time variable, such as a user-controlled switch to enable profiling of a specific part of code only when required.
If the condition is constant at compile-time, the resulting code will not contain a branch (the profiling code will either be always enabled or won't be there at all). The following listing presents how you might implement profiling of specific application subsystems:
enum SubSystems
{
    Sys_Physics = 1 << 0,
    Sys_Rendering = 1 << 1,
    Sys_NasalDemons = 1 << 2
};
...
// Preferably a define in the build system
#define SUBSYSTEMS (Sys_Physics | Sys_NasalDemons)
...
void Physics::Process()
{
    ZoneNamed( __tracy, SUBSYSTEMS & Sys_Physics ); // always true, no runtime cost
    ...
}
void Graphics::Render()
{
    ZoneNamed( __tracy, SUBSYSTEMS & Sys_Rendering ); // always false, no runtime cost
    ...
}
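For the run-time case, the active argument may simply read a variable. The sketch below assumes a hypothetical switch g_profilePathfinding and a hypothetical FindPath function. The stand-in macro is only there so the code builds without the Tracy headers.

```cpp
#include <atomic>
#include <cassert>

#if __has_include(<tracy/Tracy.hpp>)
#  include <tracy/Tracy.hpp>
#else
// Assumption: no-op stand-in so the sketch builds without Tracy installed.
#  define ZoneNamedN(varname, name, active)
#endif

// Hypothetical user-controlled switch, e.g. toggled from a debug console.
std::atomic<bool> g_profilePathfinding{false};

// Hypothetical workload: distance between two points on a line.
int FindPath(int from, int to)
{
    // The condition is evaluated on every call, so this zone is only
    // recorded while the switch is turned on.
    ZoneNamedN(pathZone, "FindPath",
               g_profilePathfinding.load(std::memory_order_relaxed));
    return to > from ? to - from : from - to;
}
```

Unlike the compile-time variant, this form keeps a branch in the generated code, which is the price of being able to toggle it while the program runs.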

3.4.4 Transient zones

In order to prevent problems caused by unloadable code, described in section 3.1.1, transient zones copy the source location data to an on-heap buffer. This way, the requirement on the string literal data being accessible for the rest of the program lifetime is relaxed, at the cost of increased memory usage.
Transient zones can be declared through the ZoneTransient and ZoneTransientN macros, with the same set of parameters as the ZoneNamed macros. See section 3.4.2 for details and make sure that you observe the requirements outlined there.
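A transient zone might be used as in the sketch below. The RunScriptFunction function is hypothetical, and passing a name that is not a string literal relies on the copying behavior described above. The stand-in macro is an assumption that lets the code build without the Tracy headers.

```cpp
#include <cassert>
#include <string>

#if __has_include(<tracy/Tracy.hpp>)
#  include <tracy/Tracy.hpp>
#else
// Assumption: no-op stand-in so the sketch builds without Tracy installed.
#  define ZoneTransientN(varname, name, active)
#endif

// Hypothetical entry point living in code that may be unloaded later.
int RunScriptFunction(const std::string& functionName, int arg)
{
    // Same parameter order as ZoneNamedN: variable name, zone name, active flag.
    // The source location data is copied, so it does not have to outlive
    // this call.
    ZoneTransientN(scriptZone, functionName.c_str(), true);
    return arg + 1;
}
```

The extra copy on every zone entry is what you pay for not having to keep the source location data alive.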

3.4.5 Variable shadowing

The following code is fully compliant with the C++ standard:
void Function()
{
    ZoneScoped;
    ..
    for(int i=0; i<10; i++)
    {
        ZoneScoped;
        ...
    }
}
This doesn't stop some compilers from dispensing fashion advice about variable shadowing (as both ZoneScoped calls create a variable with the same name, with the inner scope one shadowing the one in the outer scope). If you want to avoid these warnings, you will also need to use the ZoneNamed macros.

3.4.6 Exiting program from within a zone

Exiting the profiled application from inside a zone is not supported. When the client calls exit(), the profiler will wait for all zones to end before a program can be truly terminated. If program execution stops inside a zone, this will never happen, and the profiled application will seemingly hang up. At this point, you will need to manually terminate the program (or disconnect the profiler server).
As a workaround, you may add a try/catch pair at the bottom of the function stack (for example in the main() function) and replace exit() calls with throwing a custom exception. When this exception is caught, you may call exit(), knowing that the application's data structures (including profiling zones) were properly cleaned up.
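The workaround could look like the sketch below. The ExitRequest type and the ProcessCommand and RunApplication functions are hypothetical, and RunApplication stands in for the body of main(). The stand-in macro only lets the code build without the Tracy headers.

```cpp
#include <cassert>

#if __has_include(<tracy/Tracy.hpp>)
#  include <tracy/Tracy.hpp>
#else
// Assumption: no-op stand-in so the sketch builds without Tracy installed.
#  define ZoneScopedN(name)
#endif

// Hypothetical exception type thrown instead of calling exit() directly.
struct ExitRequest { int code; };

void ProcessCommand(int command)
{
    ZoneScopedN("ProcessCommand");
    if(command < 0) throw ExitRequest{3};  // previously: exit(3);
}

// Stand-in for the body of main(): returns the requested exit code.
int RunApplication(int command)
{
    try
    {
        ProcessCommand(command);
    }
    catch(const ExitRequest& request)
    {
        // Stack unwinding has already ended every zone opened below this
        // point, so main() may now return this code or pass it to exit().
        return request.code;
    }
    return 0;
}
```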

3.5 Marking locks

Modern programs must use multi-threading to achieve the full performance capability of the CPU. However, correct execution requires claiming exclusive access to data shared between threads. When many threads want to simultaneously enter the same critical section, the application's multi-threaded performance advantage is nullified. To help solve this problem, Tracy can collect and display lock interactions in threads.
To mark a lock (mutex) for event reporting, use the TracyLockable(type, varname) macro. Note that the lock must implement the Mutex requirement (i.e., there's no support for timed mutexes). For a concrete example, you would replace the line
std::mutex m_lock;
with 
TracyLockable(std::mutex, m_lock);
Alternatively, you may use TracyLockableN(type, varname, description) to provide a custom lock name at a global level, which will replace the automatically generated 'std::mutex m_lock'-like name. You may also set a custom name for a specific instance of a lock, through the LockableName(varname, name, size) macro.
The standard std::lock_guard and std::unique_lock wrappers should use the LockableBase(type) macro for their template parameter (unless you're using C++17, with improved template argument deduction). For example:
std::lock_guard<LockableBase(std::mutex)> lock(m_lock);
To mark the location of a lock being held, use the LockMark(varname) macro after you have obtained the lock. Note that the varname must be a lock variable (a reference is also valid). This step is optional.
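Put together, an instrumented lock might be used as in the following sketch. The Counter class is hypothetical. The stand-in macros are an assumption for builds without the Tracy headers, in which case they fall back to the plain mutex type.

```cpp
#include <cassert>
#include <mutex>

#if __has_include(<tracy/Tracy.hpp>)
#  include <tracy/Tracy.hpp>
#else
// Assumption: stand-ins that fall back to the plain mutex without Tracy.
#  define TracyLockable(type, varname) type varname
#  define LockableBase(type) type
#  define LockMark(varname) (void)varname
#endif

// Hypothetical shared counter protected by an instrumented mutex.
class Counter
{
public:
    void Increment()
    {
        std::lock_guard<LockableBase(std::mutex)> lock(m_lock);
        LockMark(m_lock);  // optional: marks where the lock is being held
        ++m_value;
    }

    int Value()
    {
        std::lock_guard<LockableBase(std::mutex)> lock(m_lock);
        return m_value;
    }

private:
    TracyLockable(std::mutex, m_lock);  // replaces: std::mutex m_lock;
    int m_value = 0;
};
```

In a real program several threads would call Increment concurrently, and the profiler would then show the contention on this lock.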
Similarly, you can use TracySharedLockable, TracySharedLockableN and SharedLockableBase to mark locks implementing the SharedMutex requirement. Note that while there's no support for timed mutexes in Tracy, both std::shared_mutex and std::shared_timed_mutex may be used.

Condition variables

The standard std::condition_variable is only able to accept std::mutex locks. To be able to use the Tracy lock wrapper, use std::condition_variable_any instead.
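A minimal sketch of waiting on an instrumented lock follows. The JobQueue class is hypothetical, and the stand-in macros are an assumption for builds without the Tracy headers.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

#if __has_include(<tracy/Tracy.hpp>)
#  include <tracy/Tracy.hpp>
#else
// Assumption: stand-ins that fall back to the plain mutex without Tracy.
#  define TracyLockable(type, varname) type varname
#  define LockableBase(type) type
#endif

// Hypothetical job queue guarded by an instrumented mutex.
class JobQueue
{
public:
    void Push(int job)
    {
        {
            std::lock_guard<LockableBase(std::mutex)> lock(m_lock);
            m_jobs.push(job);
        }
        m_cv.notify_one();
    }

    // Blocks until a job is available, then returns the oldest one.
    int Pop()
    {
        std::unique_lock<LockableBase(std::mutex)> lock(m_lock);
        m_cv.wait(lock, [this] { return !m_jobs.empty(); });
        const int job = m_jobs.front();
        m_jobs.pop();
        return job;
    }

private:
    TracyLockable(std::mutex, m_lock);
    std::condition_variable_any m_cv;  // works with any lock type, not only std::mutex
    std::queue<int> m_jobs;
};
```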

Caveats

Due to the limits of internal bookkeeping in the profiler, you may use each lock in no more than 64 unique threads. If you have many short-lived temporary threads, consider using a thread pool to limit the number of created threads.

3.5.1 Custom locks

If using the TracyLockable or TracySharedLockable wrappers does not fit your needs, you may want to add a more fine-grained instrumentation to your code. Classes LockableCtx and SharedLockableCtx contained in the TracyLock.hpp header contain all the required functionality. Lock implementations in classes Lockable and SharedLockable show how to properly perform context handling.

3.6 Plotting data

Tracy can capture and draw numeric value changes over time. You may use it to analyze draw call counts, number of performed queries, etc. To report data, use the TracyPlot(name, value) macro.
To configure how plot values are presented by the profiler, you may use the TracyPlotConfig(name, format, step, fill, color) macro, where format is one of the following options:
  • tracy::PlotFormatType::Number - values will be displayed as plain numbers.
  • tracy::PlotFormatType::Memory - treats the values as memory sizes. Will display kilobytes, megabytes, etc.
  • tracy::PlotFormatType::Percentage - values will be displayed as percentage (with value 100 being equal to 100%).
The step parameter determines whether the plot will be displayed as a staircase or will smoothly change between plot points (see figure 6). The fill parameter can be used to disable filling the area below the plot with a solid color.
Figure 6: An identical set of values on a smooth plot (left) and a staircase plot (right).
Each plot has its own color, which by default is derived from the plot name (each unique plot name produces its own color, which does not change between profiling runs). If you want to provide your own color instead, you may enter the color parameter. Note that you should set the color value to 0 if you do not want to set your own color.
For reference, the following command sets the default parameters of the plot (that is, it's a no-op): TracyPlotConfig(name, tracy::PlotFormatType::Number, false, true, 0).
It is beneficial but not required to use a unique pointer for name string literal (see section 3.1.2 for more details).
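As an example, a memory-formatted staircase plot could be set up and fed as in the sketch below. The TextureBudget type, the ConfigurePlots function, and the plot name are hypothetical. The stand-in macros are an assumption for builds without the Tracy headers.

```cpp
#include <cassert>
#include <cstdint>

#if __has_include(<tracy/Tracy.hpp>)
#  include <tracy/Tracy.hpp>
#else
// Assumption: no-op stand-ins so the sketch builds without Tracy installed.
#  define TracyPlot(name, value)
#  define TracyPlotConfig(name, format, step, fill, color)
#endif

// One shared pointer to the name literal, so every call passes the same pointer.
static const char* const kTextureMemoryPlot = "Texture memory";

// Call once at startup: memory format, staircase, filled, default color (0).
void ConfigurePlots()
{
    TracyPlotConfig(kTextureMemoryPlot, tracy::PlotFormatType::Memory, true, true, 0);
}

// Hypothetical tracker that reports its total after every change.
struct TextureBudget
{
    std::int64_t bytes = 0;

    void Add(std::int64_t delta)
    {
        bytes += delta;
        TracyPlot(kTextureMemoryPlot, bytes);
    }
};
```

A staircase plot suits this kind of data, since the amount of memory in use changes in discrete steps rather than smoothly.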

3.7 Message log

Fast navigation in large data sets and correlating zones with what was happening in the application may be difficult. To ease these issues, Tracy provides a message log functionality. You can send messages (for example, your typical debug output) using the TracyMessage(text, size) macro. Alternatively, use TracyMessageL(text) for string literal messages.
If you want to include color coding of the messages (for example to make critical messages easily visible), you can use TracyMessageC(text, size, color) or TracyMessageLC(text, color) macros.
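The message macros might be used as in this sketch. The LoadLevel function is hypothetical, and the color value assumes the 0xRRGGBB layout. The stand-in macros only let the code build without the Tracy headers.

```cpp
#include <cassert>
#include <string>

#if __has_include(<tracy/Tracy.hpp>)
#  include <tracy/Tracy.hpp>
#else
// Assumption: no-op stand-ins so the sketch builds without Tracy installed.
#  define TracyMessage(text, size)
#  define TracyMessageL(text)
#  define TracyMessageLC(text, color)
#endif

// Hypothetical level loader that logs its progress to the profiler.
bool LoadLevel(const std::string& name)
{
    TracyMessageL("Level load started");  // string literal, no size needed

    if(name.empty())
    {
        // Assumption: colors use the 0xRRGGBB layout, so this is red.
        TracyMessageLC("Level name is empty", 0xFF0000);
        return false;
    }

    const std::string text = "Loading level: " + name;
    TracyMessage(text.c_str(), text.size());  // dynamic text with explicit size
    return true;
}
```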

3.7.1 Application information

Tracy can collect additional information about the profiled application, which will be available in the trace description. This can include data such as the source repository revision, the application's environment (dev / prod), etc.
Use the TracyAppInfo(text, size) macro to report the data.

3.8 Memory profiling

Tracy can monitor the memory usage of your application. Knowledge about each performed memory allocation enables the following:
  • Memory usage graph (like in massif, but fully interactive).
  • List of active allocations at program exit (memory leaks).
  • Visualization of the memory map.
  • Ability to rewind view of active allocations and memory map to any point of program execution.
  • Information about memory statistics of each zone.
  • Memory allocation hot-spot tree.
To mark memory events, use the TracyAlloc(ptr, size) and TracyFree(ptr) macros. Typically you would do that in overloads of operator new and operator delete, for example:
void* operator new(std::size_t count)
{
    auto ptr = malloc(count);
    TracyAlloc(ptr, count);
    return ptr;
}
void operator delete(void* ptr) noexcept
{
    TracyFree(ptr);
    free(ptr);
}
In some rare cases (e.g., destruction of TLS block), events may be reported after the profiler is no longer available, which would lead to a crash. To work around this issue, you may use TracySecureAlloc and TracySecureFree variants of the macros.
Important
Each tracked memory free event must also have a corresponding memory allocation event. Tracy will terminate the profiling session if this assumption is broken (see section 4.7). If you encounter this issue, you may want to check for:
  • Mismatched malloc/new or free/delete.
  • Reporting the same memory address being allocated twice (without a free between two allocations).
  • Double freeing the memory.
  • Untracked allocations made in external libraries that are freed in the application.
  • Places where the memory is allocated, but profiling markup is not added.
This requirement is relaxed in the on-demand mode (section 2.1.5) because the memory allocation event might have happened before the server made the connection.

Non-stable memory addresses

Note that the pointer data you provide to the profiler does not have to reflect the actual memory layout, which you may not know in some cases. This includes the possibility of having multiple overlapping memory allocation regions. For example, you may want to track GPU memory, which may be mapped to different locations in the program address space during allocation and freeing. Or maybe you use some memory defragmentation scheme, which by its very design moves pointers around. You may instead use unique numeric identifiers to identify allocated objects in such cases. This will make some profiler facilities unavailable. For example, the memory map won't have much sense anymore.

3.8.1 Memory pools

Sometimes an application will use more than one memory pool. For example, in addition to tracking the heap, you may also be interested in memory usage of a graphic API, such as Vulkan. Or maybe you want to see how your scripting language is managing memory.
To mark that a separate memory pool is to be tracked, you should use the named versions of the memory macros, for example TracyAllocN(ptr, size, name) and TracyFreeN(ptr, name), where name is a unique pointer to a string literal (section 3.1.2) identifying the memory pool.
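A separate pool could be tracked as in the following sketch. The ScriptAlloc and ScriptFree functions, the pool name, and the live allocation counter are hypothetical. The stand-in macros are an assumption for builds without the Tracy headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#if __has_include(<tracy/Tracy.hpp>)
#  include <tracy/Tracy.hpp>
#else
// Assumption: no-op stand-ins so the sketch builds without Tracy installed.
#  define TracyAllocN(ptr, size, name)
#  define TracyFreeN(ptr, name)
#endif

// One pointer to the pool name literal, shared by every alloc and free event.
static const char* const kScriptPool = "Script heap";

// Bookkeeping for this sketch only, so its behavior can be checked.
static std::size_t g_liveScriptAllocations = 0;

void* ScriptAlloc(std::size_t size)
{
    void* ptr = std::malloc(size);
    if(ptr)
    {
        TracyAllocN(ptr, size, kScriptPool);
        ++g_liveScriptAllocations;
    }
    return ptr;
}

void ScriptFree(void* ptr)
{
    if(!ptr) return;
    TracyFreeN(ptr, kScriptPool);  // report the free before releasing the memory
    std::free(ptr);
    --g_liveScriptAllocations;
}
```

Routing every allocation and free of the pool through one pair of functions makes it harder to report a free without a matching allocation event.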

3.9 GPU profiling

Tracy provides bindings for profiling OpenGL, Vulkan, Direct3D 11, Direct3D 12, and OpenCL execution time on GPU.
Note that the CPU and GPU timers may be unsynchronized unless you create a calibrated context, but the availability of calibrated contexts is limited. You can try to correct the desynchronization of uncalibrated contexts in the profiler's options (section 5.4).

Check the scope

If the graphic API you are using requires explicitly stating that you start and finish the recording of command buffers, remember that the instrumentation macros requirements must be satisfied during the zone's construction and destruction. For example, the zone destructor will be executed in the following code after buffer recording has ended, which is an error.
{
    vkBeginCommandBuffer(cmd, &beginInfo);
    TracyVkZone(ctx, cmd, "Render");
    vkEndCommandBuffer(cmd);
}
Add a nested scope encompassing the command buffer recording section to fix such issues.
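The effect of the nested scope on destruction order can be demonstrated without any graphics API. In the sketch below, ScopedMarker is a hypothetical stand-in for a scoped GPU zone (it is not a Tracy class), and the log entries stand in for the Vulkan calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a scoped GPU zone: logs when it is constructed
// and destroyed so the ordering can be inspected.
struct ScopedMarker
{
    std::vector<std::string>& log;

    explicit ScopedMarker(std::vector<std::string>& l) : log(l)
    {
        log.push_back("zone begin");
    }

    ~ScopedMarker() { log.push_back("zone end"); }
};

std::vector<std::string> RecordWithNestedScope()
{
    std::vector<std::string> log;
    log.push_back("begin command buffer");  // stands in for vkBeginCommandBuffer
    {
        ScopedMarker zone(log);             // TracyVkZone would go here
        log.push_back("record draw calls");
    }                                       // zone destructor runs here
    log.push_back("end command buffer");    // stands in for vkEndCommandBuffer
    return log;
}
```

With the inner braces in place, the zone ends while the command buffer is still being recorded, which is what the instrumentation requires.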

Caveat emptor

The profiling results you will get can be unreliable or plainly wrong. It all depends on the quality of graphics drivers and how the underlying hardware implements timers. While Tracy employs some heuristics to make things as reliable as possible, it must talk to the GPU through the commonly unreliable API calls.
For example, on Linux, the Intel GPU driver will report 64-bit precision of time stamps. Unfortunately, this is not true, as the driver will only provide timestamps with 36-bit precision, rolling over the exceeding values. Tracy can detect such problems and employ workarounds. This is, sadly, not enough to make the readings reliable, as the timer we can access through the API is not a real one. Deep down, the driver has access to the actual timer, which it uses to provide the virtual values we can get. Unfortunately, this hardware timer has a period which does not match the period of the API timer. As a result, the virtual timer will sometimes overflow in the midst of a cycle, making the reported time values jump forward. This is a problem that only the driver vendor can fix.
Another problem discovered on AMD GPUs under Linux causes the timestamp register to be reset every time the GPU enters a low-power mode. This can happen virtually every frame if you are rendering with vertical synchronization disabled. Needless to say, the timestamp data is not very useful in this case. The solution to this problem is to navigate to the /sys/devices/pci*/// directory corresponding to the GPU and set the power_dpm_force_performance_level value to manual and the pp_power_profile_mode value to the number corresponding to the COMPUTE profile. Your mileage may vary, however - on my system I only have one of these values available to set. Nevertheless, you will find a similar solution suggested by the system vendor in a Direct3D 12 section later in the manual.
If you experience crippling problems while profiling the GPU, you might get better results with a different driver, different operating system, or different hardware.

3.9.1 OpenGL

You will need to include the public/tracy/TracyOpenGL.hpp header file and declare each of your rendering contexts using the TracyGpuContext macro (typically, you will only have one context). Tracy expects no more than one context per thread and no context migration. To set a custom name for the context, use the TracyGpuContextName(name, size) macro.
To mark a GPU zone use the TracyGpuZone(name) macro, where name is a string literal name of the zone. Alternatively you may use TracyGpuZoneC(name, color) to specify zone color.
You also need to periodically collect the GPU events using the TracyGpuCollect macro. An excellent place to do it is after the swap buffers function call.

Caveats

  • OpenGL profiling is not supported on OSX, iOS (because Apple is unable to implement standards properly).
  • Nvidia drivers are unable to provide consistent timing results when two OpenGL contexts are used simultaneously.
  • Calling the TracyGpuCollect macro is a fairly slow operation.

3.9.2 Vulkan

Similarly, for Vulkan support you should include the public/tracy/TracyVulkan.hpp header file. Tracing Vulkan devices and queues is a bit more involved, and the Vulkan initialization macro TracyVkContext(physdev, device, queue, cmdbuf) returns an instance of TracyVkCtx object, which tracks an associated Vulkan queue. Cleanup is performed using the TracyVkDestroy(ctx) macro. You may create multiple Vulkan contexts. To set a custom name for the context, use the TracyVkContextName(ctx, name, size) macro.
The physical device, logical device, queue, and command buffer must relate to each other. The queue must support graphics or compute operations. The command buffer must be in the initial state and be able to be reset. The profiler will rerecord and submit it to the queue multiple times, and it will be in the executable state on exit from the initialization function.
To mark a GPU zone use the TracyVkZone(ctx, cmdbuf, name) macro, where name is a string literal name of the zone. Alternatively you may use TracyVkZoneC(ctx, cmdbuf, name, color) to specify zone color. The provided command buffer must be in the recording state, and it must be created within the queue that is associated with ctx context.
You also need to periodically collect the GPU events using the TracyVkCollect(ctx, cmdbuf) macro. The provided command buffer must be in the recording state and outside a render pass instance.
Calibrated context. In order to maintain synchronization between CPU and GPU time domains, you will need to enable the VK_EXT_calibrated_timestamps device extension and retrieve the following function pointers: vkGetPhysicalDeviceCalibrateableTimeDomainsEXT and vkGetCalibratedTimestampsEXT.
To enable calibrated context, replace the macro TracyVkContext with TracyVkContextCalibrated and pass the two functions as additional parameters, in the order specified above.
Using Vulkan 1.2 features. Vulkan 1.2 and VK_EXT_host_query_reset provide a way to reset the query pool without the need for a command buffer. By using TracyVkContextHostCalibrated you can make use of this feature. Instead of the VkQueue and VkCommandBuffer handles, it requires a function pointer to vkResetQueryPool, in addition to the function pointers required for TracyVkContextCalibrated.
However, using this feature requires the physical device to have calibrated device and host time domains. In addition to VK_TIME_DOMAIN_DEVICE_EXT, vkGetPhysicalDeviceCalibrateableTimeDomainsEXT must also return either VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT or VK_TIME_DOMAIN_QUERY_PERFORMANCE_COUNTER_EXT, for Unix and Windows, respectively. If this is not the case, you will need to use the TracyVkContextCalibrated or TracyVkContext macro instead.
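The selection rule described above can be sketched as a small decision function. This is only an illustration of one reading of the paragraph: the TimeDomain and ContextKind enums and the chooseContext function are hypothetical names that are not part of Tracy or Vulkan, and TimeDomain merely mirrors the VkTimeDomainEXT values named in the text. The sketch assumes the application has already queried which extensions are present and which time domains the physical device reports.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the VkTimeDomainEXT values mentioned in the text.
enum class TimeDomain { Device, ClockMonotonic, ClockMonotonicRaw, QueryPerformanceCounter };

// Which Tracy Vulkan context macro the application would end up using.
enum class ContextKind { Plain, Calibrated, HostCalibrated };

// Picks a context kind from what the device and driver report.
ContextKind chooseContext(bool hasCalibratedTimestamps,
                          bool hasHostQueryReset,
                          const std::vector<TimeDomain>& domains,
                          bool isWindows)
{
    // Without VK_EXT_calibrated_timestamps only TracyVkContext is usable.
    if (!hasCalibratedTimestamps) return ContextKind::Plain;

    auto has = [&domains](TimeDomain d) {
        return std::find(domains.begin(), domains.end(), d) != domains.end();
    };

    // Host time domain required by the text: QPC on Windows, CLOCK_MONOTONIC_RAW on Unix.
    const TimeDomain host = isWindows ? TimeDomain::QueryPerformanceCounter
                                      : TimeDomain::ClockMonotonicRaw;

    // The host-calibrated context needs host query reset plus both time domains.
    if (hasHostQueryReset && has(TimeDomain::Device) && has(host))
        return ContextKind::HostCalibrated;

    return ContextKind::Calibrated;
}
```

An application would call such a function once during device setup and then invoke TracyVkContext, TracyVkContextCalibrated, or TracyVkContextHostCalibrated accordingly.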
Dynamically loading the Vulkan symbols. Some applications dynamically link the Vulkan loader and manage a local symbol table to remove the trampoline overhead of calling through the Vulkan loader itself.
When TRACY_VK_USE_SYMBOL_TABLE is defined, the signatures of TracyVkContext, TracyVkContextCalibrated, and TracyVkContextHostCalibrated are adjusted to take in the VkInstance, PFN_vkGetInstanceProcAddr, and PFN_vkGetDeviceProcAddr. This enables the construction of a local symbol table, which is used to call through the Vulkan API when tracing.

3.9.3 Direct3D 11

To enable Direct3D 11 support, include the public/tracy/TracyD3D11.hpp header file, and create a TracyD3D11Ctx object with the TracyD3D11Context(device, devicecontext) macro. The object should later be cleaned up with the TracyD3D11Destroy macro. Tracy does not support D3D11 command lists. To set a custom name for the context, use the TracyGpuContextName(name, size) macro.
To mark a GPU zone, use the TracyD3D11Zone(name) macro, where name is a string literal name of the zone. Alternatively, you may use TracyD3D11ZoneC(name, color) to specify the zone color.
You also need to periodically collect the GPU events using the TracyD3D11Collect macro. An excellent place to do it is after the swap chain present function.

3.9.4 Direct3D 12

To enable Direct3D 12 support, include the public/tracy/TracyD3D12.hpp header file. Tracing Direct3D 12 queues is nearly on par with the Vulkan implementation, where a TracyD3D12Ctx is returned from a call to TracyD3D12Context(device, queue), which should be later cleaned up with the TracyD3D12Destroy(ctx) macro. Multiple contexts can be created, each with any queue type. To set a custom name for the context, use the TracyD3D12ContextName(ctx, name, size) macro.
The queue must have been created through the specified device; however, a command list is not needed at this stage.
Using GPU zones is the same as in the Vulkan implementation, where the TracyD3D12Zone(ctx, cmdList, name) macro is used, with name as a string literal. TracyD3D12ZoneC(ctx, cmdList, name, color) can be used to create a custom-colored zone. The given command list must be in an open state.
The macro TracyD3D12NewFrame(ctx) is used to mark a new frame, and should appear before or after recording command lists, similar to FrameMark. This macro is a key component that enables automatic query data synchronization, so the user doesn't have to worry about synchronizing GPU execution before invoking a collection. Event data can then be collected and sent to the profiler using the TracyD3D12Collect(ctx) macro.
Note that GPU profiling may be slightly inaccurate due to artifacts from dynamic frequency scaling. To counter this, ID3D12Device::SetStablePowerState() can be used to enable accurate profiling, at the expense of some performance. If the machine is not in developer mode, the operating system will remove the device upon calling. Do not use this in the shipping code.
Direct3D 12 contexts are always calibrated.

3.9.5 OpenCL

OpenCL support is achieved by including the public/tracy/TracyOpenCL.hpp header file. Tracing OpenCL requires the creation of a Tracy OpenCL context using the macro TracyCLContext(context, device), which will return an instance of the TracyCLCtx object that must be used when creating zones. The specified device must be part of the context. Cleanup is performed using the TracyCLDestroy(ctx) macro. Although not common, it is possible to create multiple OpenCL contexts for the same application. To set a custom name for the context, use the TracyCLContextName(ctx, name, size) macro.
To mark an OpenCL zone one must make sure that a valid OpenCL cl_event object is available. The event will be the object that Tracy will use to query profiling information from the OpenCL driver. For this to work, you must create all OpenCL queues with the CL_QUEUE_PROFILING_ENABLE property.
OpenCL zones can be created with the TracyCLZone(ctx, name) macro, where name will usually be a descriptive name for the operation represented by the cl_event. Within the scope of the zone, you must call TracyCLSetEvent(event) for the event to be registered in Tracy.
Similar to Vulkan and OpenGL, you also need to periodically collect the OpenCL events using the TracyCLCollect(ctx) macro. An excellent place to perform this operation is after a clFinish since this will ensure that any previously queued OpenCL commands will have finished by this point.

3.9.6 Multiple zones in one scope

Putting more than one GPU zone macro in a single scope causes the same issue as with the ZoneScoped macros, described in section 3.4.2 (but this time the variable name is ___tracy_gpu_zone).
To solve this problem, in case of OpenGL use the TracyGpuNamedZone macro in place of TracyGpuZone (or the color variant). The same applies to Vulkan and Direct3D 11/12 - replace TracyVkZone with TracyVkNamedZone and TracyD3D11Zone/TracyD3D12Zone with TracyD3D11NamedZone/TracyD3D12NamedZone.
Remember to provide your own name for the created stack variable as the first parameter to the macros.

3.9.7 Transient GPU zones

Transient zones (see section 3.4.4 for details) are available in OpenGL, Vulkan, and Direct3D 11/12 macros.

3.10 Fibers

Fibers are lightweight threads, which are not under the operating system's control and need to be manually scheduled by the application. As far as Tracy is concerned, there are other cooperative multitasking primitives, like coroutines, or green threads, which also fall under this umbrella.
To enable fiber support in the client code, you will need to add the TRACY_FIBERS define to your project. You need to do this explicitly, as there is a small performance hit due to additional processing.
To properly instrument fibers, you will need to modify the fiber dispatch code in your program. You will need to insert the TracyFiberEnter(fiber) macro every time a fiber starts or resumes execution. You will also need to insert the TracyFiberLeave macro when the execution control in a thread returns to the non-fiber part of the code. Note that you can safely call TracyFiberEnter multiple times in succession, without an intermediate TracyFiberLeave, if one fiber is directly switching to another without returning control to the fiber dispatch worker.
Fibers are identified by unique const char* string names. Remember that you should observe the rules laid out in section 3.1.2 while handling such strings.
No additional instrumentation is needed in other parts of the code. Zones, messages, and other such events will be properly attributed to the currently running fiber in its own separate track.
A straightforward example, which is not actually using any OS fiber functionality, is presented below:
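#include <thread>
#include <unistd.h>          // sleep() (POSIX)
#include "tracy/Tracy.hpp"   // TracyFiberEnter, TracyFiberLeave
#include "tracy/TracyC.h"    // TracyCZone, TracyCZoneEnd, TracyCZoneCtx

const char* fiber = "job1";
TracyCZoneCtx zone;

int main()
{
    // The first worker thread starts the zone while running the fiber.
    std::thread t1([]{
        TracyFiberEnter(fiber);
        TracyCZone(ctx, 1);
        zone = ctx;
        sleep(1);
        TracyFiberLeave;
    });
    t1.join();

    // The second worker thread resumes the same fiber and ends the zone.
    std::thread t2([]{
        TracyFiberEnter(fiber);
        sleep(1);
        TracyCZoneEnd(zone);
        TracyFiberLeave;
    });
    t2.join();
}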
As you can see, there are two threads, t1 and t2, which are simulating worker threads that a real fiber library would use. A C API zone is created in thread t1 and is ended in thread t2. Without the fiber markup, this would be an invalid operation, but with fibers, the zone is attributed to fiber job1, and not to thread t1 or t2.
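For a slightly more realistic picture, the dispatch-side markup described in this section could look like the following sketch of a single-threaded round-robin dispatcher. The Fiber struct and the runFibers function are hypothetical and only illustrate where TracyFiberEnter and TracyFiberLeave would go; the sketch assumes C++17 for __has_include. When Tracy is not available in the build, the two macros fall back to no-op stand-ins, which are not Tracy's own definitions.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

#if defined(TRACY_ENABLE) && defined(TRACY_FIBERS) && __has_include("tracy/Tracy.hpp")
#  include "tracy/Tracy.hpp"
#else
// Stand-ins so the sketch builds without Tracy; these are NOT Tracy's definitions.
#  define TracyFiberEnter(name) ((void)(name))
#  define TracyFiberLeave ((void)0)
#endif

// Hypothetical fiber: a unique name plus a list of steps.
// Each step runs until the fiber would yield back to the dispatcher.
struct Fiber {
    const char* name;  // must follow the string rules from section 3.1.2
    std::vector<std::function<void()>> steps;
    std::size_t next = 0;
    bool done() const { return next >= steps.size(); }
};

// Round-robin dispatcher. Returns the fiber names in the order they were resumed.
std::vector<std::string> runFibers(std::vector<Fiber>& fibers)
{
    std::vector<std::string> order;
    bool progress = true;
    while (progress) {
        progress = false;
        for (Fiber& f : fibers) {
            if (f.done()) continue;
            TracyFiberEnter(f.name);  // the fiber starts or resumes here
            f.steps[f.next++]();      // zones and messages in here go to the fiber's track
            TracyFiberLeave;          // control is back in non-fiber dispatch code
            order.emplace_back(f.name);
            progress = true;
        }
    }
    return order;
}
```

The dispatcher calls TracyFiberLeave after every step because control returns to the dispatch loop each time. A scheduler that switches directly from one fiber to another could skip the intermediate leave, as noted earlier in this section.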

3.11 Collecting call stacks

Capture of true call stacks can be performed by using macros with the S postfix, which require an additional parameter specifying the depth of the call stack to be captured. The greater the depth, the longer it will take to perform the capture. Currently you can use the following macros: ZoneScopedS, ZoneScopedNS, ZoneScopedCS, ZoneScopedNCS, TracyAllocS, TracyFreeS, TracySecureAllocS, TracySecureFreeS, TracyMessageS, TracyMessageLS, TracyMessageCS, TracyMessageLCS, TracyGpuZoneS, TracyGpuZoneCS, TracyVkZoneS, TracyVkZoneCS, and the named and transient variants.
Be aware that call stack collection is a relatively slow operation. Table 5 and figure 7 show how long it took to perform a single capture of varying depth on multiple CPU architectures.
Depth   x86      x64      ARM   ARM64
1       34 ns    98 ns
2       35 ns    150 ns
3       36 ns    168 ns
4       39 ns    190 ns
5       42 ns    206 ns
10      52 ns    306 ns
15      63 ns    415 ns
20      77 ns    531 ns
25      89 ns    630 ns
30      109 ns   735 ns
35      123 ns   843 ns
40      142 ns   950 ns
45      154 ns
50      167 ns
55      179 ns
60      193 ns
Table 5: Median times of zone capture with call stack. x86, x64: i7 8700K; ARM: Banana Pi; ARM64: ODROID-C2. Selected architectures are plotted in figure 7.
You can force call stack capture in the non-S postfixed macros by adding the TRACY_CALLSTACK define, set to the desired call stack capture depth. This setting doesn't affect the explicit call stack macros.
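As an illustration of the difference between the explicit and the forced variants, here is a minimal sketch with two hypothetical functions, sumOfSquares and countEven. The first always requests a call stack of up to 8 frames; the second uses a plain zone and would only record a call stack if the build defines TRACY_CALLSTACK (for example, -DTRACY_CALLSTACK=16). The sketch assumes C++17 for __has_include, and when Tracy is not part of the build the macros fall back to empty stand-ins, which are not Tracy's definitions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(TRACY_ENABLE) && __has_include("tracy/Tracy.hpp")
#  include "tracy/Tracy.hpp"
#else
// Stand-ins so the sketch builds without Tracy; these are NOT Tracy's definitions.
#  define ZoneScoped
#  define ZoneScopedS(depth)
#endif

// Hypothetical workload: the zone always captures a call stack, up to 8 frames deep.
std::uint64_t sumOfSquares(const std::vector<std::uint32_t>& values)
{
    ZoneScopedS(8);
    std::uint64_t sum = 0;
    for (std::uint32_t v : values) sum += static_cast<std::uint64_t>(v) * v;
    return sum;
}

// Hypothetical workload: a plain zone, which records a call stack only when
// the build defines TRACY_CALLSTACK with the desired depth.
std::size_t countEven(const std::vector<std::uint32_t>& values)
{
    ZoneScoped;
    std::size_t n = 0;
    for (std::uint32_t v : values) {
        if (v % 2 == 0) ++n;
    }
    return n;
}
```

Given the capture costs in table 5, it is usually worth keeping the requested depth small in zones that are entered very frequently.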
The maximum call stack depth that the profiler can retrieve is 62 frames. This is a restriction at the level of the operating system.
Tracy will automatically exclude certain uninteresting functions from the captured call stacks. So, for example, the pass-through intrinsic wrapper functions won't be reported.
Figure 7: Plot of call stack capture times (see table 5). Notice that the capture time grows linearly with requested capture depth
Important!
Collecting call stack data will also trigger retrieval of the profiled program's executable code by the profiler. See section 3.15.7 for details.

How to disable

Tracy will prepare for call stack collection regardless of whether you use the functionality or not. In some cases, this may be unwanted or otherwise troublesome for the user. To disable support for collecting call stacks, define the TRACY_NO_CALLSTACK macro.

libunwind

On some platforms you can define TRACY_LIBUNWIND_BACKTRACE to use libunwind to perform call stack captures, as it might be a faster alternative to the default implementation. If you do, you must compile and link your client against libunwind. See https://github.com/libunwind/libunwind for more details.

3.11.1 Debugging symbols

You must compile the profiled application with debugging symbols enabled to have correct call stack information. You can achieve that in the following way:
  • On MSVC, open the project properties and go to Linker → Debugging → Generate Debug Info, where you should select the Generate Debug Information option.
  • On gcc or clang remember to specify the debugging information -g parameter during compilation and do not add the strip symbols -s parameter. Additionally, omitting frame pointers will severely reduce the quality of stack traces, which can be fixed by adding the -fno-omit-frame-pointer parameter. Link the executable with an additional option -rdynamic (or --export-dynamic, if you are passing parameters directly to the linker).
  • On OSX, you may need to run dsymutil to extract the debugging data out of the executable binary.
  • On iOS you will have to add a New Run Script Phase to your XCode project, which shall execute the following shell script:
cp -rf ${TARGET_BUILD_DIR}/${WRAPPER_NAME}.dSYM/* ${TARGET_BUILD_DIR}/${UNLOCALIZED_RESOURCES_FOLDER_PATH}/${PRODUCT_NAME}.dSYM
You will also need to set up proper dependencies by setting the following input file: .dSYM, and the following output file:
${TARGET_BUILD_DIR}/${UNLOCALIZED_RESOURCES_FOLDER_PATH}/${PRODUCT_NAME}.dSYM.

3.11.1.1 External libraries

You may also be interested in symbols from external libraries, especially if you have sampling profiling enabled (section 3.15.5).
Windows. In MSVC you can retrieve such symbols by going to Tools → Options → Debugging → Symbols and selecting appropriate Symbol file (.pdb) location servers. Note that additional symbols may significantly increase application startup times.
Libraries built with vcpkg typically provide PDB symbol files, even for release builds. Using vcpkg to obtain libraries has the extra benefit that everything is built using local source files, which allows Tracy to provide a source view not only of your application but also the libraries you use.
Unix. On Linux, the information needed for debugging has traditionally been provided by special packages named debuginfo, dbgsym, or similar. You can use them to retrieve symbols, but keep in mind the following:
  1. Your distribution has to provide such packages. Not every one does.
  2. Debug packages are usually stored in a separate repository, which you must manually enable.
  3. You need to install a separate package for each library you want to have symbols for.
  4. Debugging information can require large amounts of disk space.
A modern alternative to installing static debug packages is to use the debuginfod system, which performs on-demand delivery of debugging information across the internet. See https://sourceware.org/elfutils/Debuginfod.html for more details. Since this new method of symbol delivery is not yet universally supported, you will have to manually enable it, both in your system and in Tracy.
First, make sure your distribution maintains a debuginfod server. Then, install the debuginfod library. You also need to ensure you have appropriately configured which server to access, but distribution maintainers usually provide this. Next, add the TRACY_DEBUGINFOD define to the program you want to profile and link it with libdebuginfod. This will enable network delivery of symbols and source file contents. However, the first run (including after a system update) may be slow to respond until the local debuginfod cache becomes filled.

3.11.1.2 Using the dbghelp library on Windows

While Tracy will try to expand the known symbols list when it encounters a new module for the first time, you may want to be able to do such a thing manually. Or maybe you are using the dbghelp.dll library in some other way in your project, for example, to present a call stack to the user at some point during execution.
Since dbghelp functions are not thread-safe, you must take extra steps to make sure your calls to the Sym* family of functions are not colliding with calls made by Tracy. To do so, perform the following steps:
  1. Add a TRACY_DBGHELP_LOCK define, with the value set to the prefix of the lock-handling functions (for example: TRACY_DBGHELP_LOCK=DbgHelp).
  2. Create a dbghelp lock (i.e., mutex) in your application.
  3. Provide a set of Init, Lock and Unlock functions, including the provided prefix name, which will operate on the lock. These functions must be defined using the C linkage. Notice that there's no cleanup function.
  4. Remember to protect access to dbghelp in your code appropriately!
An example implementation of such a lock interface is provided below, as a reference:
#include <windows.h>  // HANDLE, CreateMutex, WaitForSingleObject, ReleaseMutex

extern "C"
{
static HANDLE dbgHelpLock;
void DbgHelpInit() { dbgHelpLock = CreateMutex(nullptr, FALSE, nullptr); }
void DbgHelpLock() { WaitForSingleObject(dbgHelpLock, INFINITE); }
void DbgHelpUnlock() { ReleaseMutex(dbgHelpLock); }
}
At initialization time, Tracy will attempt to preload symbols for device drivers and process modules. As this process can be slow when a lot of PDB files are involved, you can set the TRACY_NO_DBGHELP_INIT_LOAD environment variable to "1" to disable this behavior and rely on on-demand symbol loading.
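Step 4 of the list above (protecting your own dbghelp access) is easy to get wrong on early return paths, so a scope guard can help. The following is a sketch only, assuming TRACY_DBGHELP_LOCK=DbgHelp as in the reference implementation. It swaps the Win32 mutex for a std::recursive_mutex so that it builds on any platform; DbgHelpGuard, withDbgHelp, and dbgHelpMutex are hypothetical names and are not part of Tracy.

```cpp
#include <mutex>

namespace {
// Function-local static avoids static initialization order problems.
// A recursive mutex is used because a Win32 mutex may be re-acquired by its owner.
std::recursive_mutex& dbgHelpMutex()
{
    static std::recursive_mutex m;
    return m;
}
}

// Lock interface with C linkage, named after the DbgHelp prefix.
extern "C"
{
void DbgHelpInit() { (void)dbgHelpMutex(); }  // forces construction of the mutex
void DbgHelpLock() { dbgHelpMutex().lock(); }
void DbgHelpUnlock() { dbgHelpMutex().unlock(); }
}

// Hypothetical RAII guard: holds the dbghelp lock for the lifetime of the object.
struct DbgHelpGuard {
    DbgHelpGuard() { DbgHelpLock(); }
    ~DbgHelpGuard() { DbgHelpUnlock(); }
    DbgHelpGuard(const DbgHelpGuard&) = delete;
    DbgHelpGuard& operator=(const DbgHelpGuard&) = delete;
};

// Hypothetical helper: runs a callable while holding the dbghelp lock.
template <typename F>
auto withDbgHelp(F&& f) -> decltype(f())
{
    DbgHelpGuard guard;
    return f();
}
```

In application code, the callable passed to withDbgHelp would contain the calls to the Sym* family of functions, so that they never overlap with the calls Tracy makes under the same lock.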

3.11.1.3 Disabling resolution of inline frames

Inline frame retrieval on Windows can be multiple orders of magnitude slower than just performing essential symbol resolution. This manifests as the profiler seemingly being stuck for a long time, with hundreds of thousands of query backlog entries queued, which slowly trickle down. If your use case requires speed of operation rather than having call stacks with inline frames included, you may define the TRACY_NO_CALLSTACK_INLINES macro, which will make the profiler stick to the basic but fast frame resolution mode.

3.11.1.4 Offline symbol resolution

By default, the Tracy client resolves call stack symbols in a background thread at runtime. This process requires the Tracy client to load symbols for the shared libraries involved, which requires additional memory allocations and can impact runtime performance if many symbol queries are involved. As an alternative to runtime symbol resolution, you can set the environment variable TRACY_SYMBOL_OFFLINE_RESOLVE to 1, which makes the Tracy client resolve only the minimal set of information required for offline resolution (a shared library path and an offset into that shared library).
The generated Tracy capture will show call stack frame symbols as [unresolved]. The update tool can be used to load that capture, perform symbol resolution offline (by passing the corresponding command-line option), and write out a new capture with symbols resolved. By default, update will use the original shared library paths that were recorded in the capture (which assumes running on the same machine, or on a machine with a filesystem setup identical to the one used to run the Tracy-instrumented application). You can use the -p option to perform any number of path substitutions in order to use symbols located elsewhere.
Important
Beware that update will use whatever symbol file it finds at the path it resolved to (no symbol version checking is done), so if the symbol file doesn't match the code that was used when capturing the call stacks, you will get incorrect results.
Also note that in the case of using offline symbol resolving, even after running the update tool to resolve symbols, the symbols statistics are not updated and will still report the unresolved symbols.

3.12 Lua support

To profile Lua code using Tracy, include the public/tracy/TracyLua.hpp header file in your Lua wrapper and call the tracy::LuaRegister(lua_State*) function to add instrumentation support.
In the Lua code, add tracy.ZoneBegin() and tracy.ZoneEnd() calls to mark execution zones. You need to call the ZoneEnd method because there is no automatic destruction of variables in Lua, and we don't know when the garbage collection will be performed. Double check that you have covered all return paths!
Use tracy.ZoneBeginN(name) if you want to set a custom zone name.
Use tracy.ZoneText(text) to set zone text.
Use tracy.Message(text) to send messages.
Use tracy.ZoneName(text) to set zone name on a per-call basis.
Lua instrumentation needs to perform additional work (including memory allocation) to store source location. This approximately doubles the data collection cost.

3.12.1 Call stacks

To collect Lua call stacks (see section 3.11), replace tracy.ZoneBegin() calls with tracy.ZoneBeginS(depth), and tracy.ZoneBeginN(name) calls with tracy.ZoneBeginNS(name, depth). Using the TRACY_CALLSTACK macro automatically enables call stack collection in all zones.
Be aware that for Lua call stack retrieval to work, you need to be on a platform that supports the collection of native call stacks.
The cost of performing Lua call stack capture is presented in table 6 and figure 8. Lua call stacks include native call stacks, which have a capture cost of their own (table 5), and the depth parameter is applied to both captures. The presented data were captured with full Lua stack depth, but only 13 frames were available on the native call stack. Hence, to explain the non-linearity of the graph, you need to consider what was truly measured: the Lua capture cost at the requested depth plus the native capture cost at that depth, capped at the 13 available native frames, that is, Cost_total(depth) = Cost_Lua(depth) + Cost_native(min(depth, 13)).

3.12.2 Instrumentation cleanup

Even if Tracy is disabled, you still have to pay the no-op function call cost. To prevent that, you may want to use the tracy::LuaRemove(char* script) function, which will replace instrumentation calls with whitespace. This function does nothing if the profiler is enabled.

3.13 C API

To profile code written in the C programming language, you will need to include the public/tracy/TracyC.h header file, which exposes the C API.
At the moment, there's no support for C API based markup of locks, GPU zones, or Lua.
Depth   Time
1       707 ns
2       699 ns
3       624 ns
4       727 ns
5       836 ns
10
15
20
25
30
35
40
45
50
55
60
Table 6: Median times of Lua zone capture with call stack (including native frames)

Important

Tracy is written in C++, so you will need to have a C++ compiler and link with the C++ standard library, even if your program is strictly pure C.

3.13.1 Setting thread names

To set thread names (section 2.4) using the C API, you should use the TracyCSetThreadName(name) macro.

3.13.2 Frame markup

To mark frames, as described in section 3.3, use the following macros:
  • TracyCFrameMark
  • TracyCFrameMarkNamed(name)
  • TracyCFrameMarkStart(name)
  • TracyCFrameMarkEnd(name)
  • TracyCFrameImage(image, width, height, offset, flip)

3.13.3 Zone markup

The following macros mark the beginning of a zone:
  • TracyCZone(ctx, active)
  • TracyCZoneN(ctx, name, active)
  • TracyCZoneC(ctx, color, active)
  • TracyCZoneNC(ctx, name, color, active)
Figure 8: Plot of Lua call stack capture times (see table 6)
Refer to sections 3.4 and 3.4.2 for a description of macro variants and parameters. The ctx parameter specifies the name of a data structure, which the macro will create on the stack to hold the internal zone data.
Unlike C++, there's no automatic destruction mechanism in C, so you will need to mark where the zone ends manually. To do so, use the TracyCZoneEnd(ctx) macro.
Zone text and name may be set by using the TracyCZoneText (ctx, txt, size), TracyCZoneValue (ctx, value) and TracyCZoneName (ctx, txt, size) macros. Make sure you are following the zone stack rules, as described in section 3.4.2!
可使用 TracyCZoneText (ctx, txt, size)、TracyCZoneValue (ctx, value) 和 TracyCZoneName (ctx, txt, size) 宏设置区文本和名称。请确保您遵循第 3.4.2 节所述的区段堆栈规则!
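The manual zone handling described above can be sketched as follows. This is a minimal illustration, not code from Tracy itself: the function sum_items and the zone name are hypothetical, and the no-op stand-in macros exist only so that the sketch builds when Tracy is not available. Note how the early return path also ends the zone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef TRACY_ENABLE
#  include "tracy/TracyC.h"
#else
/* No-op stand-ins (assumption) so the sketch builds without Tracy. */
#  define TracyCZoneN(ctx, name, active)
#  define TracyCZoneValue(ctx, value)
#  define TracyCZoneEnd(ctx)
#endif

/* Hypothetical example function: sums `count` items, returns -1 if `items` is NULL. */
static long sum_items(const int* items, size_t count)
{
    TracyCZoneN(zone, "sum_items", 1);   /* creates the `zone` context on the stack */

    if (items == NULL) {
        TracyCZoneEnd(zone);             /* every exit path has to end the zone */
        return -1;
    }

    long total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += items[i];
    }

    TracyCZoneValue(zone, (uint64_t)count);  /* attach a numeric value to the zone */
    TracyCZoneEnd(zone);
    return total;
}
```

Forgetting the TracyCZoneEnd call on the early return is exactly the kind of mistake that the zone validation described in section 3.13.3.2 is meant to catch.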

3.13.3.1 Zone context data structure
3.13.3.1 区域上下文数据结构

In typical use cases, the zone context data structure is hidden from view, and you only need to specify its name for the TracyCZone and TracyCZoneEnd macros. However, you can use it directly in advanced scenarios, for example, to start a zone in one function and end it in another. To do so, you will need to forward the data structure through a function parameter, as a return value, or by placing it in a thread-local stack structure. Keep the following rules in mind:
在典型的使用案例中,区段上下文数据结构是隐藏的,只需为 TracyCZone 和 TracyCZoneEnd 宏指定其名称即可。不过,在高级情况下也可以使用它,例如,如果你想在一个函数中启动一个区,然后在另一个函数中结束它。为此,您需要通过函数参数或作为返回值转发数据结构,或者将其放入线程本地堆栈结构中。为此,您需要牢记以下规则:
  • The created variable name is exactly what you pass as the ctx parameter.
    创建的变量名正是您作为 ctx 参数传递的变量名。
  • The data structure is of an opaque, immutable type TracyCZoneCtx.
    该数据结构属于不透明、不可变类型 TracyCZoneCtx。
  • Contents of the data structure can be copied by assignment. Do not retrieve or use the structure's address - this is asking for trouble.
    数据结构的内容可以通过赋值复制。切勿检索或使用结构体的地址,否则会自找麻烦。
  • You must use the data structure (or any of its copies) exactly once to end a zone.
    必须准确地使用一次数据结构(或其任何副本)才能结束一个区域。
  • Zone must end in the same thread in which it was started.
    区域必须在开始的同一主题中结束。
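A minimal sketch of forwarding the context by value, under the rules listed above. The functions job_begin, job_end, and run_job are hypothetical names. When TRACY_ENABLE is not defined, the sketch uses its own stand-in definitions (an assumption made for this example) that still create a variable, so that the forwarding code compiles.

```c
#include <assert.h>

#ifdef TRACY_ENABLE
#  include "tracy/TracyC.h"
#else
/* Stand-ins (assumption) that keep the forwarding code compilable without Tracy. */
typedef const void* TracyCZoneCtx;
#  define TracyCZoneN(ctx, name, active) TracyCZoneCtx ctx = 0
#  define TracyCZoneEnd(ctx) ((void)(ctx))
#endif

/* Starts a zone and returns the context to the caller as a plain value. */
static TracyCZoneCtx job_begin(void)
{
    TracyCZoneN(ctx, "job", 1);   /* the variable is named exactly `ctx` */
    return ctx;                   /* copied by assignment, no address taken */
}

/* Ends the zone; must be called exactly once, on the thread that began it. */
static void job_end(TracyCZoneCtx ctx)
{
    TracyCZoneEnd(ctx);
}

/* Hypothetical caller that spans the zone across two helper functions. */
static int run_job(int input)
{
    TracyCZoneCtx zone = job_begin();
    int result = input * 2;       /* the work being measured */
    job_end(zone);
    return result;
}
```

Passing the structure by value, rather than through a pointer, follows the rule about not using the structure's address.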

3.13.3.2 Zone validation
3.13.3.2 区验证

Since all C API instrumentation has to be done by hand, it is possible to miss some code paths where a zone should be started or ended. Tracy will perform additional validation of instrumentation correctness to prevent bad profiling runs. Read section 4.7 for more information.
由于所有的 C API 工具都必须手工完成,因此可能会遗漏一些代码路径,而这些代码路径应该是区域开始或结束的地方。Tracy 会对工具的正确性进行额外验证,以防止出现错误的剖析运行。更多信息请阅读第 4.7 节。
However, the validation comes with a performance cost, which you may not want to pay. Therefore, if you are entirely sure that the instrumentation is not broken in any way, you may use the TRACY_NO_VERIFY macro, which will disable the validation code.
不过,验证需要付出性能代价,您可能不愿意支付这种代价。因此,如果完全确定仪器没有任何故障,可以使用 TRACY_NO_VERIFY 宏来禁用验证代码。

3.13.3.3 Transient zones in C API
3.13.3.3 C 应用程序接口中的瞬态区

There is no explicit support for transient zones (section 3.4.4) in the C API macros. However, this functionality can be implemented by following instructions outlined in section 3.13.11.
C API 宏并不明确支持瞬态区(第 3.4.4 节)。不过,可以按照第 3.13.11 节中的说明实现这一功能。

3.13.4 Lock markup 3.13.4 锁定标记

Marking locks in the C API is done with the following macros:
在 C 应用程序接口中,使用以下宏来标记锁:
  • TracyCLockAnnounce(lock_ctx)
  • TracyCLockTerminate(lock_ctx)
  • TracyCLockBeforeLock(lock_ctx)
  • TracyCLockAfterLock(lock_ctx)
  • TracyCLockAfterUnlock(lock_ctx)
  • TracyCLockAfterTryLock(lock_ctx, acquired)
  • TracyCLockMark(lock_ctx)
  • TracyCLockCustomName(lock_ctx, name, size)
Additionally, a lock context has to be defined next to the lock that it will be marking:
此外,还必须在要标记的锁旁边定义锁上下文:
TracyCLockCtx tracy_lock_ctx;
HANDLE lock;
To initialize the lock context, use TracyCLockAnnounce. This should be done when the lock you are marking is initialized or created. When the lock is destroyed, use TracyCLockTerminate, which frees the lock context. You can use the TracyCLockCustomName macro to name a lock.
要初始化锁上下文,请使用 TracyCLockAnnounce,这应在标记的锁初始化/创建时进行。当锁被销毁时,使用 TracyCLockTerminate,这将释放锁上下文。你可以使用 TracyCLockCustomName 宏来命名锁。
You must add markup both before and after acquiring a lock:
您必须在获取锁定之前和之后都进行标记:
TracyCLockBeforeLock(tracy_lock_ctx);
WaitForSingleObject(lock, INFINITE);
TracyCLockAfterLock(tracy_lock_ctx);
If acquiring the lock may fail, you should instead use the TracyCLockAfterTryLock macro:
如果获取锁可能失败,则应使用 TracyCLockAfterTryLock 宏:
TracyCLockBeforeLock(tracy_lock_ctx);
int acquired = WaitForSingleObject(lock, 200) == WAIT_OBJECT_0;
TracyCLockAfterTryLock(tracy_lock_ctx, acquired);
After you release the lock use the TracyCLockAfterUnlock macro:
释放锁定后,使用 TracyCLockAfterUnlock 宏:
ReleaseMutex(lock);
TracyCLockAfterUnlock(tracy_lock_ctx);
You can optionally mark the location where the lock is held by using the TracyCLockMark macro. This should be done after acquiring the lock.
您可以选择使用 TracyCLockMark 宏来标记锁定的位置,这应该在获取锁定后进行。
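The snippets above use a Windows mutex. As a portable illustration of the whole lock lifecycle, the following sketch marks up a simple try-lock built on a C11 atomic_flag. The spin_* functions and the global names are hypothetical, and the stand-in macros are an assumption used only so the sketch builds without Tracy.

```c
#include <assert.h>
#include <stdatomic.h>

#ifdef TRACY_ENABLE
#  include "tracy/TracyC.h"
#else
/* Stand-ins (assumption) so the sketch builds without Tracy. */
typedef const void* TracyCLockCtx;
#  define TracyCLockAnnounce(lock_ctx) ((void)(lock_ctx))
#  define TracyCLockTerminate(lock_ctx) ((void)(lock_ctx))
#  define TracyCLockBeforeLock(lock_ctx) ((void)(lock_ctx))
#  define TracyCLockAfterTryLock(lock_ctx, acquired) ((void)(lock_ctx), (void)(acquired))
#  define TracyCLockAfterUnlock(lock_ctx) ((void)(lock_ctx))
#endif

static atomic_flag g_flag = ATOMIC_FLAG_INIT;
static TracyCLockCtx g_flag_ctx;   /* lock context defined next to the lock */

/* Announce the lock context when the lock is created. */
static void spin_init(void)
{
    TracyCLockAnnounce(g_flag_ctx);
}

/* Acquisition may fail, so the try-lock variant of the markup is used. */
static int spin_try_lock(void)
{
    TracyCLockBeforeLock(g_flag_ctx);
    int acquired = !atomic_flag_test_and_set_explicit(&g_flag, memory_order_acquire);
    TracyCLockAfterTryLock(g_flag_ctx, acquired);
    return acquired;
}

/* Report the unlock only after the lock has actually been released. */
static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&g_flag, memory_order_release);
    TracyCLockAfterUnlock(g_flag_ctx);
}

/* Terminate the lock context when the lock is destroyed. */
static void spin_destroy(void)
{
    TracyCLockTerminate(g_flag_ctx);
}
```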

3.13.5 Memory profiling 3.13.5 内存剖析

Use the following macros in your implementations of malloc and free:
在实现 malloc 和 free 时,请使用以下宏:
  • TracyCAlloc(ptr, size)
  • TracyCFree(ptr)
  • TracyCSecureAlloc(ptr, size)
  • TracyCSecureFree(ptr)
Correctly using this functionality can be pretty tricky. You will also need to handle all the memory allocations made by external libraries (which typically allow usage of custom memory allocation functions) and the allocations made by system functions. If you can't track such an allocation, you will need to make sure freeing is not reported.
正确使用这一功能可能相当棘手。您还需要处理所有由外部库(通常允许使用自定义内存分配函数)分配的内存以及由系统函数分配的内存。如果无法跟踪此类分配,则需要确保不报告 释放。
There is no explicit support for the realloc function. You will need to handle it by marking memory allocations and frees, according to the system manual describing the behavior of this routine.
realloc 函数没有明确的支持。您需要根据描述此例程行为的系统手册,通过标记内存分配和释放来处理它。
Memory pools (section 3.8.1) are supported through macros with N postfix.
内存池(第 3.8.1 节)通过带 N 后缀的宏来支持。
For more information about memory profiling, refer to section 3.8.
有关内存剖析的更多信息,请参阅第 3.8 节。
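One possible way to wrap the allocation functions, including realloc, is sketched below. The traced_* wrappers are hypothetical names, the no-op stand-ins are an assumption so the sketch builds without Tracy, and the realloc handling assumes a single-threaded allocator and a non-zero size. Treat it as an illustration of reporting a successful realloc as a free followed by an allocation, not as the one correct approach.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef TRACY_ENABLE
#  include "tracy/TracyC.h"
#else
/* No-op stand-ins (assumption) so the sketch builds without Tracy. */
#  define TracyCAlloc(ptr, size)
#  define TracyCFree(ptr)
#endif

/* Reports the allocation only when malloc succeeded. */
static void* traced_malloc(size_t size)
{
    void* ptr = malloc(size);
    if (ptr != NULL) {
        TracyCAlloc(ptr, size);
    }
    return ptr;
}

/* Reports the free before releasing, so the address cannot be reused first. */
static void traced_free(void* ptr)
{
    if (ptr != NULL) {
        TracyCFree(ptr);
        free(ptr);
    }
}

/* `size` must be non-zero; realloc(ptr, 0) is implementation-defined. */
static void* traced_realloc(void* old, size_t size)
{
    const int had_old = (old != NULL);
    void* fresh = realloc(old, size);
    if (fresh == NULL) {
        return NULL;              /* the old block is untouched, nothing to report */
    }
    if (had_old) {
        TracyCFree(old);          /* `old` is used only as an address tag here */
    }
    TracyCAlloc(fresh, size);
    return fresh;
}
```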

3.13.6 Plots and messages
3.13.6 绘图和信息

To send additional markup in form of plot data points or messages use the following macros:
要以绘图数据点或信息的形式发送附加标记,请使用以下宏:
  • TracyCPlot(name, val)
  • TracyCPlotF(name, val)
  • TracyCPlotI(name, val)
  • TracyCMessage(txt, size)
  • TracyCMessageL(txt)
  • TracyCMessageC(txt, size, color)
  • TracyCMessageLC(txt, color)
  • TracyCAppInfo(txt, size)
Consult sections 3.6 and 3.7 for more information.
更多信息请参见第 3.6 和 3.7 节。
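As an illustration of how these macros might be combined in a main loop, consider the sketch below. The function simulate_frames, the plot name, and the message texts are hypothetical, and the stand-in macros are an assumption used only so the sketch builds without Tracy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifdef TRACY_ENABLE
#  include "tracy/TracyC.h"
#else
/* Stand-ins (assumption) so the sketch builds without Tracy. */
#  define TracyCFrameMark ((void)0)
#  define TracyCPlot(name, val) ((void)(val))
#  define TracyCMessage(txt, size) ((void)(txt), (void)(size))
#  define TracyCMessageL(txt) ((void)(txt))
#endif

/* Hypothetical main loop: returns the number of frames processed. */
static int simulate_frames(int frames)
{
    int processed = 0;
    TracyCMessageL("simulation started");        /* string literal variant */

    for (int frame = 0; frame < frames; ++frame) {
        double load = (double)(frame % 10) / 10.0;
        TracyCPlot("load", load);                /* one data point per frame */

        if (frame % 100 == 0) {
            char text[64];
            int len = snprintf(text, sizeof text, "checkpoint at frame %d", frame);
            if (len > 0 && (size_t)len < sizeof text) {
                TracyCMessage(text, (size_t)len);  /* dynamic text needs its size */
            }
        }

        ++processed;
        TracyCFrameMark;                         /* end of the frame */
    }
    return processed;
}
```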

3.13.7 GPU zones 3.13.7 GPU 区域

Hooking up support for GPU zones requires a bit more work than usual. The C API provides a low-level interface that you can use to submit the data, but there are no facilities to help you with timestamp processing.
连接对 GPU 区域的支持需要比平时多做一些工作。C 应用程序接口提供了一个底层接口,您可以用它来提交数据,但没有任何工具可以帮助您处理时间戳。
Moreover, there are two sets of functions described below. The standard set sends data asynchronously, while the _serial one ensures proper ordering of all events, regardless of the originating thread. Generally speaking, you should be using the asynchronous functions only in the case of strictly single-threaded APIs, like OpenGL.
此外,还有以下两组函数。标准函数集以异步方式发送数据,而 _serial 函数集则确保所有事件的正确排序,而与发起线程无关。一般来说,只有在使用严格的单线程应用程序接口(如 OpenGL)时,才能使用异步函数。
A GPU context can be created with the ___tracy_emit_gpu_new_context function (or the serialized variant). You'll need to specify:
GPU 上下文可以使用 ___tracy_emit_gpu_new_context 函数(或序列化变体)创建。您需要指定:
  • context - a unique context id.
    context - 唯一的上下文 ID。
  • gpuTime - an initial GPU timestamp.
    gpuTime - 初始 GPU 时间戳。
  • period - the timestamp period of the GPU.
    period - GPU 的时间戳周期。
  • flags - the flags to use.
    flags - 要使用的标志。
  • type - the GPU context type.
    type - GPU 上下文类型。
GPU contexts can be named using the ___tracy_emit_gpu_context_name function.
GPU 上下文可以使用 ___tracy_emit_gpu_context_name 函数命名。
GPU zones can be created with the ___tracy_emit_gpu_zone_begin_alloc function. The srcloc parameter is the address of the source location data allocated via ___tracy_alloc_srcloc or ___tracy_alloc_srcloc_name. The queryId parameter is the id of the corresponding timestamp query. It should be unique on a per-frame basis.
GPU 区域可以通过___tracy_emit_gpu_zone_begin_alloc 函数创建。srcloc 参数是通过 ___tracy_alloc_srcloc 或 ___tracy_alloc_srcloc_name 分配的源位置数据的地址。queryId 参数是相应时间戳查询的 id。每个帧都应是唯一的。
GPU zones are ended via ___tracy_emit_gpu_zone_end.
GPU 区域通过 ___tracy_emit_gpu_zone_end 结束。
When the timestamps are fetched from the GPU, they must then be emitted via the ___tracy_emit_gpu_time function. After all timestamps for a frame are emitted, queryIds may be re-used.
从 GPU 获取时间戳后,必须通过___tracy_emit_gpu_time 函数发射这些时间戳。在帧的所有时间戳都发出后,queryIds 可以被重新使用。
CPU and GPU timestamps may be periodically resynchronized via the ___tracy_emit_gpu_time_sync function, which takes the GPU timestamp closest to the moment of the call. This can help with timestamp drift and work around compounding GPU timestamp overflow. Note that this requires CPU and GPU synchronization, which will block execution of your application. Do not do this every frame.
CPU 和 GPU 时间戳可以通过 ___tracy_emit_gpu_time_sync 函数定期重新同步,该函数获取最接近调用时刻的 GPU 时间戳。这有助于解决时间戳漂移问题,并解决 GPU 时间戳溢出的问题。需要注意的是,这需要 CPU 和 GPU 同步,这会阻塞应用程序的执行。不要每帧都这样做。
To see how you should use this API, look at the reference implementation contained in the API-specific C++ headers provided by Tracy. For example, to see how to write your instrumentation of OpenGL, you should closely follow the contents of the TracyOpenGL.hpp implementation.
要了解如何使用此 API,应查看 Tracy 提供的 API 专用 C++ 头文件中包含的参考实现。例如,要了解如何编写 OpenGL 的检测代码,应严格遵循 TracyOpenGL.hpp 实现的内容。

3.13.8 Fibers 3.13.8 纤维

Fibers are available in the C API through the TracyCFiberEnter and TracyCFiberLeave macros. To use them, you should observe the requirements listed in section 3.10.
纤维可通过 TracyCFiberEnter 和 TracyCFiberLeave 宏在 C API 中使用。要使用它们,必须遵守第 3.10 节中列出的要求。

3.13.9 Connection Status
3.13.9 连接状态

To query the connection status (section 3.18) using the C API you should use the TracyCIsConnected macro.
要使用 C API 查询连接状态(第 3.18 节),应使用 TracyCIsConnected 宏。

3.13.10 Call stacks 3.13.10 调用堆栈

You can collect call stacks of zones and memory allocation events, as described in section 3.11, by using macros with S postfix, such as: TracyCZoneS, TracyCZoneNS, TracyCZoneCS, TracyCZoneNCS, TracyCAllocS, TracyCFreeS, and so on.
如第 3.11 节所述,您可以使用带 S 后缀的宏来收集区域和内存分配事件的调用堆栈,例如TracyCZoneS、TracyCZoneNS、TracyCZoneCS、TracyCZoneNCS、TracyCAllocS、TracyCFreeS 等。

3.13.11 Using the C API to implement bindings
3.13.11 使用 C API 实现绑定

Tracy C API exposes functions with the ___tracy prefix that you may use to write bindings to other programming languages. Most of the functions available are a counterpart to macros described in section 3.13. However, some functions do not have macro equivalents and are dedicated expressly for binding implementation purposes. This includes the following:
Tracy C API 公开了以 ___tracy 为前缀的函数,您可以使用这些函数编写与其他编程语言的绑定。大多数可用函数与第 3.13 节中描述的宏相对应。但是,有些函数没有对应的宏,而是专门用于实现绑定。这些函数包括
  • ___tracy_startup_profiler(void)
  • ___tracy_shutdown_profiler(void)
  • ___tracy_alloc_srcloc(uint32_t line, const char* source, size_t sourceSz, const char* function, size_t functionSz)
  • ___tracy_alloc_srcloc_name(uint32_t line, const char* source, size_t sourceSz, const char* function, size_t functionSz, const char* name, size_t nameSz)
Here, line is the line number in the source file named by source, and function is the name of the function in which the zone is created. sourceSz and functionSz are the sizes of the corresponding string arguments in bytes. You may additionally specify an optional zone name by providing it in the name variable and specifying its size in nameSz.
sourceSz 和 functionSz 是相应字符串参数的大小(以字节为单位)。您还可以指定一个可选的区域名称,方法是在 name 变量中提供该名称,并在 nameSz 中指定其大小。
The ___tracy_alloc_srcloc and ___tracy_alloc_srcloc_name functions return a uint64_t source location identifier corresponding to an allocated source location. As these functions do not require the provided string data to be available after they return, the calling code is free to deallocate them at any time afterward. This way, the string lifetime requirements described in section 3.1 are relaxed.
函数 ___tracy_alloc_srcloc 和 ___tracy_alloc_srcloc_name 返回与已分配源位置相对应的 uint64_t 源位置标识符。由于这些函数不要求所提供的字符串数据在函数返回后仍然可用,因此调用代码可以在函数返回后的任何时间自由地取消分配。这样,第 3.1 节中描述的字符串生命周期要求就被放宽了。
The uint64_t return value from allocation functions must be passed to one of the zone begin functions:
分配函数的 uint64_t 返回值必须传递给其中一个区域开始函数:
  • ___tracy_emit_zone_begin_alloc(srcloc, active)
  • ___tracy_emit_zone_begin_alloc_callstack(srcloc, depth, active)
These functions return a TracyCZoneCtx context value, which must be handled, as described in sections 3.13.3 and 3.13.3.1.
这些函数会返回一个 TracyCZoneCtx 上下文值,必须按照第 3.13.3 和 3.13.3.1 节所述进行处理。
The variable representing an allocated source location is of an opaque type. After it is passed to one of the zone begin functions, its value cannot be reused (the variable is consumed). You must allocate a new source location for each zone begin event, even if the location data would be the same as in the previous instance.
代表已分配源位置的变量属于不透明类型。在将其传递给某个区域开始函数后,其值将无法重复使用(变量已消耗)。您必须为每个区域开始事件分配一个新的源位置,即使位置数据与前一个实例相同。

Important 重要

Since you are directly calling the profiler functions here, you will need to take care of manually disabling the code if the TRACY_ENABLE macro is not defined.
由于此处直接调用剖析器函数,因此如果未定义 TRACY_ENABLE 宏,则需要手动禁用代码。
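The following sketch shows one way a binding layer might wrap these functions while disabling the profiler calls when TRACY_ENABLE is not defined. The helper binding_call_in_zone, the callback twice, and all argument values are hypothetical. The function signatures follow the ones listed above, so check them against the TracyC.h header you build with, as they may differ between Tracy versions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifdef TRACY_ENABLE
#  include "tracy/TracyC.h"
#endif

/* Hypothetical entry point a binding could expose to a scripting language. */
static int binding_call_in_zone(const char* source, uint32_t line,
                                const char* function, const char* name,
                                int (*body)(int), int argument)
{
#ifdef TRACY_ENABLE
    /* The strings are copied, so the caller may free them after this call. */
    const uint64_t srcloc = ___tracy_alloc_srcloc_name(
        line, source, strlen(source), function, strlen(function),
        name, strlen(name));
    /* srcloc is consumed here; a fresh one is allocated for every zone. */
    TracyCZoneCtx ctx = ___tracy_emit_zone_begin_alloc(srcloc, 1);
#else
    /* Profiling disabled: the parameters are intentionally unused. */
    (void)source; (void)line; (void)function; (void)name;
#endif

    const int result = body(argument);

#ifdef TRACY_ENABLE
    ___tracy_emit_zone_end(ctx);   /* end the zone on the thread that began it */
#endif
    return result;
}

/* Stand-in for work executed on the scripting side. */
static int twice(int value)
{
    return value * 2;
}
```

Keeping every direct ___tracy call inside an #ifdef TRACY_ENABLE block means the same binding source builds both with and without the profiler.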

3.14 Python API

To profile Python code using Tracy, a Python package can be built. This is done using the excellent C++11 based Python bindings generator pybind11, see https://pybind11.readthedocs.io. As a first step, a Tracy-Client shared library needs to be built (with the compile definitions you want to use) and then pybind11 is used to create the Python-bindings. Afterwards, a Python c-extension package can be created (the package will be platform and Python version dependent).
要使用 Tracy 对 Python 代码进行剖析,可以构建一个 Python 包。这可以使用基于 C++11 的优秀 Python 绑定生成器 pybind11 来完成,参见 https://pybind11.readthedocs.io。第一步,需要构建一个 Tracy-Client 共享库(包含要使用的编译定义),然后使用 pybind11 创建 Python 绑定。之后,可以创建 Python c-extension 包(该包取决于平台和 Python 版本)。
An especially powerful feature is the ability to profile Python code and any other C/C++ code used in a single code base as long as the C/C++ code links to the same shared Tracy-Client library that is installed with the Python package.
一个特别强大的功能是,只要 C/C++ 代码链接到与 Python 软件包安装在一起的 Tracy-Client 共享库,就能对 Python 代码和单个代码库中使用的任何其他 C/C++ 代码进行剖析。

3.14.1 Bindings 3.14.1 绑定

An example of how to use the Tracy-Client bindings is shown below:
下面举例说明如何使用 Tracy-Client 绑定:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from time import sleep
import numpy as np
import tracy_client as tracy
@tracy.ScopedFrameDecorator("framed")
@tracy.ScopedZoneDecorator(name="work", color=tracy.ColorType.Red4)
def work():
    sleep(0.05)
def main():
    assert tracy.program_name("MyApp")
    assert tracy.app_info("this is a python app")
    tracy.thread_name("python")  # main thread so bit useless
    plot_id = tracy.plot_config("plot", tracy.PlotFormatType.Number)
    assert plot_id is not None
    mem_id = None
    index = 0
    while True:
        with tracy.ScopedZone(name="test", color=tracy.ColorType.Coral) as zone:
            index += 1
            tracy.frame_mark()
            inner = tracy.ScopedZone(depth=5, color=tracy.ColorType.Coral)
            inner.color(index % 5)
            inner.name(str(index))
            inner.enter()
            if index % 2:
                tracy.alloc(44, index)
            else:
                tracy.free(44)
            if not index % 2:
                if mem_id is None:
                    mem_id = tracy.alloc(1337000000, index, name="named", depth=4)
                    assert mem_id is not None
                else:
                    tracy.alloc(1337000000, index, id=mem_id, depth=4)
            else:
                tracy.free(1337000000, mem_id, 4)
            with tracy.ScopedFrame("custom"):
                image = np.full([400, 400, 4], index, dtype=np.uint8)
                assert tracy.frame_image(image.tobytes(), 400, 400)
            inner.exit()
            zone.text(index)
            assert tracy.message(f"we are at index {index}")
            assert tracy.message(f"we are at index {index}", tracy.ColorType.Coral)
            assert tracy.plot(plot_id, index)
            work()
            sleep(0.1)
if __name__ == "__main__":
    main()
Please note the use of ids as a way to cope with the need for unique pointers for certain features of the Tracy profiler, see section 3.1.2.
请注意,这里使用 id 来满足 Tracy 剖析器某些功能对唯一指针的需求,参见第 3.1.2 节。

3.14.2 Building the Python package
3.14.2 构建 Python 软件包

To build the Python package, you will need to use the CMake build system to compile the Tracy-Client. The CMake option -D TRACY_CLIENT_PYTHON=ON is used to enable the generation of the Python bindings in conjunction with a mandatory creation of a shared Tracy-Client library via one of the CMake options -D BUILD_SHARED_LIBS=ON or -D DEFAULT_STATIC=OFF.
要编译 Python 软件包,需要使用 CMake 编译系统来编译 Tracy-Client。CMake 选项 -D TRACY_CLIENT_PYTHON=ON 用于生成 Python 绑定,并通过 CMake 选项 -D BUILD_SHARED_LIBS=ON 或 -D DEFAULT_STATIC=OFF 之一强制创建 Tracy-Client 共享库。
The following other variables are available in addition:
此外,还可以使用以下其他变量:
  • EXTERNAL_PYBIND11 - Can be used to disable the download of pybind11 when Tracy is embedded in another CMake project that already uses pybind11.
    EXTERNAL_PYBIND11 - 当 Tracy 嵌入另一个已使用 pybind11 的 CMake 项目时,可用于禁止下载 pybind11。
  • TRACY_CLIENT_PYTHON_TARGET - Optional directory to copy Tracy Python bindings to when Tracy is embedded in another CMake project.
    TRACY_CLIENT_PYTHON_TARGET - 可选目录,当 Tracy 嵌入另一个 CMake 项目时,将 Tracy Python 绑定复制到该目录。
  • BUFFER_SIZE - The size of the global pointer buffer (defaults to 128) for naming Tracy profiling entities like frame marks, plots, and memory locations.
    BUFFER_SIZE - 全局指针缓冲区的大小(默认为 128),用于命名帧标记、绘图和内存位置等 Tracy 剖析实体。
  • NAME_LENGTH - The maximum length (defaults to 128) of a name stored in the global pointer buffer.
    NAME_LENGTH - 全局指针缓冲区中存储的名称的最大长度(默认为 128)。
Be aware that the memory allocated by this buffer is global and is not freed, see section 3.1.2.
请注意,该缓冲区分配的内存是全局内存,不会被释放,参见第 3.1.2 节。
See below for example steps to build the Python bindings using CMake:
有关使用 CMake 构建 Python 绑定的示例步骤,请参阅下文:
mkdir build
cd build
cmake -DTRACY_STATIC=OFF -DTRACY_CLIENT_PYTHON=ON ../
Once this has finished building, the Python package can be built as follows:
完成构建后,Python 软件包的构建过程如下:
cd ../python
python3 setup.py bdist_wheel
The created package will be in the folder python/dist.
创建的软件包将放在 python/dist 文件夹中。

3.15 Automated data collection
3.15 自动数据收集

Tracy will perform an automatic collection of system data without user intervention. This behavior is platform-specific and may not be available everywhere. Refer to section 2.6 for more information.
Tracy 会自动收集系统数据,无需用户干预。这种行为因平台而异,可能无法在所有地方使用。更多信息请参阅第 2.6 节。

3.15.1 Privilege elevation
3.15.1 提升权限

Some profiling data can only be retrieved using the kernel facilities, which are not available to users with a normal privilege level. To collect such data, you will need to elevate your rights to the administrator level. You can do so either by running the profiled program from the root account on Unix or through the Run as administrator option on Windows. On Android, you will need to have a rooted device (see section 2.1.9.4 for additional information).
某些剖析数据只能通过内核设施获取,普通权限级别的用户无法使用这些设施。要收集此类数据,需要将权限提升到管理员级别。为此,您可以在 Unix 上使用 root 账户运行剖析程序,或在 Windows 上使用 "以管理员身份运行 "选项 。在 Android 上,您需要 root 设备(更多信息请参见第 2.1.9.4 节)。
As this system-level tracing functionality is part of the automated collection process, no user intervention is necessary to enable it (assuming that the program was granted the rights needed). However, if, for some reason, you would want to prevent your application from trying to access kernel data, you may recompile your program with the TRACY_NO_SYSTEM_TRACING define. If you want to disable this functionality dynamically at runtime instead, you can set the TRACY_NO_SYSTEM_TRACING environment variable to "1".
由于这种系统级跟踪功能是自动收集程序的一部分,因此无需用户干预即可启用(假设程序已获得所需的权限)。不过,如果出于某种原因,您希望阻止应用程序尝试访问内核数据,可以使用 TRACY_NO_SYSTEM_TRACING 定义重新编译程序。如果想在运行时动态禁用这一功能,可以将 TRACY_NO_SYSTEM_TRACING 环境变量设置为 "1"。

What should be granted privileges?
什么应该被授予特权?

Sometimes it may be confusing which program should be given admin access. After all, some other profilers have to run elevated to access all their capabilities.
有时,哪个程序应该获得管理员权限可能会让人感到困惑。毕竟,其他一些剖析器必须升高运行级别才能访问其所有功能。
In the case of Tracy, you should give the administrative rights to the profiled application. Remember that the server part of the profiler (where the data is collected and displayed) may be running on another machine, and thus you can't use it to access kernel data.
如果是 Tracy,则应赋予被剖析应用程序管理权限。请记住,剖析器的服务器部分(收集和显示数据的地方)可能运行在另一台机器上,因此你不能用它来访问内核数据。

3.15.2 CPU usage 3.15.2 CPU 使用率

System-wide CPU load is gathered with relatively high granularity (one reading every 100 ms). The readings are available as a plot (see section 5.2.3.3). Note that this parameter considers all applications running on the system, not only the profiled program.
全系统 CPU 负载的收集粒度相对较高(每 100 毫秒一个读数)。读数以图表形式显示(参见第 5.2.3.3 节)。请注意,该参数考虑的是系统上运行的所有应用程序,而不仅仅是剖析程序。

3.15.3 Context switches 3.15.3 上下文切换

Since the profiled program is executing simultaneously with other applications, you can't have exclusive access to the CPU. Instead, the multitasking operating system's scheduler gives threads waiting to execute short time slices to do part of their work. Afterward, threads are preempted to give other threads a chance to run. This ensures that each program running in the system has a fair environment, and no program can hog the system resources for itself.
由于剖析程序与其他应用程序同时执行,因此无法独占 CPU。相反,多任务操作系统的调度程序会为等待执行的线程提供较短的时间片来完成部分工作。之后,线程会被抢占,让其他线程有机会运行。这确保了系统中运行的每个程序都有一个公平的环境,任何程序都不能独占系统资源。
As a corollary, it is often not enough to know how long it took to execute a zone. For example, the thread in which a zone was running might have been suspended by the system. This would have artificially increased the time readings.
由此推论,仅仅知道执行一个区所需的时间往往是不够的。例如,运行区域的线程可能已被系统暂停。这会人为地增加读取的时间。
To solve this problem, Tracy collects context switch information. This data can then be used to see when a zone was in the executing state and where it was waiting to be resumed.
为了解决这个问题,Tracy 收集了上下文切换 信息。然后,就可以利用这些数据来查看区域何时处于执行状态,以及在哪里等待恢复。
You may disable context switch data capture by adding the TRACY_NO_CONTEXT_SWITCH define to the client. Since with this feature you are observing other programs, you can only use it after privilege elevation, which is described in section 3.15.1.
您可以通过在客户端添加 TRACY_NO_CONTEXT_SWITCH 定义来禁用上下文切换数据捕获功能。由于该功能会影响其他程序的运行,因此只能在权限提升后才能使用,详情请参阅第 3.15.1 节。

3.15.4 CPU topology 3.15.4 CPU 拓扑

Tracy may discover CPU topology data to provide further information about program performance characteristics. It is handy when combined with context switch information (section 3.15.3).
Tracy 可能会发现 CPU 拓扑数据,从而提供有关程序性能特征的更多信息。如果与上下文切换信息(第 3.15.3 节)结合使用,将非常方便。
In essence, the topology information gives you context about what any given logical CPU really is and how it relates to other logical CPUs. The topology hierarchy consists of packages, cores, and threads.
从本质上讲,拓扑信息可让您了解任何给定逻辑 CPU 的真实情况,以及它与其他逻辑 CPU 的关系。拓扑层次结构包括包、内核和线程。
Packages contain cores and shared resources, such as memory controller, L3 cache, etc. A store-bought CPU is an example of a package. While you may think that multi-package configurations would be a domain of servers, they are actually quite common in the mobile devices world, with many platforms using the big.LITTLE arrangement of two packages in one silicon chip.
软件包包含内核和共享资源,如内存控制器、L3 高速缓存等。商店购买的 CPU 就是一个包的例子。你可能会认为多封装配置是服务器的专利,但实际上,这种配置在移动设备领域非常普遍,许多平台都在一个硅芯片中使用 big.LITTLE 两封装排列。
Cores contain at least one thread and shared resources: execution units, L1 and L2 cache, etc.
内核至少包含一个线程和共享资源:执行单元、一级和二级缓存等。
Threads (or logical CPUs; not to be confused with program threads) are basically the processor instruction pipelines. A pipeline might become stalled, for example, due to pending memory access, leaving core resources unused. To reduce this bottleneck, some CPUs may use simultaneous multithreading, in which more than one pipeline uses the resources of a single physical core.
线程(或逻辑 CPU;不要与程序线程混淆)基本上就是处理器的指令流水线。例如,一条流水线可能会因内存访问未决而停滞,导致核心资源闲置。为了减少这种瓶颈,某些 CPU 可能会使用同步多线程 ,在这种情况下,一个以上的流水线将使用一个物理内核资源。
Knowing which package and core any logical CPU belongs to enables many insights. For example, two threads scheduled to run on the same core will compete for shared execution units and cache, resulting in reduced performance. Or, migrating a program thread from one core to another will invalidate the L1 and L2 cache. However, such invalidation is less costly than migration from one package to another, which also invalidates the L3 cache.
了解任何逻辑 CPU 属于哪个软件包和内核,可以获得很多启示。例如,计划在同一内核上运行的两个线程将竞争共享执行单元和高速缓存,从而导致性能下降。或者,将程序线程从一个内核迁移到另一个内核会使 L1 和 L2 缓存失效。不过,这种失效比从一个软件包迁移到另一个软件包的成本要低,因为后者也会使 L3 缓存失效。

Important 重要

In this manual, the word core is typically used as a short term for logical CPU. Please do not confuse it with physical processor cores.
在本手册中,内核一词通常是逻辑 CPU 的简称。请不要将其与物理处理器内核混淆。

3.15.5 Call stack sampling
3.15.5 调用栈取样

Manual markup of zones doesn't cover every function existing in a program and cannot be performed in system libraries or the kernel. This can leave blank spaces on the trace, leaving you no clue what the application was doing. However, Tracy can periodically inspect the state of running threads, providing you with a snapshot of the call stack at the time when sampling was performed. While this information doesn't have the fidelity of manually inserted zones, it can sometimes give you an insight into where to go next.
手动标记区域并不能涵盖程序中存在的所有功能,也无法在系统库或内核中执行。这可能会在跟踪上留下空白,让你无法知道应用程序在做什么。不过,Tracy 可以定期检查运行线程的状态,为你提供采样时调用栈的快照。虽然这种信息的保真度不如手动插入的区域,但有时也能让你了解下一步该往哪里走。
This feature requires privilege elevation on Windows, but not on Linux. However, running as root on Linux will also provide you with the kernel stack traces. Additionally, you should review section 3.11 to see if you have the proper setup for the required program debugging data.
该功能在 Windows 上需要提升权限,但在 Linux 上不需要。不过,在 Linux 上以 root 身份运行也能获得内核堆栈跟踪。此外,您还应复习第 3.11 章,看看是否已为所需的程序调试数据进行了适当设置。
By default, sampling is performed at 8 kHz frequency on Windows (the maximum possible value). On Linux and Android, it is performed at . You can change this value by providing the sampling frequency (in Hz ) through the TRACY_SAMPLING_HZ macro.
默认情况下,Windows 系统的采样频率为 8 kHz(可能的最大值)。在 Linux 和 Android 系统中,采样频率为 。您可以通过 TRACY_SAMPLING_HZ 宏提供采样频率(单位:Hz)来更改该值。
Call stack sampling may be disabled by using the TRACY_NO_SAMPLING define.
可以使用 TRACY_NO_SAMPLING 定义禁用调用堆栈采样。

Linux sampling rate limits
Linux 采样率限制

The operating system may decide that sampling takes too much CPU time and reduce the allowed sampling rate. This can be seen in dmesg output as:
操作系统可能会认为采样耗费太多 CPU 时间,从而降低允许的采样率。这可以在 dmesg 输出中看到:
perf: interrupt took too long, lowering kernel.perf_event_max_sample_rate to value
If the value goes below the sample rate Tracy wants to use, sampling will be silently disabled. To make it work again, you can set an appropriate value in the kernel.perf_event_max_sample_rate kernel parameter, using the sysctl utility.
如果该值低于 Tracy 希望使用的采样率,采样将被静默禁用。要使其恢复工作,可以使用 sysctl 工具在 kernel.perf_event_max_sample_rate 内核参数中设置一个合适的值。
Should you want to disable this mechanism, you can set the kernel.perf_cpu_time_max_percent parameter to zero. Be sure to read what this would do, as it may have serious consequences that you should be aware of.
如果想禁用这一机制,可以将 kernel.perf_cpu_time_max_percent 参数设置为零。请务必阅读这样做的后果,因为它可能会带来严重的后果。
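Because sampling is disabled silently, it can be useful to check the current kernel limit yourself. The sketch below reads the limit from procfs, where Linux exposes sysctl parameters as files, and compares it with a desired rate. The helper names and the wanted rate passed by the caller are hypothetical; the sketch does not query Tracy for the rate it actually uses.

```c
#include <assert.h>
#include <stdio.h>

/* Reads a single integer from a procfs-style file; returns -1 on failure. */
static long read_long_from_file(const char* path)
{
    FILE* file = fopen(path, "r");
    if (file == NULL) {
        return -1;
    }
    long value = -1;
    if (fscanf(file, "%ld", &value) != 1) {
        value = -1;
    }
    fclose(file);
    return value;
}

/* Returns 1 if the kernel limit still permits the wanted sampling rate. */
static int sampling_rate_allowed(long kernel_limit_hz, long wanted_hz)
{
    return kernel_limit_hz >= wanted_hz;
}

/* Returns 1 if sampling should work, 0 if the limit is too low, -1 if unknown. */
static int check_sampling_limit(long wanted_hz)
{
    const long limit =
        read_long_from_file("/proc/sys/kernel/perf_event_max_sample_rate");
    if (limit < 0) {
        puts("kernel sampling limit is not available on this system");
        return -1;
    }
    printf("kernel limit: %ld Hz, wanted: %ld Hz\n", limit, wanted_hz);
    return sampling_rate_allowed(limit, wanted_hz);
}
```

If the check fails, the limit can be raised with the sysctl utility, as described above.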

3.15.5.1 Wait stacks 3.15.5.1 等待堆栈

The sampling functionality also captures call stacks for context switch events. Such call stacks will show you what the application was doing when the thread was suspended and subsequently resumed, hence the name. We can categorize wait stacks into the following categories:
采样功能还能捕获上下文切换事件的调用堆栈。此类调用堆栈会向你展示当线程暂停并随后恢复时,应用程序正在做什么,因此得名等待堆栈。我们可以将等待栈分为以下几类:
  1. Random preemptive multitasking events, which are expected and do not have any significance.
    随机抢占式多任务事件,这是意料之中的,没有任何意义。
  2. Expected waits, which may be caused by issuing sleep commands, waiting for a lock to become available, performing I/O, and so on. Quantitative analysis of such events may (but probably won't) direct you to some problems in your code.
    预期等待,可能是由于发布睡眠命令、等待锁可用、执行 I/O 等引起的。对此类事件的定量分析可能会(但可能不会)让你发现代码中的某些问题。
  3. Unexpected waits, which should be immediately taken care of. After all, what's the point of profiling and optimizing your program if it is constantly waiting for something? An example of such an unexpected wait may be some anti-virus service interfering with each of your file read operations. In this case, you could have assumed that the system would buffer a large chunk of the data after the first read to make it immediately available to the application in the following calls.
    意外等待,应立即处理。毕竟,如果你的程序一直在等待什么,那么对它进行剖析和优化又有什么意义呢?反病毒服务干扰了每次文件读取操作,就是意外等待的一个例子。在这种情况下,你本可以假设系统会在第一次读取后缓冲一大块数据,以便在接下来的调用中立即提供给应用程序。

Platform differences 平台差异

Wait stack capture happens at different times on the supported operating systems due to differences in implementation details. For example, on Windows, the stack capture will occur when the program execution is resumed. However, on Linux, the capture will happen when the scheduler decides to preempt execution.
在支持的操作系统上,由于实现细节的不同,等待栈捕获发生的时间也不同。例如,在 Windows 系统上,当程序恢复执行时,堆栈捕获就会发生。但在 Linux 上,当调度程序决定抢占执行时,就会发生栈捕获。

3.15.6 Hardware sampling
3.15.6 硬件取样

While the call stack sampling is a generic software-implemented functionality of the operating system, there's another way of sampling program execution patterns. Modern processors host a wide array of different hardware performance counters, which increase when some event in a CPU core happens. These could be as simple as counting each clock cycle or as implementation-specific as counting 'retired instructions that are delivered to the back-end after the front-end had at least 1 bubble-slot for a period of 2 cycles'.
虽然调用堆栈采样是操作系统的一种通用软件实现功能,但还有另一种对程序执行模式进行采样的方法。现代处理器拥有大量不同的硬件性能计数器,当 CPU 内核发生某些事件时,这些计数器就会增加。这些计数器可以简单到对每个时钟周期进行计数,也可以具体到对 "前端在 2 个周期内至少有 1 个冒泡槽后交付给后端的退役指令 "进行计数。
Tracy can use these counters to present you the following three statistics, which may help guide you in discovering why your code is not as fast as possible:
Tracy 可以使用这些计数器为您提供以下三种统计数据,它们可以帮助您找出代码速度不尽人意的原因:
  1. Instructions Per Cycle (IPC) - shows how many instructions were executing concurrently within a single core cycle. Higher values are better. The maximum achievable value depends on the design of the CPU, including things such as the number of execution units and their individual capabilities. Calculated as #instructions retired / #cycles. You can disable it with the TRACY_NO_SAMPLE_RETIREMENT macro.
    每周期指令数 (IPC) - 显示单个内核周期内并发执行的指令数。数值越大越好。可达到的最大值取决于 CPU 的设计,包括执行单元的数量及其各自的能力。计算公式为 #instructions retired / #cycles。您可以使用 TRACY_NO_SAMPLE_RETIREMENT 宏禁用它。
  2. Branch miss rate - shows how frequently the CPU branch predictor makes a wrong choice. Lower values are better. Calculated as #branch miss / #branch retired. You can disable it with the TRACY_NO_SAMPLE_BRANCH macro.
    分支未命中率 - 显示 CPU 分支预测器做出错误选择的频率。数值越小越好。计算公式为 #branch miss / #branch retired。可以使用 TRACY_NO_SAMPLE_BRANCH 宏禁用它。
  3. Cache miss rate - shows how frequently the CPU has to retrieve data from memory. Lower values are better. The specifics of which cache level is taken into account here vary from one implementation to another. Calculated as #cache miss / #cache reference. You can disable it with the TRACY_NO_SAMPLE_CACHE macro.
    缓存未命中率 - 显示 CPU 从内存中检索数据的频率。数值越小越好。不同的实现所考虑的缓存级别也不尽相同。计算公式为 #cache miss / #cache reference。您可以使用 TRACY_NO_SAMPLE_CACHE 宏禁用它。
Each performance counter has to be collected by a dedicated Performance Monitoring Unit (PMU). However, the availability of PMUs is very limited, so you may not be able to capture all the statistics mentioned above at the same time (as each requires capture of two different counters). In such a case, you will need to manually select what needs to be sampled with the macros specified above.
每个性能计数器都必须由专用的性能监控单元 (PMU) 采集。然而,PMU 的可用性非常有限,因此您可能无法同时采集上述所有统计数据(因为每个数据都需要采集两个不同的计数器)。在这种情况下,您需要使用上述宏手动选择需要采样的内容。
If the provided measurements are not specific enough for your needs, you will need to use a profiler better tailored to the hardware you are using, such as Intel VTune or AMD μProf.
如果提供的测量结果不够具体,您需要使用更适合您所使用硬件的剖析器,如 Intel VTune 或 AMD μProf。
Another problem to consider here is measurement skid. It is hard to pinpoint the exact assembly instruction that caused a counter to trigger, so the results you get may look nonsensical at times. For example, a branch miss may be attributed to a multiply instruction. Unfortunately, not much can be done about that, as this is exactly what the hardware reports. The amount of skid you will encounter depends on the specific implementation of a processor, and each vendor has its own solution to minimize it. Intel uses Precise Event Based Sampling (PEBS), which is rather good, but it can still, for example, blend the branch statistics across the comparison instruction and the following jump instruction. AMD employs its own Instruction Based Sampling (IBS), which tends to provide worse results in comparison.
Do note that the statistics presented by Tracy are a combination of two randomly sampled counters, so you should take them with a grain of salt. The random nature of sampling makes it entirely possible to count more branch misses than branch instructions or some other similar silliness. You should always cross-check this data with the count of sampled events to decide if you can reliably act upon the provided values.
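The cross-check described above can be sketched as a small helper. This is only an illustration: the CounterPair type, the SampledRatio function and the minSamples cutoff are hypothetical names and values, not part of Tracy.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical container for one sampled counter pair; not a Tracy type.
struct CounterPair
{
    uint64_t numerator;     // e.g. sampled branch misses
    uint64_t denominator;   // e.g. sampled branch instructions
};

// Returns numerator / denominator, or nothing when too few events were
// sampled for the ratio to be trusted. minSamples is an assumed cutoff.
std::optional<double> SampledRatio( const CounterPair& c, uint64_t minSamples = 100 )
{
    if( c.denominator == 0 || c.numerator + c.denominator < minSamples ) return std::nullopt;
    return double( c.numerator ) / double( c.denominator );
}
```

Note that the result is not clamped: because both counters are sampled independently, a ratio above 1 (more branch misses than branch instructions) can legitimately come out of this calculation.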
Availability. Currently, the hardware performance counter readings are only available on Linux, which also includes the WSL2 layer on Windows. Access to them is performed using the kernel-provided infrastructure, so what you get may depend on how your kernel was configured. This also means that the exact set of supported hardware is not known, as it depends on what has been implemented in Linux itself. At this point, the x86 hardware is fully supported (including features such as PEBS or IBS), and there's PMU support on a selection of ARM designs. The performance counter data can be captured with no need for privilege elevation.

3.15.7 Executable code retrieval

Tracy will capture small chunks of the executable image during profiling to enable deep insight into program execution. The retrieved code can be subsequently disassembled to be inspected in detail. The profiler will perform this functionality only for functions no larger than 128 KB and only if symbol information is present.
The discovery of previously unseen executable code may result in reduced performance of real-time capture. This is especially true when the profiling session has just started. However, such behavior is expected, and performance will go back to normal after several moments.
Be extra careful when working with non-public code, as parts of your program will be embedded in the captured trace. You can disable the collection of program code by compiling the profiled application with the TRACY_NO_CODE_TRANSFER define. You can also strip the code from a saved trace using the update utility (section 4.5.4).

Important

For proper program code retrieval, you must not unload any module used by the application during runtime. See section 3.1.1 for an explanation.
On Linux, Tracy will override the dlclose function call to prevent shared objects from being unloaded. Note that in a well-behaved program this shouldn't have any effect, as calling dlclose does not guarantee that the shared object will be unloaded.

3.15.8 Vertical synchronization

On Windows and Linux, Tracy will automatically capture hardware Vsync events, provided that the application has access to the kernel data (privilege elevation may be needed, see section 3.15.1). These events will be reported as '[x] Vsync' frame sets, where x is the identifier of a specific monitor. Note that hardware vertical synchronization might not correspond to the one seen by your application due to desktop composition, command queue buffering, and so on. Also, in some instances, when there is nothing to update on the screen, the graphics driver may choose to stop issuing screen refreshes. As a result, there may be periods where no vertical synchronization events are reported.
Use the TRACY_NO_VSYNC_CAPTURE macro to disable capture of Vsync events.

3.16 Trace parameters

Sometimes you may want to change how the profiled application behaves during the profiling run. For example, you may want to enable or disable the capture of frame images without recompiling and restarting your program. To be able to do so, you must register a callback function using the TracyParameterRegister(callback, data) macro, where callback is a function conforming to the following signature:
void Callback(void* data, uint32_t idx, int32_t val)
The data parameter will have the same value as was specified in the macro. The idx argument is a user-defined parameter index and val is the value set in the profiler user interface.
To specify individual parameters, use the TracyParameterSetup(idx, name, isBool, val) macro. The idx value will be passed to the callback function for identification purposes (Tracy doesn't care what it's set to). The name argument is the parameter label, displayed on the list of parameters. Finally, isBool determines if val should be interpreted as a boolean value or as an integer number.
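The two macros can be combined roughly as follows. This is a minimal sketch, assuming the Tracy header is reachable as tracy/Tracy.hpp; when it is not, empty fallback macros are defined so the code still compiles. The CaptureSettings struct, the parameter indices and the labels are hypothetical, and the use of atomics is a precaution on the assumption that the callback may run on a different thread than the code reading the settings.

```cpp
#include <atomic>
#include <cstdint>

#if __has_include( "tracy/Tracy.hpp" )
#  include "tracy/Tracy.hpp"
#else
// Fallbacks so the sketch compiles without Tracy; they do nothing.
#  define TracyParameterRegister( cb, data )
#  define TracyParameterSetup( idx, name, isBool, val )
#endif

// Hypothetical application settings changed from the profiler UI.
struct CaptureSettings
{
    std::atomic<bool> frameImages { true };
    std::atomic<int32_t> imageDivisor { 4 };
};

// Parameter indices are the application's own choice; Tracy only passes them back.
enum : uint32_t { ParamFrameImages = 0, ParamImageDivisor = 1 };

// Matches the callback signature given above.
void OnTraceParameter( void* data, uint32_t idx, int32_t val )
{
    auto* settings = static_cast<CaptureSettings*>( data );
    switch( idx )
    {
    case ParamFrameImages:  settings->frameImages = ( val != 0 ); break;
    case ParamImageDivisor: settings->imageDivisor = val; break;
    default: break;
    }
}

// Registers the callback and describes both parameters with their initial values.
void RegisterTraceParameters( CaptureSettings& settings )
{
    TracyParameterRegister( OnTraceParameter, &settings );
    TracyParameterSetup( ParamFrameImages, "Capture frame images", true, 1 );
    TracyParameterSetup( ParamImageDivisor, "Frame image divisor", false, 4 );
}
```

Passing the settings object through the data pointer keeps the callback free of global state.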

Important

Usage of trace parameters makes profiling runs dependent on user interaction with the profiler, and is thus not recommended if a consistent profiling environment is desired. Furthermore, interaction with the parameters is only possible in the graphical profiling application, not in the command line capture utility.

3.17 Source contents callback

Tracy performs several data discovery attempts to show you the source file contents associated with the executed program, which is explained in more detail in chapter 5.16. However, sometimes the source files cannot be accessed without your help. For example, you may want to profile a script that is loaded by the game and that resides in an archive accessible only by your program. Accordingly, Tracy allows inserting your own custom step at the end of the source discovery chain with the TracySourceCallbackRegister(callback, data) macro, where callback is a function conforming to the following signature:
char* Callback(void* data, const char* filename, size_t& size)
The data parameter will have the same value as was specified in the macro. The filename parameter contains the file name of the queried source file. Finally, the size parameter is used only as an out-value and does not contain any functional data.
The return value must be nullptr if the input file name is not accessible to the client application. If the file can be accessed, then the data size must be stored in the size parameter, and the file contents must be returned in a buffer allocated with the tracy::tracy_malloc_fast(size) function. Buffer contents do not need to be null-terminated. If for some reason the already allocated buffer can no longer be used, it must be freed with the tracy::tracy_free_fast(ptr) function.
Transfer of source files larger than some unspecified, but reasonably large threshold won't be performed.
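A callback following these rules might look like the sketch below. To keep it compilable without Tracy, the buffer is allocated through a stand-in function, AllocForTracy, which is an assumption of this example; in a real client the allocation has to go through tracy::tracy_malloc_fast(size) as described above. The in-memory ScriptArchive type is also hypothetical, and treating empty files as unavailable is a simplification.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <map>
#include <string>

// Stand-in for tracy::tracy_malloc_fast so the sketch compiles without Tracy.
// A real client must allocate with tracy::tracy_malloc_fast( size ) instead.
static void* AllocForTracy( size_t size ) { return std::malloc( size ); }

// Hypothetical in-memory archive: file name -> script contents.
using ScriptArchive = std::map<std::string, std::string>;

// Matches the callback signature given above.
char* SourceContentsCallback( void* data, const char* filename, size_t& size )
{
    const auto* archive = static_cast<const ScriptArchive*>( data );
    const auto it = archive->find( filename );
    // Missing files (and, in this sketch, empty ones) are reported as not accessible.
    if( it == archive->end() || it->second.empty() ) return nullptr;

    size = it->second.size();
    auto* buf = static_cast<char*>( AllocForTracy( size ) );
    std::memcpy( buf, it->second.data(), size );   // no null terminator needed
    return buf;
}
```

Registration would then be done once with TracySourceCallbackRegister(SourceContentsCallback, &archive), where archive outlives the profiling session.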

3.18 Connection status

To determine if a connection is currently established between the client and the server, you may use the TracyIsConnected macro, which returns a boolean value.

4 Capturing the data

After the client application has been instrumented, you will want to connect to it using a server, available either as a headless capture-only utility or as a full-fledged graphical profiling interface.

4.1 Command line

You can capture a trace using a command line utility contained in the capture directory. To use it you may provide the following parameters:

- -o output.tracy - the file name of the resulting trace (required).
- -a address - specifies the IP address (or a domain name) of the client application (uses localhost if not provided).
- -p port - network port which should be used (optional).
- -f - force overwrite, if the output file already exists.
- -s seconds - number of seconds to capture before automatically disconnecting (optional).
- -m memlimit - sets the memory limit for the trace. The connection will be terminated if it is exceeded. Specified as a percentage of total system memory. Can be greater than 100%, which will use swap. Disabled if not set.
If no client is running at the given address, the server will wait until it can make a connection. During the capture, the utility will display the following information:
% ./capture -a 127.0.0.1 -o trace
Connecting to 127.0.0.1:8086...
Queue delay: 5 ns
Timer resolution: 3 ns
    1.33 Mbps / 40.4%=3.29 Mbps | Net: 64.42 MB | Mem: 283.03 MB | Time: 10.6 s
The queue delay and timer resolution parameters are calibration results of timers used by the client. The following line is a status bar, which displays: network connection speed, connection compression ratio, and the resulting uncompressed data rate; the total amount of data transferred over the network; memory usage of the capture utility; time extent of the captured data.
You can disconnect from the client and save the captured trace by pressing Ctrl + C. If you prefer to disconnect after a fixed time, use the -s seconds parameter.

4.2 Interactive profiling

If you want to look at the profile data in real-time (or load a saved trace file), you can use the data analysis utility contained in the profiler directory. After starting the application, you will be greeted with a welcome dialog (figure 9), presenting a set of useful links (User manual, Web, Join chat and Sponsor). The Web button opens a drop-down list with links to the profiler's Home page and a number of Feature videos.
The Wrench button opens the about dialog, which also contains a number of global settings you may want to tweak.

Use the connection history button to display a list of commonly used targets, from which you can quickly select an address. You can remove entries from this list by hovering the mouse cursor over an entry and pressing the Del key on the keyboard.
If you want to open a trace that you have stored on the disk, you can do so by pressing the Open saved trace button.
The discovered clients list is only displayed if clients are broadcasting their presence on the local network. Each entry shows the client's address (and port, if different from the default one), how long the client has been running, and the name of the profiled application. Clicking on an entry will connect to the client. Incompatible clients are grayed out and can't be connected to, but Tracy will suggest a compatible version, if able. Clicking on the Filter toggle button will display client filtering input fields, allowing removal of the displayed entries according to their address, port number, or program name. If filters are active, a yellow warning icon will be displayed.
Figure 9: Welcome dialog.
Both connecting to a client and opening a saved trace will present you with the main profiler view, which you can use to analyze the data (see section 5).
Once connected to a client, a keyboard shortcut can be used to quickly discard any captured data and reconnect to a client at the same address.

4.2.1 Connection information pop-up

If this is a real-time capture, you will also have access to the connection information pop-up (figure 10) through the Connection button, with the capture status similar to the one displayed by the command-line utility. This dialog also shows the connection speed graphed over time and the profiled application's current frames per second and frame time measurements. The Query backlog consists of two numbers. The first represents the number of queries that were held back due to the bandwidth volume overwhelming the available network send buffer. The second one shows how many queries are in-flight, meaning requests sent to the client but not yet answered. While these numbers drain down to zero, the performance of real time profiling may be temporarily compromised. The circle displayed next to the bandwidth graph signals the connection status. If it's red, the connection is active. If it's gray, the client has disconnected.
You can use the Save trace button to save the current profile data to a file. The available compression modes are discussed in sections 4.5.1 and 4.5.3. Use the Stop button to disconnect from the client. The Discard button is used to discard the current trace.
Figure 10: Connection information pop-up.
If frame image capture has been implemented (chapter 3.3.3), a thumbnail of the last received frame image will be provided for reference.
If the profiled application opted to provide trace parameters (see section 3.16) and the connection is still active, this pop-up will also contain a trace parameters section, listing all the provided options. A callback function will be executed on the client when you change any value here.

4.2.2 Automatic loading or connecting

You can pass the trace file name as an argument to the profiler application to open the capture, skipping the welcome dialog. You can also use the -a address argument to connect to the given address automatically. Finally, to specify the network port, pass the -p port parameter. The profiler will use it for client connections (overridable in the UI) and for listening to client discovery broadcasts.

4.3 Connection speed

Tracy's network bandwidth requirements depend on the amount of data collection the profiled application performs. You may expect a data transfer rate anywhere between 1 Mbps and 100 Mbps in typical use case scenarios.
The maximum attainable connection speed is determined by the ability of the client to provide data and the ability of the server to process the received data. In an extreme conditions test performed on an i7 8700K, the maximum transfer rate peaked at 950 Mbps. In each second, the profiler could process 27 million zones and consume 1 GB of RAM.

4.4 Memory usage

The captured data is stored in RAM and only written to the disk when the capture finishes. This can result in memory exhaustion when you capture massive amounts of profile data or even in typical usage situations when the capture is performed over a long time. Therefore, the recommended usage pattern is to perform moderate instrumentation of the client code and limit capture time to the strict necessity.
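To get a feeling for how long a capture can run before memory becomes a problem, the memory usage and time shown by the capture utility can be extrapolated. The sketch below assumes that memory grows linearly with capture time, which is only a rough approximation; SecondsUntilBudget is a hypothetical helper, not a Tracy function.

```cpp
#include <limits>

// Estimated seconds of capture left until the RAM budget is used up,
// assuming memory keeps growing at the average rate observed so far.
double SecondsUntilBudget( double usedMB, double elapsedSec, double budgetMB )
{
    if( usedMB >= budgetMB ) return 0.0;
    // Without any measurement there is no growth rate to extrapolate from.
    if( usedMB <= 0.0 || elapsedSec <= 0.0 ) return std::numeric_limits<double>::infinity();
    const double growthMBps = usedMB / elapsedSec;
    return ( budgetMB - usedMB ) / growthMBps;
}
```

For the sample status line in section 4.1 (283.03 MB after 10.6 s) and a 16 GB budget, this estimates roughly ten more minutes of capture.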
In some cases, it may be helpful to perform an on-demand capture, as described in section 2.1.5. In such a case, you will be able to profile only the part you are interested in (e.g., behavior during loading of a level in a game), ignoring all the unneeded data.
If you genuinely need to capture large traces, you have two options. Either buy more RAM or use a large swap file on a fast disk drive.

4.5 Trace versioning

Each new release of Tracy changes the internal format of trace files. While there is a backward compatibility layer, allowing loading traces created by previous versions of Tracy in new releases, it won't be there forever. You are thus advised to upgrade your traces using the utility contained in the update directory.
To use it, you will need to provide the input file and the output file. The program will print a short summary when it finishes, with information about trace file versions, their respective sizes and the output trace file compression ratio:
./update old.tracy new.tracy
old.tracy (0.3.0) {916.4 MB} new.tracy (0.4.0) {349.4 MB, } change
The new file contains the same data as the old one but with an updated internal representation. Note that the whole trace needs to be loaded into memory to perform an upgrade.

4.5.1 Archival mode

The update utility supports optional higher levels of data compression, which reduce the disk size of traces at the cost of increased compression times. With the default settings, the output files have a reasonable size and are quick to save and load. A list of available compression modes and their respective results is available in table 7 and figures 11, 12 and 13. The following command-line options control compression mode selection:
Mode          Size        Ratio   Save time   Load time
lz4           162.48 MB   -       1.91 s      470 ms
lz4 hc        77.33 MB    -       39.24 s     401 ms
lz4 extreme   72.67 MB    -       -           406 ms
zstd 1        63.17 MB    -       2.27 s      868 ms
zstd 2        63.29 MB    -       2.31 s      884 ms
zstd 3        62.94 MB    -       2.43 s      867 ms
zstd 4        62.81 MB    -       2.44 s      855 ms
zstd 5        61.04 MB    -       3.98 s      855 ms
zstd 6        60.27 MB    -       4.19 s      827 ms
zstd 7        61.53 MB    -       6.6 s       761 ms
zstd 8        60.44 MB    -       7.84 s      746 ms
zstd 9        59.58 MB    -       9.6 s       724 ms
zstd 10       59.36 MB    -       10.29 s     706 ms
zstd 11       59.2 MB     -       11.23 s     717 ms
zstd 12       58.51 MB    -       15.43 s     695 ms
zstd 13       56.16 MB    -       35.55 s     642 ms
zstd 14       55.76 MB    -       37.74 s     627 ms
zstd 15       54.65 MB    -       -           600 ms
zstd 16       50.94 MB    -       -           537 ms
zstd 17       50.18 MB    -       -           542 ms
zstd 18       49.91 MB    -       -           554 ms
zstd 19       46.99 MB    -       -           605 ms
zstd 20       46.81 MB    -       -           608 ms
zstd 21       45.77 MB    -       -           614 ms
zstd 22       45.52 MB    -       -           621 ms
Table 7: Compression results for an example trace.
Tests performed on Ryzen 9 3900X.
  • -4 - selects LZ4 algorithm.
  • -h - enables LZ4 HC compression.
  • -e - uses LZ4 extreme compression.
  • -z level - selects Zstandard algorithm, with a specified compression level.
Trace files created using the lz4, lz4 hc and lz4 extreme modes are optimized for fast decompression and can be further compressed using file compression utilities. For example, using 7-zip results in archives of the following sizes: 77.2 MB, 54.3 MB, 52.4 MB.
For archival purposes, it is, however, much better to use the zstd compression modes, which are faster, compress trace files more tightly, and are directly loadable by the profiler, without the intermediate decompression step.

4.5.2 Compression streams

Saving and loading trace data can be parallelized using the -j streams parameter. Each compression stream runs on its own thread, and it makes little sense to use more streams than you have CPU cores. Note that the number of streams set at save time will also be used at load time, which may affect load performance if you are viewing the trace on a less powerful machine.
Going overboard with the number of streams is not recommended, especially with the fast compression modes, where it will be difficult to keep each stream busy. Also, complex compression codecs (e.g. zstd at level 22) have significantly worse compression rates when the work is divided. This is a fairly nuanced topic, and you are encouraged to do your own measurements, but for a rough guideline on the behavior, you can refer to tables 8 and 9.
Figure 11: Plot of trace sizes for different compression modes (see table 7).
Figure 12: Logarithmic plot of trace compression times for different compression modes (see table 7).
Figure 13: Plot of trace load times for different compression modes (see table 7).
Table 8: The increase in file size for different compression modes, as compared to a single stream. Modes: lz4, lz4 hc, lz4 ext, zstd 1, zstd 3, zstd 6, zstd 9, zstd 18, zstd 22.
Mode       Streams
lz4        2.04    2.52    2.11    3.24
lz4 hc     3.56    6.73    9.49    15.26
lz4 ext    3.38    6.53    9.57    17.03
zstd 1     2.24    3.68    3.40    3.37
zstd 3     3.23    4.13    4.07    4.50
zstd 6     3.52    6.00    6.53    6.95
zstd 9     3.10    4.26    5.12    5.40
zstd 18    3.22    5.41    8.49    14.51
zstd 22    3.99    7.47    11.10   18.20
Table 9: The speedup (x times faster) in saving time for different modes of compression, as compared to a single stream.
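The guidance on the number of streams can be turned into a simple rule of thumb. The sketch below is one possible heuristic, not something the update utility does by itself: it caps the value passed to -j by the core count of the local machine and by an assumed core count of the weakest machine expected to load the trace. ChooseStreams and viewerCores are hypothetical names.

```cpp
#include <algorithm>
#include <thread>

// Picks a stream count for the -j parameter, capped by the local core count
// and by the core count of the weakest machine expected to load the trace.
unsigned ChooseStreams( unsigned requested, unsigned viewerCores )
{
    unsigned local = std::thread::hardware_concurrency();   // 0 if unknown
    if( local == 0 ) local = 1;
    const unsigned cap = std::max( 1u, std::min( local, viewerCores ) );
    return std::clamp( requested, 1u, cap );
}
```

Whether the cap should be lowered further for the fast compression modes is best decided by measuring, as suggested above.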

4.5.3 Frame images dictionary

Frame images have to be compressed individually so that there are no delays during random access to the contents of any image. Unfortunately, because of this, there is no reuse of compression state between similar (or even identical) images, which leads to increased memory consumption. The profiler can partially remedy this by enabling the calculation of an optional frame images dictionary with the -d command line parameter.
Saving a trace with the frame images dictionary enabled will need some extra time, depending on the amount of image data you have captured. Loading such a trace will also be slower, but not by much. How much RAM the dictionary will save depends on the similarity of frame images. Be aware that post-processing effects such as artificial film grain change image contents only subtly to the eye, but the change is significant for the dictionary.
The dictionary cannot be used when you are capturing a trace.

4.5.4 Data removal

In some cases you may want to share just a portion of the trace file, omitting sensitive data such as the source file cache or the machine code of the symbols. This can be achieved using the -s flags command line option. To select what kind of data is to be stripped, provide a list of flags selected from the following:
  • l - locks.
  • m - messages.
  • p - plots.
  • M - memory.
  • i - frame images.
  • c - context switches.
  • s - sampling data.
  • C - symbol code.
  • S - source file cache.
Flags can be concatenated. For example, specifying -s CSi will remove symbol code, source file cache, and frame images in the destination trace file.
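How a concatenated flag string maps to data categories can be illustrated with a small parser. This is a sketch of the mapping listed above, not the update utility's actual code; the Strip enum and ParseStripFlags are hypothetical names, and rejecting unknown characters with an exception is a choice made for this example.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical names for the data categories listed above.
enum class Strip { Locks, Messages, Plots, Memory, FrameImages, ContextSwitches, Sampling, SymbolCode, SourceCache };

// Translates a concatenated flag string such as "CSi" into a set of categories.
std::set<Strip> ParseStripFlags( const std::string& flags )
{
    std::set<Strip> out;
    for( char c : flags )
    {
        switch( c )
        {
        case 'l': out.insert( Strip::Locks ); break;
        case 'm': out.insert( Strip::Messages ); break;
        case 'p': out.insert( Strip::Plots ); break;
        case 'M': out.insert( Strip::Memory ); break;
        case 'i': out.insert( Strip::FrameImages ); break;
        case 'c': out.insert( Strip::ContextSwitches ); break;
        case 's': out.insert( Strip::Sampling ); break;
        case 'C': out.insert( Strip::SymbolCode ); break;
        case 'S': out.insert( Strip::SourceCache ); break;
        default: throw std::invalid_argument( std::string( "unknown strip flag: " ) + c );
        }
    }
    return out;
}
```

Note that the flags are case-sensitive: m and M, c and C, and s and S select different kinds of data.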

4.6 Source file cache scan

Sometimes access to source files may not be possible during the capture. This may be due to capturing the trace on a machine without the source files on disk, use of paths relative to the build directory, clash of file location schemas (e.g., on Windows, you can have native paths, like C:\directory\file, and WSL paths, like /mnt/c/directory/file, pointing to the same file), and so on.
You may force a recheck of the source file availability during the update process with the -c command line parameter. All the source files missing from the cache will then be scanned again and added to the cache if they pass the validity checks (see section 5.16).

4.7 Instrumentation failures

In some cases, your program may be incorrectly instrumented. For example, you could have unbalanced zone begin and end events or report a memory free event without first reporting a memory allocation event. When Tracy detects such misbehavior, it immediately terminates the connection with the client and displays an error message.
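The two kinds of mistakes mentioned above can be illustrated with a simplified consistency check over a single thread's event stream. This is only a sketch of the idea, not Tracy's actual validation code; the Event type and FirstInstrumentationFailure are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

enum class EventType { ZoneBegin, ZoneEnd, MemAlloc, MemFree };
struct Event { EventType type; uint64_t ptr; };   // ptr is used by memory events only

// Returns the index of the first inconsistent event, or -1 if there is none.
// Zones still open at the end are not reported, as they may simply be running.
long FirstInstrumentationFailure( const std::vector<Event>& events )
{
    long depth = 0;
    std::unordered_set<uint64_t> live;
    for( size_t i = 0; i < events.size(); i++ )
    {
        const auto& e = events[i];
        switch( e.type )
        {
        case EventType::ZoneBegin: depth++; break;
        case EventType::ZoneEnd:   if( --depth < 0 ) return long( i ); break;
        case EventType::MemAlloc:  live.insert( e.ptr ); break;
        case EventType::MemFree:   if( live.erase( e.ptr ) == 0 ) return long( i ); break;
        }
    }
    return -1;
}
```

Running a check of this kind in your own tests can catch a zone end without a matching begin, or a free of an address that was never reported as allocated, before the profiler has to drop the connection.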

5 Analyzing captured data

You have instrumented your application, and you have captured a profiling trace. Now you want to look at the collected data. You can do this in the application contained in the profiler directory.
The workflow is identical whether you are viewing a previously saved trace or performing a live capture, as described in section 4.2.

5.1 Time display

In most cases Tracy will display an approximation of the time value, depending on how big it is. For example, a short time range will be displayed as 123 ns, while longer ones will be shortened to progressively coarser representations, up to one indicating that more than a day has passed.
While such a presentation makes time values easy to read, it is not always appropriate. For example, you may have multiple events happen at a time approximated to 1:23.4, giving you a precision of only 1/10 of a second. And there's certainly a lot that can happen in 100 ms.
An alternative time display is used in appropriate places to solve this problem. It combines a day-hour-minute-second value with full nanosecond resolution, resulting in values such as 1:23 456,789,012 ns.
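The difference between the two presentations can be reproduced with a pair of formatting helpers. This is a sketch under simplifying assumptions: it only handles times below one hour, the approximate variant truncates rather than rounds, and the function names are hypothetical. Tracy's own formatting code may differ in details such as zero padding.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Approximate display: minutes, seconds and tenths of a second (truncated).
std::string FormatApprox( uint64_t ns )
{
    const uint64_t totalSec = ns / 1000000000ull;
    const uint64_t tenths = ns % 1000000000ull / 100000000ull;
    char buf[64];
    std::snprintf( buf, sizeof( buf ), "%llu:%02llu.%llu",
        (unsigned long long)( totalSec / 60 ), (unsigned long long)( totalSec % 60 ),
        (unsigned long long)tenths );
    return buf;
}

// Full resolution display: minutes and seconds, followed by the remaining
// nanoseconds grouped in thousands.
std::string FormatFullResolution( uint64_t ns )
{
    const uint64_t totalSec = ns / 1000000000ull;
    const uint64_t rem = ns % 1000000000ull;
    char buf[64];
    std::snprintf( buf, sizeof( buf ), "%llu:%02llu %03llu,%03llu,%03llu ns",
        (unsigned long long)( totalSec / 60 ), (unsigned long long)( totalSec % 60 ),
        (unsigned long long)( rem / 1000000 ), (unsigned long long)( rem / 1000 % 1000 ),
        (unsigned long long)( rem % 1000 ) );
    return buf;
}
```

For an input of 83,456,789,012 ns, the first helper yields 1:23.4 and the second 1:23 456,789,012 ns, so two events that look simultaneous in the approximate form can be told apart in the full one.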
Figure 14: Main profiler window. Note that this manual has split the top line of buttons into two rows.

5.2 Main profiler window

The main profiler window is split into three sections, as seen in figure 14: the control menu, the frame time graph, and the timeline display.

5.2.1 Control menu

The control menu (top row of buttons) provides access to various profiler features. The buttons perform the following actions:

- Connection - Opens the connection information pop-up (see section 4.2.1). Only available when live capture is in progress.

- Close - This button unloads the current profiling trace and returns to the welcome menu, where another trace can be loaded. In live captures it is replaced by Pause, Resume and Stopped buttons.

- Pause - While a live capture is in progress, the profiler will display recent events, as either the last three fully captured frames, or a certain time range. You can use this to see the current behavior of the program. The pause button will stop the automatic updates of the timeline view (the capture will still be progressing).

- Resume - This button resumes following the most recent events in a live capture. You can choose one of the following options: Newest three frames, or Use current zoom level.

- Stopped - Inactive button used to indicate that the client application was terminated.

- Options - Toggles the settings menu (section 5.4).

- Messages - Toggles the message log window (section 5.5), which displays custom messages sent by the client, as described in section 3.7.

- Find zone - This button toggles the find zone window, which allows inspection of zone behavior statistics (section 5.7).

- Statistics - Toggles the statistics window, which displays zones sorted by their total time cost (section 5.6).

- Compare - Toggles the trace compare window, which allows you to see the performance difference between two profiling runs (section 5.8).
- Info - Shows general information about the trace (section 5.12).
- Tools - Allows access to optional data collected during capture. Some choices might be unavailable.
  - Playback - If frame images were captured (section 3.3.3), you will have the option to open the frame image playback window, described in chapter 5.19.
  - CPU data - If context switch data was captured (section 3.15.3), this button allows you to inspect the processor load during the capture, as described in section 5.20.
  - Annotations - If annotations have been made (section 5.3.1), you can open a list of all annotations, described in chapter 5.22.
  - Limits - Displays the time range limits window (section 5.3).
  - Wait stacks - If sampling was performed, an option to display wait stacks may be available. See chapter 3.15.5.1 for more details.
- Display scale - Enables run-time resizing of the displayed content. This may be useful in environments with potentially reduced visibility, e.g. during a presentation. Note that this setting is independent of the UI scaling coming from the system DPI settings.
The frame information block consists of four elements: the current frame set name along with the number of captured frames (click on it with the left mouse button to go to a specified frame), the two navigational buttons, which allow you to focus the timeline view on the previous or next frame, and the frame set selection button, which is used to switch to another frame set. For more information about marking frames, see section 3.3.
The following three items show the view time range, the time span of the whole capture (clicking on it with the middle mouse button will set the view range to the entire capture), and the memory usage of the profiler.

5.2.1.1 Notification area

The notification area displays informational notices, for example, how long it took to load a trace from the disk. A pulsating dot next to the icon indicates that some background tasks are being performed that may need to be completed before the full capabilities of the profiler are available. If a crash was captured during profiling (section 2.5), a crash icon will be displayed. A red icon indicates that queries are currently being backlogged, while the same icon in yellow indicates that some queries are currently in flight (see chapter 4.2.1 for more information).
If the drawing of timeline elements was disabled in the options menu (section 5.4), the profiler will use the following orange icons to remind you about that fact. Click on the icons to enable drawing of the selected elements. Note that collapsed labels (section 5.2.3.3) are not taken into account here.
- Display of empty labels is enabled.
- CPU data is hidden.
- GPU zones are hidden.
- CPU zones are hidden.
- Locks are hidden.
- Plots are hidden.
- Ghost zones are not displayed.
- At least one timeline item (e.g. a single thread, a single plot, a single lock, etc.) is hidden.

5.2.2 Frame time graph

The graph of the currently selected frame set (figure 15) provides an outlook on the time spent in each frame, allowing you to see where the problematic frames are and to navigate to them quickly.
Figure 15: Frame time graph.
Each bar displayed on the graph represents a unique frame in the current frame set. Time progresses from left to right. The bar height indicates the time spent in the frame, complemented by the color information, which depends on the target FPS value. You can set the desired FPS in the options menu (see section 5.4).
  • If the bar is blue, then the frame met the best time of twice the target FPS (represented by the green target line).
  • If the bar is green, then the frame met the good time of target FPS (represented by the yellow line).
  • If the bar is yellow, then the frame met the bad time of half the FPS (represented by the red target line).
  • If the bar is red, then the frame didn't meet any time limits.
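As an illustration of the color rules above, the following C++ sketch classifies a frame time against thresholds derived from the target FPS. The names FrameColor and ClassifyFrame are hypothetical and are not part of Tracy, and treating a frame time exactly equal to a threshold as meeting it is an assumption. The profiler's own implementation may differ.

```cpp
#include <cassert>

// Hypothetical names, used only to illustrate the rule described in the text.
enum class FrameColor { Blue, Green, Yellow, Red };

// Classifies a frame time (in seconds) against the three time limits
// derived from the target FPS value.
inline FrameColor ClassifyFrame(double frameTimeSec, double targetFps)
{
    assert(targetFps > 0.0);
    const double best = 1.0 / (targetFps * 2.0);  // frame time at twice the target FPS
    const double good = 1.0 / targetFps;          // frame time at the target FPS
    const double bad  = 1.0 / (targetFps * 0.5);  // frame time at half the target FPS
    if (frameTimeSec <= best) return FrameColor::Blue;
    if (frameTimeSec <= good) return FrameColor::Green;
    if (frameTimeSec <= bad)  return FrameColor::Yellow;
    return FrameColor::Red;   // no time limit was met
}
```

For example, with a 60 FPS target, a 12 ms frame is classified as green and a 50 ms frame as red.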
The frames visible on the timeline are marked with a violet box drawn over them.
When a zone is displayed in the find zone window (section 5.7), the coloring of frames may be changed, as described in section 5.7.2.
Moving the mouse cursor over the frames displayed on the graph will display a tooltip with information about frame number, frame time, frame image (if available, see chapter 3.3.3), etc. Such tooltips are common for many UI elements in the profiler and won't be mentioned later in the manual.
You may focus the timeline view on the frames by clicking or dragging the left mouse button on the graph. The graph may be scrolled left and right by dragging the right mouse button over the graph. Finally, you may zoom the view in and out by using the mouse wheel. If the view is zoomed out, so that multiple frames are merged into one column, the profiler will use the highest frame time to represent the given column.
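The column merging rule for a zoomed-out graph can be sketched as follows. This is a simplified illustration that assumes every column covers a fixed number of consecutive frames. MergeFramesIntoColumns and its parameters are hypothetical names, not a Tracy API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Merges consecutive frames into columns of `framesPerColumn` frames each.
// Every column is represented by the highest frame time it contains,
// so that a single slow frame stays visible when the view is zoomed out.
inline std::vector<double> MergeFramesIntoColumns(const std::vector<double>& frameTimes,
                                                  std::size_t framesPerColumn)
{
    assert(framesPerColumn > 0);
    std::vector<double> columns;
    for (std::size_t i = 0; i < frameTimes.size(); i += framesPerColumn)
    {
        const std::size_t end = std::min(i + framesPerColumn, frameTimes.size());
        double highest = frameTimes[i];
        for (std::size_t j = i + 1; j < end; ++j) highest = std::max(highest, frameTimes[j]);
        columns.push_back(highest);
    }
    return columns;
}
```

Using the maximum rather than the average is what keeps an isolated problematic frame from disappearing among its fast neighbors.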
Clicking the left mouse button on the graph while the Ctrl key is pressed will open the frame image playback window (section 5.19) and set the playback to the selected frame. See section 3.3.3 for more information about frame images.

5.2.3 Timeline view

The timeline is the most crucial element of the profiler UI. All the captured data is displayed there, laid out on the horizontal axis, according to time flow. Where there was no profiling performed, the timeline is dimmed out. The view is split into three parts: the time scale, the frame sets, and the combined zones, locks, and plots display.
Collapsed items
Due to extreme differences in time scales, you will almost constantly see events too small to be displayed on the screen. Such events have a preset minimum size (so they can be seen) and are marked with a zig-zag pattern to indicate that you need to zoom in to see more detail.
The zig-zag pattern can be seen applied to frame sets in figure 17 and to zones in figure 18.
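One possible way to picture the collapsing of small items is the sketch below, which merges runs of consecutive too-small items into a single entry carrying a count. It is a simplification under stated assumptions: Item, Drawn, CollapseSmallItems, and the pixel-based threshold parameters are all hypothetical, and the profiler's real drawing code is not shown in this manual.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Item  { double beginNs; double endNs; };

// One entry to draw. `collapsed` entries would be drawn with the zig-zag pattern,
// and `count` is the number of original items they stand for.
struct Drawn { double beginNs; double endNs; std::size_t count; bool collapsed; };

// Walks time-ordered items and merges each run of consecutive items that are
// narrower than `minPixels` on screen into one collapsed entry.
inline std::vector<Drawn> CollapseSmallItems(const std::vector<Item>& items,
                                             double nsPerPixel, double minPixels)
{
    assert(nsPerPixel > 0.0 && minPixels > 0.0);
    const double minNs = nsPerPixel * minPixels;
    std::vector<Drawn> out;
    for (const Item& it : items)
    {
        const bool small = (it.endNs - it.beginNs) < minNs;
        if (small && !out.empty() && out.back().collapsed)
        {
            // Extend the current collapsed run.
            out.back().endNs = it.endNs;
            ++out.back().count;
        }
        else
        {
            out.push_back(Drawn{ it.beginNs, it.endNs, 1, small });
        }
    }
    return out;
}
```

With 10 ns per pixel and a minimum size of 2 pixels, three consecutive 10 ns zones are merged into a single collapsed entry with a count of 3, while larger neighbors are kept as individual entries.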

5.2.3.1 Time scale

The time scale is a quick aid in determining the relation between screen space and the time it represents (figure 16).
Figure 16: Time scale.
The leftmost value on the scale represents when the timeline starts. The rest of the numbers label the notches on the scale, with some numbers omitted if there's no space to display them.
Hovering the mouse pointer over the time scale will display a tooltip with the exact timestamp at the position of the mouse cursor.

5.2.3.2 Frame sets

Frames from each frame set are displayed directly underneath the time scale. Each frame set occupies a separate row. The currently selected frame set is highlighted with bright colors, with the rest dimmed out.
Figure 17: Frames on the timeline.
In figure 17, we can see the fully described frames 312 and 347. The description consists of the frame name, which is Frame for the default frame set (section 3.3) or the name you used for the secondary frame set (section 3.3.1), the frame number, and the frame time. Since frame 348 is too small to be fully labeled, only the frame time is shown. Frame 349 is even smaller, with no space for any text. Moreover, frames 313 to 346 are too small to be displayed individually, so they are replaced with a zig-zag pattern, as described in section 5.2.3.
You can also see that frame separators are projected down to the rest of the timeline view. Note that only the separators for the currently selected frame set are displayed. You can make a frame set active by clicking the left mouse button on the frame set row you want to select (also see section 5.2.1).
Clicking the middle mouse button on a frame will zoom the view to the extent of the frame.
If a frame has an associated frame image (see chapter 3.3.3), you can hold the Ctrl key and click the left mouse button on the frame to open the frame image playback window (see chapter 5.19) and set the playback to the selected frame.
If the Draw frame targets option is enabled (see section 5.4), time regions in frames exceeding the set target value will be marked with a red background.
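To make the Draw frame targets behavior concrete, here is a minimal sketch of how the over-target part of a frame could be computed. It assumes that the marked region starts one target frame time after the beginning of the frame and runs to the end of the frame. TimeRange and OverTargetRegion are hypothetical names, and this is not taken from the profiler's source.

```cpp
#include <cassert>
#include <cstdint>

struct TimeRange { std::int64_t beginNs; std::int64_t endNs; };

// If the frame is longer than the target frame time, stores the part of the
// frame that runs past the target in `overTarget` and returns true.
// Returns false (leaving `overTarget` untouched) when the frame met the target.
inline bool OverTargetRegion(const TimeRange& frame, double targetFps, TimeRange& overTarget)
{
    assert(targetFps > 0.0);
    assert(frame.endNs >= frame.beginNs);
    const std::int64_t targetNs = static_cast<std::int64_t>(1e9 / targetFps);
    if (frame.endNs - frame.beginNs <= targetNs) return false;
    overTarget = TimeRange{ frame.beginNs + targetNs, frame.endNs };
    return true;
}
```

For a 50 FPS target (20 ms per frame), a 25 ms frame yields a 5 ms region at the end of the frame, while a 15 ms frame yields nothing.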

5.2.3.3 Zones, locks and plots display

You will find the zones with locks and their associated threads on this combined view. The plots are graphed right below.
Figure 18: Zones and locks display.
The left-hand side index area of the timeline view displays various labels (threads, locks), which can be categorized in the following way:
  • Light blue label - GPU context. Multi-threaded Vulkan, OpenCL, and Direct3D 12 contexts are additionally split into separate threads.
  • Pink label - CPU data graph.
  • White label - A CPU thread. It will be replaced by a bright red label in a thread that has crashed (section 2.5). If automated sampling was performed, clicking the left mouse button on the ghost zones button will switch zone display mode between 'instrumented' and 'ghost.'
  • Green label - Fiber, coroutine, or any other sort of cooperative multitasking 'green thread.'
  • Light red label - Indicates a lock.
  • Yellow label - Plot.
Labels accompanied by a collapse symbol can be collapsed out of the view to reduce visual clutter. Hover the mouse pointer over the label to display additional information. Click the middle mouse button on a label to zoom the view to the extent of the label contents. Finally, click the right mouse button on a label to display the context menu with available actions:
  • Hide - Hides the label along with the content associated with it. To make the label visible again, you must find it in the options menu (section 5.4).
Zones
In the example in figure 18, you can see that there are two threads: Main thread and Streaming thread. We can see that the Main thread has two root-level zones visible: Update and Render. The Update zone is split into further sub-zones, some of which are too small to be displayed at the current zoom level. This is indicated by drawing a zig-zag pattern over the merged zones box (section 5.2.3), with the number of collapsed zones printed in place of the zone name. We can also see that the Physics zone holds the Physics lock mutex for most of its run time.
Meanwhile, the Streaming thread is performing some Streaming jobs. The first Streaming job sent a message (section 3.7). In addition to being listed in the message log, it is indicated by a triangle over the thread separator. When multiple messages are in one place, the triangle outline shape changes to a filled triangle.
The GPU zones are displayed just like CPU zones, with an OpenGL/Vulkan/Direct3D/OpenCL context in place of a thread name.
Hovering the mouse pointer over a zone will highlight all other zones that have the same source location with a white outline. Clicking the left mouse button on a zone will open the zone information window (section 5.13). Holding the Ctrl key and clicking the left mouse button on a zone will open the zone statistics window (section 5.7). Clicking the middle mouse button on a zone will zoom the view to the extent of the zone.
Ghost zones
You can enable the view of ghost zones (not pictured in figure 18, but similar to the standard zones view) by clicking on the ghost zones icon next to the thread label, available if automated sampling (see chapter 3.15.5) was performed. Ghost zones will also be displayed by default if no instrumented zones are available for a given thread, to help with pinpointing functions that should be instrumented.
Ghost zones represent true function calls in the program, periodically reported by the operating system. Due to the limited sampling resolution, you need to take great care when looking at reported timing data. While it may appear that some small function requires a relatively long time to execute, for example, 125 μs (8 kHz sampling rate), in reality, this time represents a period between taking two distinct samples, not the actual function run time. Similarly, two (or more) separate function calls may be represented as a single ghost zone because the profiler doesn't have the information needed to know about the actual lifetime of a sampled function.
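The arithmetic behind this caveat can be sketched as follows. The sample period is simply the inverse of the sampling rate, and a function seen in a number of consecutive samples can only be bounded, not measured. The names below are hypothetical, and the bounds assume a single continuous call with evenly spaced samples, which the profiler cannot verify, as explained above.

```cpp
#include <cassert>

// Time between two consecutive samples, in microseconds.
inline double SamplePeriodUs(double samplingRateHz)
{
    assert(samplingRateHz > 0.0);
    return 1e6 / samplingRateHz;
}

struct DurationBoundsUs { double minUs; double maxUs; };

// Bounds on the real run time of one continuous call that was observed in
// `hits` consecutive samples and in neither the sample before nor the one after.
inline DurationBoundsUs RealDurationBounds(int hits, double samplingRateHz)
{
    assert(hits >= 1);
    const double period = SamplePeriodUs(samplingRateHz);
    // The call covers at least the span between its first and last hit, and it
    // must fit strictly between the two surrounding samples that missed it.
    return DurationBoundsUs{ (hits - 1) * period, (hits + 1) * period };
}
```

At 8 kHz, the period is 125 μs, so a function caught by a single sample may have run for anything from nearly zero up to just under 250 μs.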
Another common pitfall to watch for is the order of presented functions. It is not what you expect it to be! Read chapter 5.14.1 for critical insight on how call stacks might seem nonsensical at first and why they aren't.
The available information about ghost zones is quite limited, but it's enough to give you a rough outlook on the execution of your application. The timeline view alone is more than any other statistical profiler can present. In addition, Tracy correctly handles inlined function calls, which are indicated by a darker background of ghost zones. Lastly, zones representing kernel-mode functions are displayed with red function names.
Clicking the left mouse button on a ghost zone will open the corresponding source file location, if able (see chapter 5.16 for conditions). There are three ways in which source locations can be assigned to a ghost zone:
  1. If the selected ghost zone is not an inline frame and its symbol data has been retrieved, the source location points to the function entry location (first line of the function).
  2. If the selected ghost zone is not an inline frame, but its symbol data is not available, the source location will point to a semi-random location within the function body (i.e. to one of the sampled addresses in the program, but not necessarily the one representing the selected time stamp, as multiple samples with different addresses may be merged into one ghost zone).
  3. If the selected ghost zone is an inline frame, the source location will point to a semi-random location within the inlined function body (see details in the above point). It is impossible to go to such a function's entry location, as it doesn't exist in the program binary. Inlined functions begin in the parent function.
Call stack samples
The row of dots right below the Main thread label shows call stack sample points, which may have been automatically captured (see chapter 3.15.5 for more detail). Hovering the mouse pointer over each dot will display a short call stack summary, while clicking on the dot with the left mouse button will open a more detailed call stack information window (see section 5.14).
Context switches
The thick line right below the samples represents context switch data (see section 3.15.3). We can see that the main thread, as displayed, starts in a suspended state, represented by the dotted region. Then it is woken up and starts execution of the Update zone. It is preempted amid the physics processing, which explains why there is an empty space between child zones. Then it is resumed and continues execution into the Render zone, where it is preempted again, but for a shorter time. After rendering is done, the thread sleeps again, presumably waiting for the vertical blanking to indicate the next frame. Similar information is also available for the streaming thread.
Context switch regions use the following color key:
  • Green - Thread is running.
  • Red - Thread is waiting to be resumed by the scheduler. There are many reasons why a thread may be in the waiting state. Hovering the mouse pointer over the region will display more information. If sampling was performed, the profiler might display a wait stack. See section 3.15.5.1 for additional details.
  • Blue - Thread is waiting to be resumed and is migrating to another CPU core. This might have visible performance effects because low-level CPU caches are not shared between cores, which may result in additional cache misses. To avoid this problem, you may pin a thread to a specific core by setting its affinity.
  • Bronze - Thread has been placed in the scheduler's run queue and is about to be resumed.
Fiber work and yield states are presented in the same way as context switch regions.
CPU data
This label is only available if the profiler collected context switch data. It is split into two parts: a graph of CPU load by various threads running in the system and a per-core thread execution display.
The CPU load graph shows how many CPU resources were in use at any given time during program execution. The green part of the graph represents threads belonging to the profiled application, and the gray part of the graph shows all other programs running in the system. Hovering the mouse pointer over the graph will display a list of threads running on the CPU at the given time.
Each line in the thread execution display represents a separate logical CPU thread. If CPU topology data is available (see section 3.15.4), package and core assignment will be displayed in brackets, in addition to the numerical processor identifier (i.e. [package:core] CPU thread). When a core is busy executing a thread, a zone will be drawn at the appropriate time. Zones are colored according to the following key:
  • Bright color - or orange if dynamic thread colors are disabled - Thread tracked by the profiler.
  • Dark blue - Thread existing in the profiled application but not known to the profiler. This may include internal profiler threads, helper threads created by external libraries, etc.
  • Gray - Threads assigned to other programs running in the system.

Hovering the mouse pointer over a CPU data zone will display a line connecting all zones associated with the selected thread. This can be used to quickly see how the thread migrated across the CPU cores.
Clicking the left mouse button on a tracked thread will make it visible on the timeline if it was either hidden or collapsed before.
Careful examination of the data presented on this graph may allow you to determine areas where the profiled application was fighting for system resources with other programs (see section 2.2.1) or give you a hint to add more instrumentation macros.
Locks
Mutual exclusion zones are displayed in each thread that tries to acquire them. There are three color-coded kinds of lock event regions that may be displayed. Note that the contention regions are always displayed over the uncontended ones when the timeline view is zoomed out.

- Green region - The lock is being held solely by one thread, and no other thread tries to access it. In the case of shared locks, multiple threads hold the read lock, but no thread requires a write lock.
- Yellow region - The lock is owned by this thread, and some other thread also wants to acquire the lock.
- Red region - The thread wants to acquire the lock but is blocked by another thread (or by several threads, in the case of a shared lock).
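The color key above can be restated as a small decision function. The sketch below covers only exclusive (non-shared) locks and uses hypothetical types, LockState and LockRegion, that do not correspond to Tracy's internal data structures.

```cpp
#include <cassert>

enum class LockRegion { None, Green, Yellow, Red };

// State of one thread with respect to one exclusive lock at a given moment.
struct LockState
{
    bool heldByThisThread;   // this thread currently owns the lock
    bool thisThreadWaiting;  // this thread is blocked trying to acquire the lock
    int  otherWaiters;       // number of other threads blocked on the lock
};

// Maps the state to the region color that would be shown in this thread.
inline LockRegion ClassifyLockRegion(const LockState& s)
{
    assert(s.otherWaiters >= 0);
    assert(!(s.heldByThisThread && s.thisThreadWaiting));
    if (s.thisThreadWaiting) return LockRegion::Red;
    if (s.heldByThisThread)
        return s.otherWaiters > 0 ? LockRegion::Yellow : LockRegion::Green;
    return LockRegion::None;  // the thread is not interacting with the lock
}
```

Seen this way, a yellow region in one thread always has a matching red region in at least one other thread, which is usually where to look first when investigating contention.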
Hovering the mouse pointer over a lock timeline will highlight the lock in all threads to help read the lock behavior. Hovering the mouse pointer over a lock event will display important information, for example, a list of threads that are currently blocking or are blocked by the lock. Clicking the left mouse button on a lock event or a lock label will open the lock information window, as described in section 5.18. Clicking the middle mouse button on a lock event will zoom the view to the extent of the event.
Plots
The numerical data values (figure 19) are plotted right below the zones and locks. Note that the minimum and maximum values currently displayed on the plot are visible on the screen, along with the range of the plot and the number of drawn data points. The discrete data points are indicated with little rectangles. A filled rectangle indicates multiple data points.
Figure 19: Plot display.
When memory profiling (section 3.8) is enabled, Tracy will automatically generate a Memory usage plot, which has extended capabilities. For example, hovering over a data point (memory allocation event) will visually display the allocation duration. Clicking the left mouse button on the data point will open the memory allocation information window, which will show the duration of the allocation as long as the window is open.
Another plot that Tracy automatically provides is the CPU usage plot, which represents the total system CPU usage percentage (it is not limited to the profiled application).

5.2.4 Navigating the view

Hovering the mouse pointer over the timeline view will display a vertical line that you can use to line up events in multiple threads visually. Dragging the left mouse button will display the time measurement of the selected region.
The timeline view may be scrolled both vertically and horizontally by dragging the right mouse button. Note that only the zones, locks, and plots scroll vertically, while the time scale and frame sets always stay on the top.
You can zoom the timeline view in and out by using the mouse wheel. Pressing the Ctrl key will make zooming more precise, while pressing the Shift key will make it faster. You can select a range to which you want to zoom in by dragging the middle mouse button. Dragging the middle mouse button while the Ctrl key is pressed will zoom out.
It is also possible to navigate the timeline using the keyboard. The A and D keys scroll the view to the left and right, respectively. The W and S keys change the zoom level.

5.3 Time ranges

Sometimes, you may want to specify a time range, such as limiting some statistics to a specific part of your program execution or marking interesting places.
To define a time range, drag the left mouse button over the timeline view while holding the Ctrl key. When the mouse button is released, the profiler will mark the selected time extent with a blue striped pattern, and it will display a context menu with the following options:
  • Limit find zone time range - this will limit find zone results. See chapter 5.7 for more details.
  • Limit statistics time range - selecting this option will limit statistics results. See chapter 5.6 for more details.
  • Limit wait stacks time range - limits wait stacks results. Refer to chapter 5.17.
  • Limit memory time range - limits memory results. Read more about this in chapter 5.9.
  • Add annotation - use to annotate regions of interest, as described in chapter 5.3.1.
Alternatively, you may specify the time range by clicking the right mouse button on a zone or a frame. The resulting time extent will match the selected item.
To reduce clutter, time range regions are only displayed if the windows they affect are open or if the time range limits control window is open (section 5.23). You can access the time range limits window through the Tools button on the control menu.
You can freely adjust each time range on the timeline by clicking the left mouse button on the range's edge and dragging the mouse.
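To show what limiting statistics to a time range means in practice, the sketch below sums the duration of zones that fall inside a selected range. Whether the profiler counts only zones lying entirely inside the range, as assumed here, or also partially overlapping ones is not stated in this section. Zone and TotalTimeInRange are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Zone { std::int64_t beginNs; std::int64_t endNs; };

// Sums the duration of all zones that lie entirely within
// [rangeBeginNs, rangeEndNs]. Zones outside the range, or only partially
// inside it, are ignored (an assumption made for this sketch).
inline std::int64_t TotalTimeInRange(const std::vector<Zone>& zones,
                                     std::int64_t rangeBeginNs, std::int64_t rangeEndNs)
{
    assert(rangeBeginNs <= rangeEndNs);
    std::int64_t total = 0;
    for (const Zone& z : zones)
    {
        if (z.beginNs >= rangeBeginNs && z.endNs <= rangeEndNs)
            total += z.endNs - z.beginNs;
    }
    return total;
}
```

Restricting the range this way is useful, for example, for excluding a loading phase from the totals so that only steady-state frames contribute.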

5.3.1 Annotating the trace

Tracy allows adding custom notes to the trace. For example, you may want to mark a region to ignore because the application was out-of-focus or a region where a new user was connecting to the game, which resulted in a frame drop that needs to be investigated.
Methods of specifying the annotation region are described in section 5.3. When a new annotation is added, a settings window is displayed (section 5.21), allowing you to enter a description.
Annotations are displayed on the timeline, as presented in figure 20. Clicking on the circle next to the text description will open the annotation settings window, in which you can modify or remove the region. A list of all annotations in the trace is available in the annotations list window described in section 5.22, which is accessible through the Tools button on the control menu.

Figure 20: Annotation region.
Please note that while the annotations persist between profiling sessions, they are not saved in the trace but in the user data files, as described in section 8.2.

5.4 Options menu

In this window, you can set various trace-related options. For example, the timeline view might sometimes become overcrowded, in which case disabling the display of some profiling events can increase readability.
  • Draw empty labels - By default threads that don't have anything to display at the current zoom level are hidden. Enabling this option will show them anyway.
    绘制空标签 - 默认情况下,在当前缩放级别没有任何内容显示的线程将被隐藏。启用此选项将显示它们。
  • Draw frame targets - If enabled, time regions in any frame from the currently selected frame set, which exceed the specified Target FPS value will be marked with a red background on timeline view.
    绘制帧目标 - 如果启用,当前所选帧集中任何帧中超过指定目标 FPS 值的时间区域都将在时间线视图上以红色背景标记。
  • Target FPS - Controls the option above, but also the frame bar colors in the frame time graph (section 5.2.2). The color range thresholds are presented in a line directly below.
    目标 FPS - 控制上述选项,同时也控制帧时间图中的帧条颜色(第 5.2.2 节)。颜色范围阈值显示在正下方的一行中。
  • Draw context switches - Allows disabling context switch display in threads.
    绘制上下文切换 - 允许在线程中禁用上下文切换显示。
  • Darken inactive thread - If enabled, inactive regions in threads will be dimmed out.
    使不活动的线程变暗 - 如果启用,线程中不活动的区域将变暗。
  • Draw CPU data - Per-CPU behavior graph can be disabled here.
    绘制 CPU 数据 - 这里可以禁用每 CPU 行为图。
  • Draw CPU usage graph - You can disable drawing of the CPU usage graph here.
    绘制 CPU 使用情况图 - 您可以在此处禁用绘制 CPU 使用情况图。
  • Draw GPU zones - Allows disabling display of OpenGL/Vulkan/Direct3D/OpenCL zones. The GPU zones drop-down allows disabling individual GPU contexts and setting CPU/GPU drift offsets of uncalibrated contexts (see section 3.9 for more information). The Auto button automatically measures the GPU drift value.
    绘制 GPU 区域 - 允许禁用 OpenGL/Vulkan/Direct3D/OpenCL 区域的显示。GPU 区域下拉菜单允许禁用单个 GPU 上下文和设置未校准上下文的 CPU/GPU 漂移偏移(更多信息请参见第 3.9 节)。"自动"按钮可自动测量 GPU 漂移值。
  • Draw CPU zones - Determines whether CPU zones are displayed.
    绘制 CPU 区段 - 确定是否显示 CPU 区段。
  • Draw ghost zones - Controls if ghost zones should be displayed in threads which don't have any instrumented zones available.
    绘制重影区 - 控制是否在没有可用仪器区的线程中显示重影区。
  • Zone colors - Zones with no user-set color may be colored according to the following schemes:
    区段颜色 - 没有用户设置颜色的区段可以按照以下方案着色:
      • Disabled - A constant color (blue) will be used.
        禁用 - 将使用恒定颜色(蓝色)。
      • Thread dynamic - Zones are colored according to the thread (identifier number) they belong to and their depth level.
        线程动态 - 区段根据其所属的线程(标识符编号)和深度级别着色。
      • Source location dynamic - Zone color is determined by source location (function name) and depth level.
        源位置动态 - 区段颜色由源位置(函数名称)和深度级别决定。
Enabling the Ignore custom option will force usage of the selected zone coloring scheme, disregarding any colors set by the user in profiled code.
启用 "忽略自定义 "选项将强制使用选定的区域着色方案,而不考虑用户在配置代码中设置的任何颜色。
  • Zone name shortening - Controls the display behavior of long zone names that don't fit inside a zone box:
    区段名称缩写 - 控制无法完整显示在区段框内的长区段名称的显示行为:
      • Disabled - Shortening of zone names is not performed and names are always displayed in full (e.g. bool ns::container::add(const float&)).
        禁用 - 不缩短区段名称,名称始终以全名显示(例如 bool ns::container::add(const float&))。
      • Minimal length - Always reduces the zone name to minimal length, even if there is space available for a longer form (e.g. add()).
        最短长度 - 始终将区段名称缩减到最短长度,即使有空间可以使用更长的形式(例如 add())。
      • Only normalize - Only performs normalization of the zone name, but does not remove namespaces (e.g. ns::container<>::add()).
        仅规范化 - 仅对区段名称执行规范化,但不删除命名空间(例如 ns::container<>::add())。
      • As needed - Name shortening steps will be performed only if there is no space to display a complete zone name, and only until the name fits the available space or shortening is no longer possible (e.g. container<>::add()).
        根据需要 - 只有在没有空间显示完整的区段名称时,才会执行名称缩写步骤,直到名称适合可用空间或无法继续缩写为止(例如 container<>::add())。
      • As needed + normalize - Same as above, but zone name normalization will always be performed, even if the entire zone name fits in the space available.
        根据需要 + 规范化 - 同上,但始终执行区段名称规范化,即使整个区段名称都能放入可用空间。
Function names in the remaining places across the UI will be normalized unless this option is set to Disabled.
除非将此选项设置为禁用,否则用户界面其余位置的函数名称都将被规范化。

- Draw locks - Controls the display of locks. If the Only contended option is selected, the profiler won't display the non-blocking regions of locks (see section 5.2.3.3). The Locks drop-down allows disabling the display of locks on a per-lock basis. As a convenience, the list of locks is split into the single-threaded and multi-threaded (contended and uncontended) categories. Clicking the right mouse button on a lock label opens the lock information window (section 5.18).
- 绘制锁 - 控制锁的显示。如果选择"仅有竞争"选项,剖析器将不显示锁的非阻塞区域(参见第 5.2.3.3 节)。锁下拉菜单允许逐个禁用锁的显示。为方便起见,锁列表分为单线程和多线程(有竞争和无竞争)两类。在锁标签上单击鼠标右键可打开锁信息窗口(第 5.18 节)。

- Draw plots - Allows disabling display of plots. Individual plots can be disabled in the Plots drop-down. The vertical size of the plots can be adjusted using the Plot heights slider.
- 绘制图形 - 允许禁用图形显示。可以在 "绘图 "下拉菜单中禁用单个绘图。可以使用 "绘图高度 "滑块调整绘图的垂直尺寸。

- Visible threads - Here you can select which threads are visible on the timeline. You can change the display order of threads by dragging thread labels. Threads can be sorted alphabetically with the Sort button.
- 可见线程 - 在这里,您可以选择哪些线程在时间轴上可见。您可以通过拖动线程标签来更改线程的显示顺序。可以使用排序按钮按字母顺序对线程进行排序。

- Visible frame sets - Frame set display can be enabled or disabled here. Note that disabled frame sets are still available for selection in the frame set selection drop-down (section 5.2.1) but are marked with a dimmed font.
- 可见帧集 - 可以在此处启用或禁用帧集显示。请注意,禁用的帧集仍可在帧集选择下拉菜单中选择(第 5.2.1 节),但字体会变暗。
Disabling the display of some events is especially recommended when the profiler performance drops below acceptable levels for interactive usage.
当剖析器性能下降到交互式使用可接受的水平以下时,尤其建议禁用某些事件的显示。

5.5 Messages window 5.5 信息窗口

In this window, you can see all the messages that were sent by the client application, as described in section 3.7. The window is split into four columns: time, thread, message and call stack. Hovering the mouse cursor over a message will highlight it on the timeline view. Clicking the left mouse button on a message will center the timeline view on the selected message.
在该窗口中,您可以看到客户端程序发送的所有信息,如第 3.7 节所述。窗口分为四列:时间、线程、消息和调用栈。将鼠标光标悬停在消息上会在时间轴视图中突出显示该消息。单击消息上的鼠标左键将使时间线视图居中显示所选消息。
The call stack column is filled only if a call stack capture was requested, as described in section 3.11. A single entry consists of the Show button, which opens the call stack information window (chapter 5.14) and of abbreviated information about the call path.
调用堆栈一栏只有在请求调用堆栈捕获(如第 3.11 节所述)时才会填写。单个条目由 显示按钮和调用路径的简短信息组成, 显示按钮可打开调用堆栈信息窗口(5.14 章)。
If the Show frame images option is selected, hovering the mouse cursor over a message will show a tooltip containing frame image (see section 3.3.3) associated with a frame in which the message was issued, if available.
如果选择了 显示帧图像选项,将鼠标光标悬停在报文上将显示一个工具提示,其中包含与发布报文的帧相关的帧图像(见第 3.3.3 节)(如果有的话)。
The message list will automatically scroll down to display the most recent message during live capture. You can disable this behavior by manually scrolling the message list up. The auto-scrolling feature will be enabled again when the view is scrolled down to display the last message.
在实时捕捉过程中,信息列表会自动向下滚动以显示最近的信息。您可以通过手动向上滚动信息列表来禁用这种行为。当视图向下滚动显示最后一条信息时,自动滚动功能将再次启用。
You can filter the message list in the following ways:
您可以通过以下方式过滤信息列表:
  • By the originating thread in the Visible threads drop-down.
    通过可见线程下拉菜单按发起线程筛选。
  • By matching the message text to the expression in the Filter messages entry field. Multiple filter expressions can be comma-separated (e.g. 'warn, info' will match messages containing strings 'warn' or 'info'). You can exclude matches by preceding the term with a minus character (e.g. '-debug' will hide all messages containing the string 'debug').
    将消息文本与筛选消息输入字段中的表达式进行匹配。多个过滤表达式可以用逗号分隔(例如,'warn, info' 将匹配包含字符串 'warn' 或 'info' 的消息)。您还可以通过在术语前添加减号字符来排除匹配(例如,'-debug' 将隐藏所有包含字符串 'debug' 的消息)。
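
The filter rules above can be sketched in a few lines of Python. This is only an illustrative model, not Tracy's implementation: the function name `message_passes` is hypothetical, and two behaviors are assumptions that the manual does not spell out, namely case-insensitive substring matching and exclusion terms taking precedence over inclusion terms.

```python
def message_passes(text, filter_expr):
    """Return True if `text` should stay visible for the given filter expression.

    Assumptions (not confirmed by the manual): matching is a case-insensitive
    substring test, and any matching '-term' hides the message outright.
    """
    terms = [t.strip().lower() for t in filter_expr.split(",") if t.strip()]
    includes = [t for t in terms if not t.startswith("-")]
    excludes = [t[1:] for t in terms if t.startswith("-") and len(t) > 1]

    haystack = text.lower()
    if any(term in haystack for term in excludes):
        return False
    if not includes:
        # Only exclusions (or nothing) were given: everything else passes.
        return True
    return any(term in haystack for term in includes)
```

With this model, `message_passes("Warning: low memory", "warn, info")` keeps the message, while `message_passes("debug: frame start", "-debug")` hides it.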

5.6 Statistics window 5.6 统计窗口

Looking at the timeline view gives you a very localized outlook on things. However, sometimes you want to look at the general overview of the program's behavior. For example, you want to know which function takes the most of the application's execution time. The statistics window provides you with exactly that information.
查看时间轴视图可以让你对事物有一个非常局部的了解。不过,有时您想查看程序行为的总体概况。例如,你想知道哪个函数占用了程序最多的执行时间。统计窗口就能为你提供这方面的信息。
If the trace capture was performed with call stack sampling enabled (as described in chapter 3.15.5), you will be presented with an option to switch between Instrumentation and Sampling modes. If the profiler collected no sampling data, but it retrieved symbols, the second mode will be displayed as Symbols, enabling you to list available symbols.
如果跟踪捕获是在启用调用堆栈采样的情况下进行的(如第 3.15.5 章所述),您将看到在 "仪器 "和 "采样 "模式之间切换的选项。如果剖析器未收集采样数据,但检索了符号,则第二种模式将显示为 "符号"(Symbols),以便列出可用符号。
If GPU zones were captured, you will also have the GPU option to view the GPU zone statistics.
如果捕获了 GPU 区域,还可以使用 GPU 选项查看 GPU 区域统计信息。

5.6.1 Instrumentation mode
5.6.1 仪表模式

Here you will find a multi-column display of captured zones, which contains: the zone name and location, total time spent in the zone, the count of zone executions, the mean time spent in the zone per call and the number of threads the zone has appeared in, labeled with a thread icon. You may sort the view according to the four displayed values or by the name.
在这里,您将看到捕获的区段的多列显示,其中包含:区段名称和位置、在区段中花费的总时间、区段执行次数、每次调用在区段中花费的平均时间以及区段出现过的线程数(标有线程图标)。您可以根据显示的四个值或名称对视图进行排序。
In the Timing menu, the With children selection displays inclusive measurements, that is, measurements that include the execution time of the zone's children. The Self only selection switches the measurement to exclusive, displaying just the time spent in the zone itself, with the child calls subtracted. Finally, the Non-reentrant selection shows inclusive time but counts only the first appearance of a given zone on a thread's stack.
在计时菜单中,"With children"(含子区段)选项显示包含性测量值,即包含区段的子区段执行时间的测量值。"Self only"(仅自身)选项将测量切换为排他性测量,只显示在区段自身中花费的时间,并减去子调用时间。最后,"Non-reentrant"(不可重入)选项显示包含性时间,但只计算给定区段在线程堆栈上的首次出现。
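
The difference between the three timing modes is easiest to see on a recursive function. The sketch below is a simplified model, not Tracy's code: the `Zone` class, the `zone_time` function, and the mode names are all hypothetical, and zones are assumed to form a clean parent/child tree on a single thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Zone:
    name: str
    start: int  # nanoseconds
    end: int    # nanoseconds
    children: List["Zone"] = field(default_factory=list)


def zone_time(roots, name, mode):
    """Sum the time of all zones called `name` under one of three modes."""
    if mode not in ("with_children", "self_only", "non_reentrant"):
        raise ValueError("unknown mode: " + mode)
    total = 0

    def visit(zone, ancestors):
        nonlocal total
        if zone.name == name:
            duration = zone.end - zone.start
            if mode == "with_children":
                total += duration
            elif mode == "self_only":
                total += duration - sum(c.end - c.start for c in zone.children)
            elif name not in ancestors:
                # non_reentrant: inclusive time, outermost occurrence only
                total += duration
        for child in zone.children:
            visit(child, ancestors + (zone.name,))

    for root in roots:
        visit(root, ())
    return total


# A recursive call: walk (0..100) -> walk (10..60) -> leaf (20..30)
leaf = Zone("leaf", 20, 30)
inner = Zone("walk", 10, 60, [leaf])
outer = Zone("walk", 0, 100, [inner])
```

For `walk`, the inclusive total counts the nested call twice (150 ns), the self-only total is 90 ns, and the non-reentrant total is 100 ns, which matches the wall-clock time spent inside the outermost call.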
Clicking the left mouse button on a zone will open the individual zone statistics view in the find zone window (section 5.7).
在区段上单击鼠标左键,将在查找区段窗口中打开单个区段统计视图(第 5.7 节)。
You can filter the displayed list of zones by matching the zone name to the expression in the Filter zones entry field. Refer to section 5.5 for a more detailed description of the expression syntax.
您可以通过匹配区段名称和 筛选区段输入字段中的表达式来筛选显示的区段列表。有关表达式语法的详细说明,请参阅第 5.5 节。
To limit the statistics to a specific time extent, you may enable the Limit range option (chapter 5.3). The inclusion region will be marked with a red striped pattern. Note that a zone must be entirely inside the region to be counted. You can access more options through the Limits button, which will open the time range limits window, described in section 5.23.
要将统计数据限制在特定时间范围内,可以启用限制范围选项(5.3 章)。包含区域将用红色条纹标记。请注意,一个区域必须完全位于该区域内才能被统计。您可以通过 限制按钮访问更多选项,该按钮将打开时间范围限制窗口,详见第 5.23 节。
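
The "entirely inside" rule can be sketched as a simple interval test. This is an illustration only: the function name `zones_in_range` is hypothetical, zones are modeled as `(start, end)` pairs in nanoseconds, and treating the range bounds as inclusive is an assumption.

```python
def zones_in_range(zones, range_start, range_end):
    """Keep only zones that lie entirely inside [range_start, range_end]."""
    return [
        (start, end)
        for start, end in zones
        if start >= range_start and end <= range_end
    ]


zones = [(0, 10), (5, 15), (12, 18), (18, 25)]
inside = zones_in_range(zones, 5, 20)
```

Here the zones `(0, 10)` and `(18, 25)` are dropped because each one crosses a boundary of the 5..20 range, even though both overlap it.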

5.6.2 Sampling mode 5.6.2 采样模式

Data displayed in this mode is, in essence, very similar to the instrumentation one. Here you will find function names, their locations in source code, and time measurements. There are, however, some significant differences.
该模式下显示的数据本质上与仪器模式非常相似。您可以在这里找到函数名称、它们在源代码中的位置以及时间测量值。不过,它们之间也有一些明显的区别。
First and foremost, the presented information is constructed from many call stack samples, which represent real addresses in the application's binary code, mapped to the line numbers in the source files. This reverse mapping may not always be possible or could be erroneous. Furthermore, due to the nature of the sampling process, it is impossible to obtain exact time measurements. Instead, time values are guesstimated by multiplying the number of sample counts by mean time between two different samples.
首先,所提供的信息是由许多调用堆栈样本构建的,这些样本代表应用程序二进制代码中的真实地址,并映射到源文件中的行号。这种反向映射不一定总是可行的,也可能是错误的。此外,由于采样过程的性质,不可能获得精确的时间测量值。取而代之的是通过将采样次数乘以两个不同采样之间的平均时间来估算时间值。
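
As a rough illustration of that estimate, the sketch below multiplies each symbol's sample count by the mean gap between samples. The function name `estimate_symbol_times` and the input layout are hypothetical, and computing the mean gap from the first and last timestamps is an assumption about a detail the manual leaves open.

```python
from collections import Counter


def estimate_symbol_times(samples):
    """Estimate time per symbol from (timestamp_ns, symbol) samples.

    `samples` must be sorted by timestamp. Each symbol's time is its
    sample count multiplied by the mean gap between consecutive samples.
    """
    if len(samples) < 2:
        return {}
    mean_gap = (samples[-1][0] - samples[0][0]) / (len(samples) - 1)
    counts = Counter(symbol for _, symbol in samples)
    return {symbol: count * mean_gap for symbol, count in counts.items()}


samples = [(0, "a"), (100, "a"), (200, "b"), (300, "a"), (400, "b")]
estimated = estimate_symbol_times(samples)
```

With five samples spaced 100 ns apart, symbol `a` (three samples) is estimated at 300 ns and symbol `b` (two samples) at 200 ns. The result is an estimate, since nothing is known about what ran between two samples.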
The sample statistics list symbols, not functions. These terms are similar, but not exactly the same. A symbol always has a base function that gives it its name. In most cases, a symbol will also contain a number of inlined functions. In some cases, the same function may be inlined more than once within the same symbol.
样本统计列出的是符号,而不是函数。这些术语相似,但并不完全相同。一个符号总是有一个基本函数来命名。在大多数情况下,一个符号还包含多个内联函数。在某些情况下,同一个符号中的同一个函数可能会被内联多次。
The Name column contains the name of the symbol in which the sampling was done. Kernel-mode symbol samples are distinguished with the red color. Symbols containing inlined functions are listed with the number of inlined functions in parentheses and can be expanded to show all inlined functions (if the Show all option is disabled, some functions may be hidden due to lack of sampling data). Clicking on a function name will open the sample entry call stacks window (see chapter 5.15).
"名称"列包含进行采样的符号名称。内核模式符号样本用红色区分。包含内联函数的符号会在括号中列出内联函数的数量,并可展开显示所有内联函数(如果禁用了显示全部选项,某些函数可能会因缺乏采样数据而被隐藏)。点击函数名称将打开样本条目调用堆栈窗口(参见第 5.15 章)。
By default, each inlining of a function is listed separately. If you prefer to combine the measurements for functions that are inlined multiple times within a function, you can do so by enabling the Aggregate option. You cannot view sample entry call stacks of inlined functions when this grouping method is enabled.
默认情况下,函数的每次内联都会单独列出。如果希望合并一个函数中多次内联的函数的测量值,可以启用 "汇总"(Aggregate)选项。启用该分组方法后,将无法查看内联函数的样本条目调用堆栈。
If the Inlines option is enabled, the list will show all functions without grouping them by symbol. In this mode, inline functions are preceded by a symbol and their parent function name is displayed in parentheses.
如果启用了内联选项,列表将显示所有函数,而不按符号分组。在这种模式下,内联函数前面会有一个符号,其父函数名称会显示在括号中。
The Location column displays the corresponding source file name and line number. Depending on the Location option selection, it can either show the function entry address or the instruction at which the sampling was performed. The Entry mode points at the beginning of a non-inlined function or at the place where the compiler inserted an inlined function in its parent function. The Sample mode is not useful for non-inlined functions, as it points to one randomly selected sampling point out of many that were captured. However, in the case of inlined functions, this random sampling point is within the inlined function body.
位置栏显示相应的源文件名称和行号。根据位置选项的选择,它可以显示函数入口地址或执行采样的指令。输入模式指向非内联函数的起始位置或编译器在父函数中插入内联函数的位置。采样模式对非内联函数没有用处,因为它指向的是从许多采样点中随机选择的一个采样点。但对于内联函数,随机采样点就在内联函数体中。
Using these options in tandem lets you look at both the inlined function code and the place where it was inserted. If the Smart location is selected, the profiler will display the entry point position for non-inlined functions and sample location for inlined functions. Selecting the @ Address option will instead print the symbol address.
同时使用这些选项,可以同时查看内联函数代码和插入代码的位置。如果选择智能位置,剖析器将显示非内联函数的入口点位置和内联函数的示例位置。选择 @ 地址选项则会打印符号地址。
The location data is complemented by the originating executable image name, contained in the Image column.
位置数据由 "图像 "列中的原始可执行图像名称补充。
The profiler may not find some function locations due to insufficient debugging data available on the client-side. To filter out such entries, use the Hide unknown option.
由于客户端调试数据不足,剖析器可能找不到某些函数位置。要过滤掉此类条目,请使用隐藏未知选项。
The Time or Count column (depending on the Show time option selection) shows number of taken samples, either as a raw count, or in an easier to understand time format. Note that the percentage value of time is calculated relative to the wall-clock time. The percentage value of sample counts is relative to the total number of collected samples. You can also make the percentages of inline functions relative to the base symbol measurements by enabling the Base relative option.
时间或计数列(取决于显示时间选项的选择)显示采集的样本数,可以是原始计数,也可以是更容易理解的时间格式。请注意,时间的百分比值是相对于挂钟时间计算的。样本计数的百分比值是相对于采集的样本总数而言的。您还可以启用 基准相对选项,使内联函数的百分比相对于基准符号测量值。
The last column, Code size, displays the size of the symbol in the executable image of the program. Since inlined routines are directly embedded into other functions, their symbol size will be based on the parent symbol and displayed as 'less than'. In some cases, this data won't be available. If the symbol code has been retrieved, the symbol size will be prefixed with an icon, and clicking the right mouse button on the location column entry will open the symbol view window (section 5.16.2).
最后一栏"代码大小"显示程序可执行映像中符号的大小。由于内联例程是直接嵌入到其他函数中的,因此其符号大小将以父符号为基础,并显示为"小于"。在某些情况下,这些数据是不可用的。如果已检索到符号代码,符号大小前会带有一个图标,在位置列条目上单击鼠标右键将打开符号视图窗口(第 5.16.2 节)。
Finally, the list can be filtered using the Filter symbols entry field, just like in the instrumentation mode case. Additionally, you can also filter results by the originating image name of the symbol. You may disable the display of kernel symbols with the Include kernel switch. The exclusive/inclusive time counting mode can be switched using the Timing menu (non-reentrant timing is not available in the Sampling view). Limiting the time range is also available but is restricted to self-time. If the Show all option is selected, the list will include not only the call stack samples but also all other symbols collected during the profiling process (this is enabled by default if no sampling was performed).
最后,可以使用过滤符号输入字段对列表进行过滤,就像在仪器模式下一样。此外,还可以根据符号的源映像名称过滤结果。您可以使用包含内核开关禁用内核符号的显示。可以通过计时菜单切换独占/包含计时模式(采样视图中不提供不可重入计时)。还可以限制时间范围,但仅限于自身时间。如果选择"显示全部"选项,则列表中不仅包括调用堆栈样本,还包括剖析过程中收集的所有其他符号(如果未执行采样,则默认启用)。
A simple CSV document containing the visible zones after filtering and limiting can be copied to the clipboard with the button adjacent to the visible zones count. The document contains the following columns:
可通过可见区域计数旁的按钮将包含过滤和限制后可见区域的简单 CSV 文档复制到剪贴板。该文档包含以下列
  • name - Zone name
    name - 区域名称
  • src_file - Source file where the zone was set
    src_file - 设置区域的源文件
  • src_line - Line in the source file where the zone was set
    src_line - 源文件中设置区段的行
  • total_ns - Total zone time in nanoseconds
    total_ns - 以纳秒为单位的总区域时间
  • counts - Zone count
    计数 - 区计数
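
Once pasted from the clipboard, such a document can be post-processed with any CSV reader. The sketch below derives the mean time per call from the `total_ns` and `counts` columns. The function name `mean_time_per_call` and the sample rows are made up for illustration, and a comma-separated layout with a header row carrying the column names listed above is an assumption.

```python
import csv
import io


def mean_time_per_call(csv_text):
    """Map each zone name to total_ns / counts, skipping empty rows."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts = int(row["counts"])
        if counts == 0:
            continue  # avoid division by zero for zones without calls
        result[row["name"]] = int(row["total_ns"]) / counts
    return result


# Hypothetical data in the column layout described above.
EXAMPLE_CSV = """name,src_file,src_line,total_ns,counts
Update,game.cpp,42,9000,3
Render,render.cpp,17,20000,4
"""

means = mean_time_per_call(EXAMPLE_CSV)
```

For the hypothetical rows above, `Update` averages 3000 ns per call and `Render` averages 5000 ns per call.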

5.6.3 GPU zones mode
5.6.3 GPU 区域模式

This is an analog of the instrumentation mode, but for the GPU zones. Note that the available options may be limited here.
这是仪器模式的一种模拟,但适用于 GPU 区域。请注意,这里的可用选项可能有限。

5.7 Find zone window
5.7 查找区段窗口

The individual behavior of zones may be influenced by many factors, like CPU cache effects, access times amortized by the disk cache, thread context switching, etc. Moreover, sometimes the execution time depends on the internal data structures and their response to different inputs. In other words, it is hard to determine the actual performance characteristics by looking at any single zone.
分区的单个行为可能受到许多因素的影响,如 CPU 缓存效应、磁盘缓存摊销的访问时间、线程上下文切换等。此外,有时执行时间还取决于内部数据结构及其对不同输入的响应。换句话说,很难通过查看任何单一区域来确定实际性能特征。
Tracy gives you the ability to display an execution time histogram of all occurrences of a zone. On this view, you can see how the function behaves in general. You can inspect how various data inputs influence the execution time. You can filter the data to eventually drill down to the individual zone calls to see the environment in which they were called.
Tracy 提供了显示区域所有执行时间柱状图的功能。在此视图中,您可以看到函数的总体运行情况。您可以查看各种数据输入如何影响执行时间。您可以过滤数据,最终深入到单个区域调用,查看调用环境。
You start by entering a search query, which will be matched against known zone names (see section 3.4 for information on the grouping of zone names). If the search found some results, you will be presented with a list of zones in the matched source locations drop-down. The selected zone's graph is displayed on the histogram drop-down, and also the matching zones are highlighted on the timeline view.
首先输入一个搜索查询,然后与已知的区段名称进行匹配(有关区段名称分组的信息,请参阅第 3.4 节)。如果搜索有结果,则会在匹配的源位置下拉菜单中显示区段列表。所选区域的图表会显示在直方图下拉菜单中,匹配的区域也会在时间轴视图中突出显示。
Clicking the right mouse button on the source file location will open the source file view window (if applicable, see section 5.16). If symbol data is available Tracy will try to match the instrumented zone name to a captured symbol. If this succeeds and there are no duplicate matches, the source file view will be accompanied by the disassembly of the code. Since this matching is not exact, in rare cases you may get the wrong data here. To just display the source code, press and hold the Ctrl key while clicking the right mouse button.
单击源文件位置上的鼠标右键将打开源文件视图窗口(如适用,请参见第 5.16 节)。如果有符号数据,Tracy 会尝试将仪器区域名称与捕捉到的符号进行匹配。如果匹配成功且没有重复匹配,源文件视图将伴随代码的反汇编。由于这种匹配并不精确,在极少数情况下,您可能会在这里获得错误的数据。如果只想显示源代码,请在点击鼠标右键的同时按住 Ctrl 键。
An example histogram is presented in figure 21. Here you can see that the majority of zone calls (by count) are clustered in the 300 ns group, closely followed by a second cluster of longer calls. There are some outliers at the 1 and 10 ms marks, which can be ignored on most occasions, as these are single occurrences.
图 21 展示了一个直方图示例。从图中可以看出,大部分区段调用(按次数计算)集中在 300 ns 组,紧随其后的是由耗时更长的调用组成的第二个集群。在 1 毫秒和 10 毫秒处有一些离群值,由于这些都是单次出现,因此在大多数情况下可以忽略不计。
Figure 21: Zone execution time histogram. Note that the extreme time labels and time range indicator (middle time value) are displayed in a separate line.
图 21:区域执行时间柱状图。请注意,极端时间标签和时间范围指标(中间时间值)显示在单独一行中。
Various data statistics about displayed data accompany the histogram, for example, the total time of the displayed samples or the maximum number of counts in histogram bins. The following options control how the data is presented:
直方图中还会显示有关显示数据的各种数据统计,例如显示样本的总时间或直方图分段中的最大计数。以下选项可控制数据的显示方式:

- Log values - Switches between linear and logarithmic scale on the y axis of the graph, representing the call counts.
- 对数值 - 在图形 y 轴的线性和对数刻度之间切换,y 轴表示调用次数。

- Log time - Switches between linear and logarithmic scale on the x axis of the graph, representing the time bins.
- 对数时间 - 在图形 x 轴的线性和对数刻度之间切换,x 轴表示时间分段。

- Cumulate time - Changes how the histogram bin values are calculated. By default, the vertical bars on the graph represent the call counts of zones that fit in the given time bin. If this option is enabled, the bars represent the time spent in the zones. For example, on the graph presented in figure 21, the second cluster dominates when we look at the time spent in the zone, even though the 300 ns cluster has a greater call count.
- 累积时间 - 更改直方图分段值的计算方式。默认情况下,图表上的垂直条代表落入给定时间分段的区段调用次数。如果启用此选项,则条形表示在区段中花费的时间。例如,在图 21 所示的图表中,如果查看在区段中花费的时间,第二个集群占主导地位,即使 300 ns 集群的调用次数更多。

- Self time - Removes children time from the analyzed zones, which results in displaying only the time spent in the zone itself (or in non-instrumented function calls). It cannot be selected when Running time is active.
- 自身时间 - 从分析区段中删除子区段时间,从而只显示在区段自身(或非仪器功能调用) 中花费的时间。运行时间 "激活时无法选择。

- Running time - Removes time when zone's thread execution was suspended by the operating system due to preemption by other threads, waiting for system resources, lock contention, etc. Available only when the profiler performed context switch capture (section 3.15.3). It cannot be selected when Self time is active.
- 运行时间 - 删除区段的线程执行因其他线程抢占、等待系统资源、锁竞争等原因而被操作系统暂停的时间。仅在剖析器执行上下文切换捕获时可用(第 3.15.3 节)。激活 Self time 时无法选择。

- Minimum values in bin - Excludes display of bins that do not hold enough values at both ends of the time range. Increasing this parameter will eliminate outliers, allowing us to concentrate on the interesting part of the graph.
- 分段中的最小值 - 排除显示时间范围两端没有足够值的分段。增加该参数将消除异常值,从而使我们能够专注于图表中有趣的部分。
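
The effect of the Log time and Cumulate time options can be modeled with a small binning routine. This is a sketch under stated assumptions, not the profiler's algorithm: the function name `build_histogram` and its parameters are hypothetical, all durations are assumed to be positive, and the bins are assumed to span exactly the minimum to maximum duration.

```python
import math


def build_histogram(durations_ns, num_bins, log_time=True, cumulate_time=False):
    """Bin positive durations; each bar is a call count or accumulated time."""
    bins = [0] * num_bins
    if not durations_ns:
        return bins
    lo, hi = min(durations_ns), max(durations_ns)
    for d in durations_ns:
        if hi == lo:
            pos = 0.0
        elif log_time:
            pos = (math.log10(d) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))
        else:
            pos = (d - lo) / (hi - lo)
        index = min(int(pos * num_bins), num_bins - 1)
        bins[index] += d if cumulate_time else 1
    return bins


# Many short calls, a few longer ones, and a single outlier.
durations = [100, 300, 300, 300, 300, 20_000, 20_000, 1_000_000]
by_count = build_histogram(durations, 4)
by_time = build_histogram(durations, 4, cumulate_time=True)
```

Counting calls makes the first bin the tallest (5 of 8 calls), while accumulating time shifts the weight to the bins holding the longer calls. This is the same shift the Cumulate time option produces on the real histogram.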
You can drag the left mouse button over the histogram to select a time range that you want to look at closely. This will display the data in the histogram info section, and it will also filter the zones shown in the found zones section. This is quite useful if you actually want to look at the outliers, for example to see where they originated and what the program was doing at the time. You can reset the selection range by pressing the right mouse button on the histogram.
您可以在直方图上拖动鼠标左键,选择要仔细查看的时间范围。这将在直方图信息部分显示数据,并筛选"找到的区段"部分中显示的区段。如果您确实想查看异常值,例如它们来自何处、程序当时在做什么,这将非常有用。在直方图上按下鼠标右键可以重置选择范围。
The found zones section displays the individual zones grouped according to the following criteria:
找到的防区 "部分显示根据以下标准分组的各个防区:
  • Thread - In this mode you can see which threads were executing the zone.
    线程 - 在此模式下,您可以查看哪些线程在执行区域。
  • User text - Splits the zones according to the custom user text (see section 3.4).
    用户文本 - 根据自定义用户文本分割区段(参见第 3.4 节)。
  • Zone name - Groups zones by the name set on a per-call basis (see section 3.4).
    区段名称 - 根据按呼叫设置的名称分组区段(见第 3.4 节)。
  • Call stacks - Zones are grouped by the originating call stack (see section 3.11). Note that two call stacks may sometimes appear identical, even if they are not, due to an easily overlooked difference in the source line numbers.
    调用堆栈 - 区域按源代码调用堆栈分组(参见第 3.11 节)。请注意,由于源代码行号的差异,两个调用堆栈有时可能看起来完全相同,但实际上并非如此。
  • Parent - Groups zones according to the parent zone. This mode relies on the zone hierarchy and not on the call stack information.
    父区 - 根据父区对区段进行分组。这种模式依赖于区段层次结构,而不是调用堆栈信息。
  • No grouping - Disables zone grouping. It may be useful when you want to see zones in order as they appear.
    无分组 - 禁用区段分组。当您想按出现的顺序查看区段时,它可能会很有用。
You may sort each group according to the order in which it appeared, the call count, the total time spent in the group, or the mean time per call. Expanding the group view will display individual occurrences of the zone, which can be sorted by application's time, execution time, or zone's name. Clicking the left mouse button on a zone will open the zone information window (section 5.13). Clicking the middle mouse button on a zone will zoom the timeline view to the zone's extent.
您可以根据出现顺序、调用次数、在组中花费的总时间或每次调用的平均时间对每个组进行排序。展开组视图将显示区段的单个出现情况,可按应用程序时间、执行时间或区段名称进行排序。单击区段上的鼠标左键将打开区段信息窗口(第 5.13 节)。单击区域上的鼠标中键,将缩放时间线视图至区域范围。
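
Grouping and sorting of this kind can be sketched as follows. The record layout, the function name `group_found_zones`, and the sample data are hypothetical. The profiler's own data model is not described in this section.

```python
from collections import defaultdict


def group_found_zones(zones, key, sort_by="total_ns"):
    """Group zone records by `key` and sort groups by count, total or mean."""
    durations = defaultdict(list)
    for zone in zones:
        durations[zone[key]].append(zone["duration_ns"])
    groups = [
        {
            "group": name,
            "count": len(values),
            "total_ns": sum(values),
            "mean_ns": sum(values) / len(values),
        }
        for name, values in durations.items()
    ]
    return sorted(groups, key=lambda g: g[sort_by], reverse=True)


zones = [
    {"thread": "Main", "user_text": "level1", "duration_ns": 400},
    {"thread": "Worker", "user_text": "level1", "duration_ns": 100},
    {"thread": "Worker", "user_text": "level2", "duration_ns": 150},
    {"thread": "Worker", "user_text": "level2", "duration_ns": 50},
]
by_thread = group_found_zones(zones, "thread")
```

Sorted by total time, the `Main` group (one 400 ns call) comes before `Worker` (three calls, 300 ns in total). Sorting by `count` instead reverses the order.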
Clicking the left mouse button on the group name will highlight the group time data on the histogram (figure 22). This function provides a quick insight into the impact of the originating thread or input data on the zone performance. Clicking on the Clear button will reset the group selection. If the grouping mode is set to Parent option, clicking the middle mouse button on the parent zone group will switch the find zone view to display the selected zone.
在组名上单击鼠标左键,直方图上的组时间数据将突出显示(图 22)。通过此功能,可以快速了解起始线程或输入数据对区段性能的影响。单击 清除按钮将重置分组选择。如果分组模式设置为父选项,单击父区段组上的鼠标中键将切换查找区段视图,显示所选区段。
Figure 22: Zone execution time histogram with a group highlighted.
图 22:区域执行时间柱状图,突出显示一个组。
The call stack grouping mode has a different way of listing groups. Here only one group is displayed at any time due to the need to display the call stack frames. You can switch between call stack groups by using the previous and next buttons. You can select the group by clicking on the Select button. You can open the call stack window (section 5.14) by pressing the Call stack button.
调用堆栈分组模式采用不同的分组列表方式。由于需要显示调用堆栈帧,因此在任何时候都只显示一个组。您可以使用上一个和下一个按钮在调用堆栈组之间切换。单击 Select(选择)按钮可以选择组。按下调用堆栈按钮可以打开调用堆栈窗口(第 5.14 节)。
Tracy displays a variety of statistical values regarding the selected function: mean (average value), median (middle value), mode (most common value, quantized using histogram bins), and σ (standard deviation). The mean and median zone times are also displayed on the histogram as red (mean) and blue (median) vertical bars. Additional bars will indicate the mean group time (orange) and median group time (green). You can disable the drawing of either set of markers by clicking on the check-box next to the color legend.
Tracy 显示有关所选函数的各种统计值:平均值、中位数(中间值)、众数(最常见值,使用直方图分段量化)和 σ(标准差)。区段时间的平均值和中位数也会以红色(平均值)和蓝色(中位数)垂直条的形式显示在直方图上。其他条形将显示组平均时间(橙色)和组中位数时间(绿色)。您可以单击颜色图例旁边的复选框来禁用任一组标记的绘制。
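
These four values can be reproduced with Python's standard `statistics` module. The sketch below is illustrative: the function name `zone_statistics` is hypothetical, and three details are assumptions, namely fixed-width linear bins for the mode, reporting the mode as the center of the fullest bin, and using the population (not sample) standard deviation.

```python
import statistics
from collections import Counter


def zone_statistics(durations_ns, bin_width_ns):
    """Return mean, median, bin-quantized mode and sigma of zone durations."""
    bin_counts = Counter(d // bin_width_ns for d in durations_ns)
    # Fullest bin wins; ties go to the bin with the shorter durations.
    top_bin = max(bin_counts, key=lambda b: (bin_counts[b], -b))
    return {
        "mean": statistics.mean(durations_ns),
        "median": statistics.median(durations_ns),
        "mode": top_bin * bin_width_ns + bin_width_ns / 2,
        "sigma": statistics.pstdev(durations_ns),
    }


stats = zone_statistics([100, 110, 120, 300, 1000], bin_width_ns=50)
```

In this example the single 1000 ns outlier pulls the mean up to 326 ns, while the median (120 ns) and the quantized mode (125 ns) stay close to the typical call. This is why the mean and median markers can sit far apart on the histogram.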
Hovering the mouse cursor over a zone on the timeline, which is currently selected in the find zone window, will display a pulsing vertical bar on the histogram, highlighting the bin to which the hovered zone has been assigned. In addition, it will also highlight zone entry on the zone list.
将鼠标光标悬停在时间轴上当前在查找区段窗口中选择的区段上,直方图上将显示一个脉冲垂直条,突出显示悬停的区段所分配的分区。此外,它还会突出显示区段列表中的区段条目。

Keyboard shortcut 键盘快捷键

You may press Ctrl to open or focus the find zone window and set the keyboard input on the search box.
您可以按 Ctrl 键打开或聚焦查找区域窗口,并在搜索框上设置键盘输入。

Caveats 注意事项

When using the execution times histogram, you must know the hardware peculiarities. Read section 2.2.2 for more detail.
使用执行时间直方图时,必须了解硬件的特殊性。详情请阅读第 2.2.2 节。

5.7.1 Timeline interaction
5.7.1 时间轴互动

The profiler will highlight matching zones on the timeline display when the zone statistics are displayed in the find zone menu. Highlight colors match the histogram display. A bright blue highlight indicates that a zone is in the optional selection range, while the yellow highlight is used for the rest of the zones.
在查找区段菜单中显示区段统计数据时,剖析器会在时间轴显示屏上高亮显示匹配的区段。高亮颜色与直方图显示相匹配。明亮的蓝色高亮表示某个区段位于可选的选择范围内,而黄色高亮则用于其他区段。

5.7.2 Frame time graph interaction
5.7.2 框架时间图的交互作用

The frame time graph (section 5.2.2) behavior is altered when a zone is displayed in the find zone window and the Show zone time in frames option is selected. An accumulated zone execution time is shown instead of coloring the frame bars according to the frame time targets.
在查找区段窗口中显示区段并选择 "以帧显示区段时间 "选项时,帧时间图(第 5.2.2 节)的行为会发生变化。此时显示的是累积的区段执行时间,而不是根据区段时间目标给帧条着色。
Each bar is drawn in gray color, with the white part accounting for the zone time. If the execution time is greater than the frame time (this is possible if more than one thread was executing the same zone), the overflow will be displayed using red color.
每个条形图都用灰色绘制,白色部分表示区域时间。如果执行时间大于帧时间(如果不止一个线程在执行同一区段,则有可能出现这种情况),溢出将以红色显示。
Enabling Self time option affects the displayed values, but Running time does not.
启用 "自我计时 "选项会影响显示值,但 "运行时间 "不会。
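
The bar layout described above can be modeled as follows. This is a simplified sketch, not the profiler's algorithm (which, as the caveat below notes, may itself be inexact): the function name `zone_time_per_frame` is hypothetical, and attributing zone time to a frame by clipping each zone to the frame boundaries is an assumption.

```python
def zone_time_per_frame(frames, zones):
    """Compute the white (zone) and red (overflow) parts of each frame bar.

    `frames` and `zones` are lists of (start_ns, end_ns) pairs. The zones
    may come from several threads, so their sum can exceed the frame time.
    """
    bars = []
    for frame_start, frame_end in frames:
        frame_time = frame_end - frame_start
        zone_time = sum(
            max(0, min(zone_end, frame_end) - max(zone_start, frame_start))
            for zone_start, zone_end in zones
        )
        bars.append({
            "frame_time": frame_time,
            "zone_time": min(zone_time, frame_time),
            "overflow": max(0, zone_time - frame_time),
        })
    return bars


frames = [(0, 100), (100, 200)]
zones = [(10, 90), (20, 95), (150, 170)]  # the first two overlap on two threads
bars = zone_time_per_frame(frames, zones)
```

In the first frame, two threads spend 155 ns in the zone during a 100 ns frame, so the bar is fully white with a 55 ns overflow. The second frame has only 20 ns of zone time and no overflow.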

Caveats 注意事项

The profiler might not calculate the displayed data correctly, and it may not include some zones in the reported times.
剖析器可能无法正确计算显示的数据,也可能在报告的时间中不包括某些区段。

5.7.3 Limiting zone time range
5.7.3 限制区段时间范围

If the Limit range option is selected, the profiler will include only the zones within the specified time range (chapter 5.3) in the data. The inclusion region will be marked with a green striped pattern. Note that a zone must be entirely inside the region to be counted. You can access more options through the Limits button, which will open the time range limits window, described in section 5.23.
如果选择 "限制范围 "选项,剖面仪将仅把指定时间范围内的区域(5.3 章)纳入数据中。包含区域将用绿色条纹标记。请注意,区段必须完全位于该区域内才会被计算在内。您可以通过 限制按钮访问更多选项,该按钮将打开时间范围限制窗口,详见第 5.23 节。

5.7.4 Zone samples 5.7.4 区域样本

If sampling data has been captured (see section 3.15.5), an additional expandable Samples section will be displayed. This section contains only the sample data attributed to the displayed zone. Looking at this list may give you additional insight into what is happening within the zone. Refer to section 5.6.2 for more information about this view.
如果采集了采样数据(见第 3.15.5 节),则会显示一个额外的可扩展样本部分。该部分仅包含显示区段的样本数据。查看此列表可让您更深入地了解区段内发生的情况。有关此视图的更多信息,请参阅第 5.6.2 节。
You can further narrow down the list of samples by selecting a time range on the histogram or by choosing a group in the Found zones section. However, do note that the random nature of sampling makes it highly unlikely that short-lived zones (i.e., left part of the histogram) will have any sample data collected.
您可以在直方图上选择一个时间范围,或在 "找到的区域 "部分选择一个组,从而进一步缩小样本列表的范围。不过请注意,由于取样的随机性,短时区(即直方图的左侧部分)收集到任何样本数据的可能性很小。

5.8 Compare traces window
5.8 比较轨迹窗口

Comparing the performance impact of the optimization work is not an easy thing to do. Benchmarking is often inconclusive, if even possible, in the case of interactive applications, where the benchmarked function might not have a visible impact on frame render time. Furthermore, doing isolated micro-benchmarks loses the application's execution environment, in which many different parts compete for limited system resources.
比较优化工作对性能的影响并非易事。在交互式应用中,基准测试往往无法得出结论,甚至无法实现,因为在交互式应用中,基准测试功能可能不会对帧渲染时间产生明显影响。此外,进行孤立的微基准测试会破坏应用程序的执行环境,因为在这种环境中,许多不同的部分都在争夺有限的系统资源。
Tracy solves this problem by providing a compare traces functionality, very similar to the find zone window, described in section 5.7. You can compare traces either by zone or frame timing data.
Tracy 通过提供与第 5.7 节中描述的查找区域窗口非常相似的比较轨迹功能解决了这一问题。您可以按区域或帧定时数据比较轨迹。
You would begin your work by recording a reference trace that represents the usual behavior of the program. Then, after the optimization of the code is completed, you record another trace, doing roughly what you did for the reference one. Finally, having the optimized trace open, you select the Open second trace option in the compare traces window and load the reference trace.
在开始工作时,您会记录一个代表程序通常行为的参考跟踪。然后,在代码优化完成后,再录制另一条跟踪,操作方法与录制参考跟踪时大致相同。最后,打开优化后的跟踪,在跟踪比较窗口中选择 打开第二个跟踪选项并加载参考跟踪。
Now things start to get familiar. You search for a zone, just as in the find zone window, choose the one you want in the matched source locations drop-down, and then you look at the histogram. This time there are two overlaid graphs, one representing the current trace and the second one representing the external (reference) trace (figure 23). You can easily see how the performance characteristics of the zone were affected by your modifications.
现在,一切开始变得熟悉起来。您可以像在查找区段窗口中一样搜索区段,在匹配源位置下拉菜单中选择您想要的区段,然后查看直方图。这次有两个重叠图形,一个代表当前跟踪,另一个代表外部(参考)跟踪(图 23)。您可以很容易地看到区段的性能特征是如何受到您的修改影响的。
Figure 23: Compare traces histogram.
Note that the traces are color and symbol-coded. The current trace is marked by a yellow symbol, and the external one is marked by a red symbol.
When searching for source locations, it's not uncommon to match more than one zone (for example, a search for Draw may result in DrawCircle and DrawRectangle matches). Typically you wouldn't want to compare execution profiles of two unrelated functions. The link selection option prevents this by ensuring that when you choose a source location in one trace, the corresponding one is also selected in the second trace. Be aware that this may still result in a mismatch, for example, if you have overloaded functions. In such a case, you will need to select the appropriate function in the other trace manually.
It may be difficult, if not impossible, to perform identical runs of a program. This means that the number of collected zones may differ in both traces, influencing the displayed results. To fix this problem, enable the Normalize values option, which will adjust the displayed results as if both traces had the same number of recorded zones.
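The effect of normalization can be pictured as rescaling one histogram so that both traces appear to contain the same number of zones. The following is a minimal sketch of that idea only; `NormalizeBins` is a hypothetical helper, not part of Tracy, and the profiler's actual computation may differ.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of Tracy: rescales the histogram bins of the
// external trace so that their sum equals the zone count of the current trace.
std::vector<double> NormalizeBins(const std::vector<std::size_t>& externalBins,
                                  std::size_t currentTotal)
{
    const std::size_t externalTotal =
        std::accumulate(externalBins.begin(), externalBins.end(), std::size_t{0});

    std::vector<double> scaled(externalBins.size(), 0.0);
    if (externalTotal == 0) return scaled;  // nothing to rescale

    const double factor =
        static_cast<double>(currentTotal) / static_cast<double>(externalTotal);
    for (std::size_t i = 0; i < externalBins.size(); ++i)
        scaled[i] = static_cast<double>(externalBins[i]) * factor;
    return scaled;
}
```

For instance, if the external trace recorded 100 zones and the current one only 50, every bin of the external histogram is halved, so the shapes of the two distributions can be compared directly.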

Trace descriptions

Set custom trace descriptions (see section 5.12) to easily differentiate the two loaded traces. If no trace description is set, the name of the profiled program will be displayed along with the capture time.

5.8.1 Source files diff

To see what changes were made in the source code between the two compared traces, select the Source diff compare mode. This will display a list of deleted, added, and changed files. By default, the difference is calculated from the older trace to the newer one. You can reverse this by clicking on the Switch button.
Please note that changes will be registered only if the file has the same name and location in both traces. Tracy does not resolve file renames or moves.

5.9 Memory window

You can view the data gathered by profiling memory usage (section 3.8) in the memory window. If the profiler tracked more than one memory pool during the capture, you will be able to select which pool you want to look at, using the Memory pool selection box.
The top row contains statistics, such as total allocations count, number of active allocations, current memory usage, and process memory span.
The lists of captured memory allocations are displayed in a common multi-column format throughout the profiler. The first column specifies the memory address of an allocation, or an address and an offset if the address is not at the start of the allocation. Clicking the left mouse button on an address will open the memory allocation information window (see section 5.11). Clicking the middle mouse button on an address will zoom the timeline view to the memory allocation's range. The next column contains the allocation size.
The allocation's timing data is contained in two columns: appeared at and duration. Clicking the left mouse button on the first one will center the timeline view at the beginning of allocation, and likewise, clicking on the second one will center the timeline view at the end of allocation. Note that allocations that have not yet been freed will have their duration displayed in green color.
The memory event location in the code is displayed in the last four columns. The thread column contains the thread where the allocation was made and freed (if applicable), or an alloc / free pair of threads if it was allocated in one thread and freed in another. The zone alloc column contains the zone in which the allocation was performed, or - if there was no active zone in the given thread at the time of allocation. Clicking the left mouse button on the zone name will open the zone information window (section 5.13). Similarly, the zone free column displays the zone which freed the allocation, which may be colored yellow if it is the same zone that did the allocation. Alternatively, if the allocation has not yet been freed, a green active text is displayed. The last column contains the alloc and free call stack buttons, or their placeholders if no call stack is available (see section 3.11 for more information). Clicking on either of the buttons will open the call stack window (section 5.14). Note that the call stack buttons that match the information window will be highlighted.
The memory window is split into the following sections:

5.9.1 Allocations

The Allocations pane allows you to search for the usage of a specified address during the whole lifetime of the program. All recorded memory allocations that match the query will be displayed in a list.

5.9.2 Active allocations

The Active allocations pane displays a list of currently active memory allocations and their total memory usage. Here, you can see where your program allocated memory it is now using. If the application has already exited, this becomes a list of leaked memory.

5.9.3 Memory map

On the Memory map pane, you can see the graphical representation of your program's address space. Active allocations are displayed as green lines, while the freed memory is red. The brightness of the color indicates how much time has passed since the last memory event at the given location - the most recent events are the most vibrant.
This view may help in assessing the general memory behavior of the application or in debugging problems resulting from address space fragmentation.

5.9.4 Bottom-up call stack tree

The Bottom-up call stack tree pane is only available if the memory events were collecting the call stack data (section 3.11). In this view, you are presented with a tree of memory allocations, starting at the call stack entry point and going up to the allocation's pinpointed place. Each tree level is sorted according to the number of bytes allocated in the given branch.
Each tree node consists of the function name, the source file location, and the memory allocation data. The memory allocation data is either the yellow inclusive events count (allocations performed by children) or the cyan exclusive events count (allocations that took place in the node). Two values are counted: total memory size and number of allocations.
The Group by function name option controls how tree nodes are grouped. If it is disabled, the grouping is performed at a machine instruction-level granularity. This may result in a very verbose output, but the displayed source locations are precise. To make the tree more readable, you may opt to perform grouping at the function name level, which will result in less precise source file locations, as multiple entries are collapsed into one.
Enabling the Only active allocations option will limit the call stack tree to display only active allocations. Enabling the Only inactive allocations option will have a similar effect for inactive allocations. The two options are mutually exclusive: enabling one disables the other. Displaying inactive allocations, when combined with Limit range, will show short-lived allocations, highlighting potentially unwanted behavior in the code.
Clicking the right mouse button on the function name will open the allocations list window (see section 5.10), which lists all the allocations included at the current call stack tree level. Likewise, clicking the right mouse button on the source file location will open the source file view window (if applicable, see section 5.16).
Some function names may be too long to correctly display, with the events count data at the end. In such cases, you may press the control button, which will display the events count tooltip.

5.9.5 Top-down call stack tree

This pane is identical in functionality to the Bottom-up call stack tree, but the call stack order is reversed when the tree is built. This means that the tree starts at the memory allocation functions and goes down to the call stack entry point.

5.9.6 Looking back at the memory history

By default, the memory window displays the memory data at the current point of program execution. It is, however, possible to view the historical data by restricting the displayed results to the memory events within a time range. See section 5.23 for more information.

5.10 Allocations list window

This window displays the list of allocations included at the selected call stack tree level (see sections 5.9 and 5.9.4).

5.11 Memory allocation information window

The information about the selected memory allocation is displayed in this window. It lists the allocation's address and size, along with the time, thread, and zone data of the allocation and free events. Clicking the Zoom to allocation button will zoom the timeline view to the allocation's extent.

5.12 Trace information window

This window contains information about the current trace: captured program name, time of the capture, profiler version which performed the capture, and a custom trace description, which you can fill in.
Open the Trace statistics section to see information about the trace, such as achieved timer resolution, number of captured zones, lock events, plot data points, memory allocations, etc.
There's also a section containing the selected frame set timing statistics and histogram. As a convenience, you can switch the active frame set here and limit the displayed frame statistics to the frame range visible on the screen.
If CPU topology data is available (see section 3.15.4), you will be able to view the package, core, and thread hierarchy.
The Source location substitutions section allows adapting the source file paths, as captured by the profiler, to the actual on-disk locations. You can create a new substitution by clicking the Add new substitution button. This will add a new entry, with input fields for an ECMAScript-conforming regular expression pattern and its corresponding replacement string. You can quickly test the outcome of substitutions in the example source location input field; the transformed result is displayed below it.

Quick example

Let's say we have a Unix-based operating system with program sources in the /home/user/program/src/ directory. We have also performed a capture of an application running under Windows, with sources in the C:\Users\user\Desktop\program\src directory. The source locations don't match, and the profiler can't access the source files on our disk. We can fix that by adding two substitution patterns:
  • ^C:\\Users\\user\\Desktop → /home/user
  • \\ → /
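To see how such a pair of rules transforms a path, the substitutions can be tried outside the profiler with the C++ standard library, whose default regular expression grammar is ECMAScript. This is only a sketch: `ApplySubstitutions` and `ExampleSubstitution` are hypothetical helpers, the file name main.cpp is made up for illustration, and applying the rules sequentially in the listed order is an assumption rather than a description of Tracy's implementation.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

using Rule = std::pair<std::string, std::string>;  // (pattern, replacement)

// Hypothetical helper, not part of Tracy: applies each rule in order.
std::string ApplySubstitutions(std::string path, const std::vector<Rule>& rules)
{
    for (const auto& rule : rules) {
        const std::regex pattern(rule.first, std::regex::ECMAScript);
        path = std::regex_replace(path, pattern, rule.second);
    }
    return path;
}

// Runs the two patterns from the quick example on a made-up Windows path.
std::string ExampleSubstitution()
{
    // Raw string literals keep the patterns exactly as they would be typed
    // into the input fields: "\\" in a regular expression matches one backslash.
    const std::vector<Rule> rules = {
        { R"(^C:\\Users\\user\\Desktop)", "/home/user" },
        { R"(\\)", "/" },
    };
    return ApplySubstitutions(R"(C:\Users\user\Desktop\program\src\main.cpp)", rules);
}
```

The first rule rewrites the path prefix, and the second one converts the remaining backslashes into forward slashes, which yields /home/user/program/src/main.cpp.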
By default, all source file modification times need to be older than the capture time of the trace. This can be disabled using the Enforce source file modification time older than trace capture time check box, e.g., when the source files are under source control and the file modification time is not relevant.
In this window, you can view the information about the machine on which the profiled application was running. This includes the operating system, used compiler, CPU name, total available RAM, etc. In addition, if application information was provided (see section 3.7.1), it will also be displayed here.
If an application should crash during profiling (section 2.5), the profiler will display the crash information in this window. It provides you information about the thread that has crashed, the crash reason, and the crash call stack (section 5.14).

5.13 Zone information window

The zone information window displays detailed information about a single zone. There can be only one zone information window open at any time. While the window is open, the profiler will highlight the zone on the timeline view with a green outline. The following data is presented:
  • Basic source location information: function name, source file location, and the thread name.
  • Timing information.
  • If the profiler performed context switch capture (section 3.15.3) and a thread was suspended during zone execution, a list of wait regions will be displayed, with complete information about the timing, CPU migrations, and wait reasons. If CPU topology data is available (section 3.15.4), the profiler will mark zone migrations across cores with 'C' and migrations across packages with 'P'. In some cases, context switch data might be incomplete, in which case a warning message will be displayed.
  • Memory events list, both summarized and a list of individual allocation/free events (see section 5.9 for more information on the memory events list).
  • List of messages that the profiler logged in the zone's scope. If the exclude children option is disabled, messages emitted in child zones will also be included.
  • Zone trace, which takes into account the zone tree and call stack information (section 3.11) to try to reconstruct a combined zone + call stack trace. Captured zones are displayed as standard text, while functions that are not instrumented are dimmed. Hovering the mouse pointer over a zone will highlight it on the timeline view with a red outline. Clicking the left mouse button on a zone will switch the zone info window to that zone. Clicking the middle mouse button on a zone will zoom the timeline view to the zone's extent. Clicking the right mouse button on a source file location will open the source file view window (if applicable, see section 5.16).
  • Child zones list, showing how the current zone's execution time was used. Zones on this list can be grouped according to their source location. Each group can be expanded to show individual entries. All the controls from the zone trace are also available here.
  • Time distribution in child zones, which expands the information provided in the child zones list by processing all zone children (including multiple levels of grandchildren). This results in a statistical list of zones that were really doing the work in the current zone's time span. If a group of zones is selected on this list, the find zone window (section 5.7) will open, with a time range limited to show only the children of the current zone.
The zone information window has the following controls available:

- Zoom to zone - Zooms the timeline view to the zone's extent.

- Go to parent - Switches the zone information window to display current zone's parent zone (if available).

- Statistics - Displays the zone general performance characteristics in the find zone window (section 5.7).

- Call stack - Views the current zone's call stack in the call stack window (section 5.14). The button will be highlighted if the call stack window shows the zone's call stack. Only available if the zone had captured call stack data (section 3.11).

- Source - Displays the source file view window with the zone source code (only available if applicable, see section 5.16). The button will be highlighted if the source file is displayed (but the focused source line might be different).

- Go back - Returns to the previously viewed zone. The viewing history is lost when the zone information window is closed or when the type of displayed zone changes (from CPU to GPU or vice versa).
Clicking on the Copy to clipboard buttons will copy the appropriate data to the clipboard.

5.14 Call stack window

This window shows the frames contained in the selected call stack. Each frame is described by a function name, source file location, and originating image name. Function frames originating from the kernel are marked with a red color. Clicking the left mouse button on either the function name or the source file location will copy the name to the clipboard. Clicking the right mouse button on the source file location will open the source file view window (if applicable, see section 5.16).
A single stack frame may have multiple function call places associated with it. This happens in the case of inlined function calls. Such entries will be displayed in the call stack window, with inline in place of the frame number.
The stack frame location may be displayed in one of the following ways, depending on the Frame location option selection:
  • Source code - displays source file and line number associated with the frame.
  • Entry point - source code at the beginning of the function containing selected frame, or function call place in case of inline frames.
  • Return address - shows return address, which you may use to pinpoint the exact instruction in the disassembly.
  • Symbol address - displays begin address of the function containing the frame address.
In some cases, it may not be possible to decode stack frame addresses correctly. Such frames will be presented with a dimmed '[ntdll.dll]' name of the image containing the frame address, or simply '[unknown]' if the profiler cannot retrieve even this information. Additionally, '[kernel]' is used to indicate unknown stack frames within the operating system's internal routines.
If the displayed call stack is a sampled call stack (chapter 3.15.5), an additional button will be available: Global entry statistics. Clicking it will open the sample entry call stacks window (chapter 5.15) for the current call stack.
Clicking on the Copy to clipboard button will copy call stack to the clipboard.

5.14.1 Reading call stacks

You need to take special care when reading call stacks. Contrary to their name, call stacks do not show function call stacks, but rather function return stacks. This might not be very clear at first, but this is how programs do work. Consider the following source code:
int main()
{
    auto app = std::make_unique<Application>();
    app->Run();
    app.reset();
}
Let's say you are looking at the call stack of some function called within Application::Run. This is the result you might get:
  0. ...
  1. ...
  2. Application::Run
  3. std::unique_ptr::reset
  4. main
At first glance, it may look like unique_ptr::reset was the call site of Application::Run, which would make no sense, but this is not the case here. When you remember that these are the function return points, it becomes much clearer what is happening. As an optimization, Application::Run is returning directly into unique_ptr::reset, skipping the return to main and an unnecessary reset function call.
Moreover, the linker may determine in some rare cases that two functions in your program are identical. As a result, only one copy of the binary code will be provided in the executable for both functions to share. While this optimization produces more compact programs, it also means that there's no way to tell the two functions apart in the resulting machine code. In effect, some call stacks may look nonsensical until you perform a small investigation.
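The difference between a call site and a return point can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not a description of how Tracy resolves symbols: the addresses are made up, `Instruction`, `Resolve`, and `ToyMain` are hypothetical names, and it assumes that the compiler inlined app.reset() into main right after the call to Application::Run.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy model: each instruction of the compiled main() is attributed to the
// (possibly inlined) function it was generated from.
struct Instruction {
    std::uint64_t address;
    std::string   attributedTo;
};

// Returns the function an address is attributed to, or "[unknown]".
std::string Resolve(const std::vector<Instruction>& code, std::uint64_t address)
{
    for (const auto& insn : code)
        if (insn.address == address) return insn.attributedTo;
    return "[unknown]";
}

// Made-up layout of main() from the example above.
std::vector<Instruction> ToyMain()
{
    return {
        { 0x1000, "main" },                    // call std::make_unique<Application>
        { 0x1005, "main" },                    // call Application::Run
        { 0x100a, "std::unique_ptr::reset" },  // inlined app.reset() begins here
        { 0x1010, "main" },                    // return from main
    };
}
```

In this model, the call instruction at 0x1005 belongs to main, but the address at which Application::Run resumes execution is 0x100a, which is attributed to std::unique_ptr::reset. Since a call stack records return addresses, it is the latter name that appears in the listing.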

5.15 Sample entry call stacks window

This window displays statistical information about the selected symbol. All sampled call stacks (chapter 3.15.5) leading to the symbol are counted and displayed in descending order. You can choose the displayed call stack using the entry call stack controls, which also display time spent in the selected call stack. Alternatively, sample counts may be shown by disabling the Show time option, which is described in more detail in chapter 5.6.2.
The layout of the frame list and the Frame location option selection are similar to the call stack window, described in chapter 5.14.

5.16 Source view window

This window can operate in one of the two modes. The first one is quite simple, just showing the source code associated with a source file. The second one, which is used if symbol context is available, is considerably more feature-rich.

5.16.1 Source file view

In source view mode, you can view the source code of the profiled application to take a quick glance at the context of the function behavior you are analyzing. The profiler will highlight the selected line (for example, a location of a profiling zone) both in the source code listing and on the scroll bar. The contents of the file displayed in the source view can be copied to the clipboard using the button adjacent to the file name.
Important
To display source files, Tracy has to gain access to them somehow. Since having the source code is not needed for the profiled application to run, this can be problematic in some cases. The source files search order is as follows:
  1. Discovery is performed on the server side. Found files are cached in the trace. This is appropriate when the client and the server run on the same machine or if you're deploying your application to the target device and then run the profiler on the same workstation.
  2. If not found, discovery is performed on the client-side. Found files are cached in the trace. This is appropriate when you are developing your code on another machine, for example, you may be working on a dev-board through an SSH connection.
  3. If not found, Tracy will try to open source files that you might have on your disk later on. The profiler won't store these files in the trace. You may provide custom file path substitution rules to redirect this search to the right place (see section 5.12).
Note that the discovery process not only looks for a file on the disk but also checks its time stamp and validates it against the executable image time stamp or, if it's not available, the time of the performed capture. This prevents the use of source files that are newer than the program you're profiling (i.e., files that were changed afterwards).
Nevertheless, the displayed source files might still not reflect the code that you profiled! It is up to you to verify that you don't have a modified version of the code with regards to the trace.
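The time stamp validation described above boils down to a simple comparison. The following is a minimal sketch under stated assumptions: `IsSourceFileUsable` is a hypothetical helper (not Tracy's code), time stamps are plain integers counted from an arbitrary epoch, "older" is taken to mean strictly earlier, and C++17 is required for std::optional.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper, not part of Tracy. A source file is accepted only if
// it was last modified before the reference time: the executable image time
// stamp when it is known, otherwise the time of the capture.
bool IsSourceFileUsable(std::int64_t fileModTime,
                        std::optional<std::int64_t> executableTime,
                        std::int64_t captureTime)
{
    const std::int64_t reference = executableTime.value_or(captureTime);
    return fileModTime < reference;
}
```

A file modified at time 250 would be rejected for an executable built at time 200, but accepted if the executable time stamp is unavailable and the capture was made at time 300. As noted above, passing this check does not guarantee that the file content matches the profiled code.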

5.16.2 Symbol view

A much more capable symbol view mode is available if the inspected source location has an associated symbol context (i.e., if it comes from a call stack capture, from call stack sampling, etc.). A symbol is a unit of machine code, basically a callable function. It may be generated using multiple source files and may consist of numerous inlined functions. A list of all captured symbols is available in the statistics window, as described in chapter 5.6.2.
The header of the symbol view window contains the name of the selected symbol, a list of functions that contribute to the symbol, and information such as the count of probed samples.
Additionally, you may use the Mode selector to decide what content should be displayed in the panels below:
  • Source - only the source code will be displayed.
  • Assembly - only the machine code disassembly will be shown.
  • Both - selects combined mode, in which source code and disassembly will be listed next to each other.
Some modes may be unavailable in some circumstances (missing or outdated source files, lack of machine code). In case the Assembly mode is unavailable, this might be due to the capstone disassembly engine failing to disassemble the machine instructions. See section 2.3 for more information.

5.16.2.1 Source mode

This is pretty much the source file view window, but with the ability to select one of the source files that the compiler used to build the symbol. Additionally, each source file line that produced machine code in the symbol will show a count of associated assembly instructions, displayed with an '@' prefix, and will be marked with grey color on the scroll bar. Due to how optimizing compilers work, some lines may seemingly not produce any machine code, for example, because iterating a loop counter index might have been reduced to advancing a data pointer. Some other lines may have a disproportionate amount of associated instructions, e.g., when the compiler applied a loop unrolling optimization. This varies from case to case and from compiler to compiler.
The Propagate inlines option, available when sample data is present, will enable propagation of the instruction costs down the local call stack. For example, suppose a base function in the symbol issues a call to an inlined function (which may not be readily visible due to being contained in another source file). In that case, any cost attributed to the inlined function will be visible in the base function. Because the cost information is added to all the entries in the local call stacks, it is possible to see seemingly nonsense total cost values when this feature is enabled. To quickly toggle this on or off, you may also press the key.

5.16.2.2 Assembly mode

This mode shows the disassembly of the symbol machine code. If only one inline function is selected through the Function selector, assembly instructions outside of this function will be dimmed out. Each assembly instruction is listed with its location in the program memory during execution. If the Relative address option is selected, the profiler will print an offset from the symbol beginning instead. Clicking the left mouse button on the address/offset will switch to counting line numbers, using the selected one as the origin (i.e., zero value). Line numbers are displayed inside [] brackets. This display mode can be useful to correlate lines with the output of external tools. To disable line numbering, click the right mouse button on a line number.
If the Source locations option is selected, each line of the assembly code will also contain information about the originating source file name and line number. Each file is assigned its own color for easier differentiation between different source files. Clicking the left mouse button on a displayed source location will switch the source file, if necessary, and focus the source view on the selected line. Additionally, hovering the mouse cursor over the presented location will show a tooltip containing the name of a function the instruction originates from, along with an appropriate source code fragment and the local call stack if it exists.
如果选择了 "源位置 "选项,程序集代码的每一行还将包含源文件名称和行号信息。每个文件都有自己的颜色,以便于区分不同的源文件。在显示的源代码位置上单击鼠标左键,必要时会切换源文件,并将源代码视图聚焦到所选行上。此外,将鼠标光标悬停在显示的位置上还会显示一个工具提示,其中包含该指令的函数名称,以及相应的源代码片段和本地调用堆栈(如果存在)。

Local call stack 本地调用堆栈

In some cases, it may be challenging to understand what is being displayed in the disassembly. For example, calling the std::lower_bound function may generate multiple levels of inlined functions: first, we enter the search algorithm, then the comparison functions, which in turn may be lambdas that call even more external code, and so on. In such an event, you will most likely see that some external code is taking a long time to execute, and you will be none the wiser on improving things.
在某些情况下,理解反汇编中显示的内容可能具有挑战性。例如,调用 std::lower_bound 函数可能会产生多层内联函数:首先,我们进入搜索算法,然后是比较函数,而比较函数又可能是调用更多外部代码的 lambdas,等等。在这种情况下,你很可能会发现某些外部代码需要很长时间才能执行,而你却没有任何改善的办法。
The local call stack for an assembly instruction represents all the inline function calls within the symbol (hence the 'local' part), which were made to reach the instruction. Deeper inspection of the local call stack, including navigation to the source call site of each participating inline function, can be performed through the context menu accessible by pressing the right mouse button on the source location.
汇编指令的本地调用栈代表了该符号内的所有内联函数调用(因此称为 "本地 "部分),这些调用都是为了执行该指令而进行的。深入查看本地调用堆栈,包括导航到每个参与的内联函数的源调用位置,可通过在源位置上按下鼠标右键访问的上下文菜单进行。
Selecting the Raw code option will enable the display of raw machine code bytes for each line. Individual bytes are displayed with interwoven colors to make reading easier.
选择 "原始码 "选项可显示每一行的原始机器码字节。单个字节以交织的颜色显示,以便于阅读。
If any instruction would jump to a predefined address, the symbolic name of the jump target will be additionally displayed. If the destination location is within the currently displayed symbol, an arrow will be prepended to the name. Hovering the mouse pointer over such symbol name will highlight the target location. Clicking on it with the left mouse button will focus the view on the destination instruction or switch view to the destination symbol.
如果任何指令将跳转到预定义地址,则将额外显示跳转目标的符号名称。如果目标位置位于当前显示的符号内,则会在名称前加上一个 箭头。将鼠标指针悬停在该符号名称上将高亮显示目标位置。用鼠标左键点击目标位置,视图将聚焦到目标指令或切换到目标符号。
Enabling the Jumps option will show jumps within the symbol code as a series of arrows from the jump source to the jump target, and hovering the mouse pointer over a jump arrow will display a jump information tooltip. It will also draw the jump range on the scroll bar as a green line. A horizontal green line will mark the jump target location. Clicking on a jump arrow with the left mouse button will focus the view on the target location. The right mouse button opens a jump context menu, which allows inspection
启用 跳转选项后,符号代码中的跳转将显示为一系列从跳转源到跳转目标的箭头,将鼠标指针悬停在跳转箭头上将显示跳转信息工具提示。将鼠标指针悬停在跳转箭头上还会在滚动条上以绿线显示跳转范围。水平绿线将标记跳跃目标位置。用鼠标左键单击跳转箭头会将视图聚焦到目标位置。鼠标右键可打开跳转上下文菜单,通过该菜单可检查

indicated by a smaller arrow pointing away from the code.
用一个远离代码的小箭头表示。
Portions of the executable used to show the symbol view are stored within the captured profile and don't rely on the available local disk files.
用于显示符号视图的可执行文件部分存储在捕获的配置文件中,不依赖于可用的本地磁盘文件。
Exploring microarchitecture

If the listed assembly code targets the x86 or x64 instruction set architectures, hovering the mouse pointer over an instruction will display a tooltip with microarchitectural data, based on measurements made in [AR19]. This information is retrieved from instruction cycle tables and does not represent the true behavior of the profiled code. Reading the cited article will give you a detailed definition of the presented data, but here's a quick (and inaccurate) explanation:
探索微体系结构 如果列出的汇编代码以 x 86 或 x 64 指令集体系结构为目标,将鼠标指针悬停在指令上将显示一个包含微体系结构数据的工具提示,该数据基于 [AR19] 中的测量结果。这些信息是从指令周期表中获取的,并不代表剖析代码的真实行为。阅读引用的文章可以获得所显示数据的详细定义,但这里有一个快速(但不准确)的解释:

- Throughput - How many cycles are required to execute an instruction in a stream of the same independent instructions. For example, if the CPU may execute two independent add instructions simultaneously on different execution units, then the throughput (cycle cost per instruction) is 0.5.
- 吞吐量--在相同的独立指令流中执行一条指令需要多少个周期。例如,如果 CPU 可以在不同的执行单元上同时执行两条独立的加法指令,那么吞吐量(每条指令的周期成本)就是 0.5。

- Latency - How many cycles it takes for an instruction to finish executing. This is reported as a min-max range, as some output values may be available earlier than the rest.
- 延迟 - 一条指令完成执行需要多少个周期。该值以最小-最大范围报告,因为某些输出值可能早于其他输出值。

- μops - How many microcode operations have to be dispatched for an instruction to retire. For example, adding a value from memory to a register may consist of two microinstructions: first load the value from memory, then add it to the register.
- - 一条指令需要执行多少次微码操作才能结束。例如,将内存中的值添加到寄存器中可能包括两条微指令:首先从内存中加载值,然后将其添加到寄存器中。

- Ports - Which ports (execution units) are required for dispatch of microinstructions. For example, 2*p0+1*p015 would mean that out of the three microinstructions implementing the assembly instruction, two can only be executed on port 0, and one microinstruction can be executed on ports 0, 1, or 5. The number of available ports and their capabilities vary between different processor architectures. Refer to https://wikichip.org/ for more information.
- 端口 - 调度微指令需要哪些端口(执行单元)。例如, 表示在执行汇编指令的三条微指令中,两条只能在端口 0 上执行,一条微指令可以在端口 0、1 或 5 上执行。不同处理器架构的可用端口数量及其功能各不相同。更多信息请参阅 https://wikichip.org/。
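To make the Ports description more concrete, here is a minimal sketch that parses a port usage string into (count, ports) pairs. It assumes the string follows the uops.info style, for example 2*p0+1*p015, and that port numbers are single digits. The function name parse_ports is hypothetical and is not part of Tracy.

```python
def parse_ports(spec: str) -> list[tuple[int, tuple[int, ...]]]:
    """Parse a port usage string such as '2*p0+1*p015'.

    Assumption: each term has the form '<count>*p<digits>', where every
    digit names one port on which the microinstruction may execute.
    """
    result = []
    for term in spec.split("+"):
        count, ports = term.split("*")
        if not ports.startswith("p"):
            raise ValueError(f"unexpected port group: {ports!r}")
        result.append((int(count), tuple(int(digit) for digit in ports[1:])))
    return result


uops = parse_ports("2*p0+1*p015")
total_uops = sum(count for count, _ in uops)

for count, ports in uops:
    print(f"{count} microinstruction(s) may run on port(s) {ports}")
print(f"total microinstructions: {total_uops}")
```

Here two microinstructions are bound to port 0 only, while the third one may go to port 0, 1, or 5, which matches the example given in the list above.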
The CPU microarchitecture can be selected manually; each available entry is accompanied by the name of an example CPU implementing it. If the current selection matches the microarchitecture on which the profiled application was running, the icon will be green. Otherwise, it
选择 CPU 微体系结构时,可使用"...... "图标,该图标附有实现该体系结构的 CPU 示例名称。如果当前选择与运行剖析应用程序的微体系结构相匹配,图标将显示为绿色 。否则

profiled application was running on.
被剖析应用程序的运行环境。
Clicking on the Save button lets you write the disassembly listing to a file. You can then manually extract some critical loop kernel and pass it to a CPU simulator, such as LLVM Machine Code Analyzer, to see how the code is executed and if there are any pipeline bubbles. Consult the simulator's documentation for more details. Alternatively, you might click the right mouse button on a jump arrow and save only the instructions within the jump range, using the Save jump range button.
单击 "保存 "按钮可将反汇编列表写入文件。然后,您可以手动提取一些关键的循环内核,并将其传递给 CPU 模拟器,如 LLVM 机器代码分析器 ,以查看代码是如何执行的,以及是否存在任何流水线气泡。有关详细信息,请查阅 文档。或者,您可以在跳转箭头上单击鼠标右键,然后使用 保存跳转范围按钮,只保存跳转范围内的指令。
Instruction dependencies

Assembly instructions may read values stored in registers and may also write values to registers. As a result, a dependency between two instructions is created when one produces some result, which the other then consumes. Combining this dependency graph with information about instruction latencies may give a deep understanding of the bottlenecks in code performance.
指令依赖性 汇编指令可以读取寄存器中存储的值,也可以向寄存器写入值。因此,当一条指令产生某些结果,而另一条指令消耗这些结果时,两条指令之间就产生了依赖关系。将这种依赖关系图与指令延迟信息相结合,可以深入了解代码性能的瓶颈。
Clicking the left mouse button on any assembly instruction will mark it as a target for resolving register dependencies between instructions. To cancel this selection, click on any assembly instruction with the right mouse button.
在任何汇编指令上单击鼠标左键,都会将其标记为解决指令间寄存器依赖关系的目标。要取消选择,请用鼠标右键单击任何汇编指令。
The selected instruction will be highlighted in white, while its dependencies will be highlighted in red. Additionally, the dependent registers will be listed next to each instruction that reads or writes them, with the following color code:
选中的指令将以白色高亮显示,而其依赖寄存器将以红色高亮显示。此外,在每条读取或写入寄存器的指令旁还将列出依赖寄存器列表,颜色代码如下:

- Green - Register value is read (is a dependency after target instruction).
- 绿色 - 读取寄存器值(是目标指令后的依赖项)。

- Red - A value is written to a register (is a dependency before target instruction).
- 红色 - 一个值被写入寄存器(是目标指令前的依赖项)。

- Yellow - Register is read and then modified.
- 黄色 - 读取并修改寄存器。

- Grey - Value in a register is either discarded (overwritten) or was already consumed by an earlier instruction (i.e., it is readily available). The profiler will not follow the dependency chain further.
- 灰色 - 寄存器中的值要么已被丢弃(覆盖),要么已被前面的指令消耗(即,它是随时可用的 )。分析器将不再继续跟踪该依赖链。
Search for dependencies follows program control flow, so there may be multiple producers and consumers for any single register. While the after and before guidelines mentioned above hold in the general case, things may be more complicated when there's a large number of conditional jumps in the code. Note that dependencies further away than 64 instructions are not displayed.
依赖关系的搜索遵循程序控制流,因此任何一个寄存器都可能有多个生产者和消费者。虽然上文提到的 after 和 before 准则在一般情况下适用,但当代码中存在大量条件跳转时,情况可能会更加复杂。请注意,超过 64 条指令的依赖关系不会显示。
For more straightforward navigation, dependencies are also marked on the left side of the scroll bar, following the green, red and yellow conventions. The selected instruction is marked in blue.
为了更直观地导航,滚动条左侧还按照绿色、红色和黄色的惯例标注了依赖关系。所选指令用蓝色标出。
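The idea of resolving register dependencies can be sketched as follows. This is a deliberately simplified illustration, not Tracy's implementation: it only handles straight-line code (the profiler follows control flow), it finds only the nearest earlier producer of each register, and the Insn type and producers function are hypothetical names. The 64-instruction limit is taken from the description above.

```python
from dataclasses import dataclass, field

MAX_DISTANCE = 64  # dependencies further away than this are not displayed


@dataclass
class Insn:
    text: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)


def producers(code: list[Insn], target: int) -> dict[str, int]:
    """Map each register read by code[target] to the index of the nearest
    earlier instruction that writes it (straight-line code only)."""
    pending = set(code[target].reads)
    found: dict[str, int] = {}
    first = max(target - MAX_DISTANCE, 0)
    for index in range(target - 1, first - 1, -1):
        hit = pending & code[index].writes
        for register in hit:
            found[register] = index
        pending -= hit
        if not pending:
            break
    return found


code = [
    Insn("mov rax, [rdi]", reads={"rdi"}, writes={"rax"}),
    Insn("mov rbx, [rsi]", reads={"rsi"}, writes={"rbx"}),
    Insn("add rax, rbx", reads={"rax", "rbx"}, writes={"rax"}),
    Insn("imul rcx, rax", reads={"rcx", "rax"}, writes={"rcx"}),
]

print(producers(code, 2))  # producers of the registers read by 'add rax, rbx'
print(producers(code, 3))  # producers of the registers read by 'imul rcx, rax'
```

Registers with no producer inside the scanned window (rcx in the last call) are simply absent from the result, which loosely corresponds to the values the profiler treats as readily available.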

5.16.2.3 Combined mode 5.16.2.3 组合模式

In this mode, the source and assembly panes will be displayed together, providing the best way to gain insight into the code. Hovering the mouse pointer over the source file line or the location of the assembly line will highlight the corresponding lines in the second pane (both in the listing and on the scroll bar). Clicking the left mouse button on a line will select it and focus on it in both panes. Note that while an assembly line always has only one corresponding source line, a single source line may have many associated assembly lines, not necessarily next to each other. Clicking on the same source line more than once will focus the assembly view on the next associated instructions block.
在这种模式下,源代码和汇编窗格将同时显示,为深入了解代码提供了最佳途径。将鼠标指针悬停在源文件行或汇编行的位置上,第二窗格中的相应行就会高亮显示(在列表中和滚动条上)。单击某一行上的鼠标左键将选中该行,并在两个窗格中对其进行聚焦。请注意,虽然装配线总是只有一条对应的源线,但一条源线可能有许多相关的装配线,而且不一定紧挨着。多次单击同一源代码线将使装配视图聚焦于下一个相关的指令块。

5.16.2.4 Instruction pointer cost statistics
5.16.2.4 指令指针成本统计

If automated call stack sampling (see chapter 3.15.5) was performed, additional profiling information will be available. The first column of source and assembly views will contain percentage counts of collected instruction pointer samples for each displayed line, both in numerical and graphical bar form. You can use this information to determine which function line takes the most time. The displayed percentage values are heat map color-coded, with the lowest values mapped to dark red and the highest to bright yellow. The color code will appear next to the percentage value and on the scroll bar so that you can identify 'hot' places in the code at a glance.
如果执行了自动调用堆栈采样(参见第 3.15.5 章),则可获得额外的剖析信息。源代码和汇编视图的第一列将包含每个显示行的指令指针采样百分比计数,以数字和图形条的形式显示。您可以利用这些信息来确定哪个函数行花费的时间最多。显示的百分比值采用热图颜色编码,最低值映射为深红色,最高值映射为亮黄色。颜色代码将显示在百分比值旁边和滚动条上,这样您就可以一目了然地识别代码中的 "热点 "位置。
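As a rough sketch of how such a column could be derived, the code below turns per-line sample counts into percentages and maps each percentage to a color between dark red and bright yellow. The exact palette and scaling used by the profiler are not specified here, so the RGB endpoints and the linear blend are assumptions, and both function names are hypothetical.

```python
def line_percentages(samples: dict[int, int]) -> dict[int, float]:
    """Convert per-line sample counts into percentages of all samples."""
    total = sum(samples.values())
    if total == 0:
        return {line: 0.0 for line in samples}
    return {line: 100.0 * count / total for line, count in samples.items()}


def heat_color(percent: float, max_percent: float) -> tuple[int, int, int]:
    """Blend linearly from dark red (lowest) to bright yellow (highest).

    Assumption: the endpoint colors below are illustrative only.
    """
    if max_percent <= 0:
        t = 0.0
    else:
        t = min(max(percent / max_percent, 0.0), 1.0)
    low, high = (128, 0, 0), (255, 255, 0)
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


samples = {10: 50, 11: 150, 12: 0}  # source line -> collected samples
percentages = line_percentages(samples)
hottest = max(percentages.values())

for line, percent in percentages.items():
    print(line, f"{percent:.1f}%", heat_color(percent, hottest))
```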
By default, samples are displayed only within the selected symbol, in isolation. In some cases, you may, however, want to include samples from functions that the selected symbol called. To do so, enable the Child calls option, which you may also temporarily toggle by holding the Z key. You can also click the drop-down control to display a child call distribution list, which shows each known function that the symbol called. Make sure to familiarize yourself with section 5.14.1 to be able to read the results correctly.
默认情况下,样本仅在所选符号内单独显示。但在某些情况下,您可能希望包含所选符号调用的函数的样本。为此,请启用 子调用选项,也可以按住 Z 键临时切换该选项。您还可以单击 下拉控件来显示子调用分布列表,其中显示了该符号调用的每个已知函数 。请务必熟悉第 5.14.1 节,以便正确读取结果。
Instruction timings can be viewed as a group. To begin constructing such a group, click the left mouse button on the percentage value. Additional instructions can be added by holding the Ctrl key, and a range of instructions can be selected with a modifier key. To cancel the selection, click the right mouse button on a percentage value. Group statistics can be seen at the bottom of the pane.
指令时序可以作为一个组来查看。要开始构建这样一个组,请在百分比值上单击鼠标左键。按住 键可选择一个范围,同时按住 Ctrl 键可添加其他指令。要取消选择,请在百分比值上单击鼠标右键。在窗格底部可以看到组的统计数据。
Clicking the middle mouse button on the percentage value of an assembly instruction will display entry call stacks of the selected sample (see chapter 5.15). This functionality is only available for instructions that have collected sampling data and only in the assembly view, as the source code may be inlined multiple times, which would result in ambiguous location data. Note that the number of entry call stacks is displayed in a tooltip for quick reference.
在汇编指令的百分比值上单击鼠标中键,将显示所选样本的入口调用堆栈(参见第 5.15 章)。该功能仅适用于已收集采样数据的指令,且仅在汇编视图中可用,因为源代码可能被多次内联,从而导致位置数据模糊不清。请注意,入口调用堆栈的数量显示在工具提示中,以便快速参考。
The sample data source is controlled by the Function control in the window header. If this option is disabled, sample data will represent the whole symbol. If it is enabled, the sample data will only include the selected function. You can change the currently selected function by opening the drop-down box, which includes time statistics. The time percentage values of each contributing function are calculated relative to the total number of samples collected within the symbol.
样本数据源由窗口标题中的功能控件控制。如果禁用该选项,样本数据将代表整个符号。如果启用,则样本数据将只包括所选函数。您可以通过打开下拉框更改当前选择的函数,下拉框中包含时间统计信息。每个贡献函数的时间百分比值是相对于符号内采集的样本总数计算得出的。
Selecting the Limit range option will restrict counted samples to the time extent shared with the statistics view (displayed as a red-striped region on the timeline). See section 5.3 for more detail.
选择 "限制范围 "选项会将计数样本限制在与统计视图共享的时间范围内(在时间轴上显示为红色条纹区域)。详见第 5.3 节。
Important 重要
Be aware that the data is not entirely accurate, as it results from a random sampling of program execution. Furthermore, undocumented implementation details of an out-of-order CPU architecture will highly impact the measurement. Read chapter 2.2.2 to see the tip of the iceberg.
请注意,这些数据并不完全准确,因为它们是程序执行过程中随机取样的结果。此外,无序 CPU 架构的未记录实施细节也会对测量结果产生很大影响。请阅读第 2.2.2 章,了解冰山一角。

5.16.2.5 Inspecting hardware samples
5.16.2.5 检查硬件样品

As described in chapter 3.15.6, on some platforms, Tracy can capture the internal statistics counted by the CPU hardware. If this data has been collected, the Cost selection list will be available. It allows changing what is taken into consideration for display by the cost statistics. You can select the following options:
如 3.15.6 章所述,在某些平台上,Tracy 可以捕捉 CPU 硬件计算的内部统计数据。如果已经收集了这些数据,则可以使用成本选择列表。通过该列表可以更改成本统计显示的考虑因素。您可以选择以下选项:
  • Sample count - this selects the instruction pointer statistics, collected by call stack sampling performed by the operating system. This is the default data shown when hardware samples have not been captured.
    采样计数 - 选择由操作系统执行的调用堆栈采样收集的指令指针统计数据。这是未采集硬件采样时显示的默认数据。
  • Cycles - an option very similar to the sample count, but the data is collected directly by the CPU hardware counters. This may make the results more reliable.
    周期 - 与采样计数非常相似,但数据直接由 CPU 硬件计数器收集。这可能会使结果更加可靠。
  • Branch impact - indicates places where many branch instructions are issued, and at the same time,
    分支影响 - 表示同时发出许多分支指令的地方、

the raw branch miss rate, as it considers the number of events taking place.
因为它考虑了发生事件的数量。
  • Cache impact - similar to branch impact, but it shows cache miss data instead. These values are calculated as and will highlight places with lots of cache accesses that also miss.
    缓存影响 - 与分支影响类似,但它显示的是缓存未命中数据。这些值的计算公式为 ,并会突出显示有大量缓存访问的地方也会出现未命中。
  • The rest of the available selections just show raw values gathered from the hardware counters. These are: Retirements, Branches taken, Branch miss, Cache access and Cache miss.
    其余的可用选项只显示从硬件计数器收集到的原始值。它们是退行、分支占用、分支未命中、高速缓存访问和高速缓存未命中。
If the hardware samples switch is enabled, the profiler will supplement the cost percentages column with three additional columns. The first added column displays the instructions per cycle (IPC) value. The two remaining columns show branch and cache data, as described below. The displayed values are color-coded, with green indicating good execution performance and red indicating that the code stalled the CPU pipeline for one reason or another.
如果启用 (硬件采样)开关,剖析器将在成本百分比列中增加三列。新增的第一列显示每周期指令 (IPC) 值。其余两列显示分支和高速缓存数据,如下所述。显示的值以颜色编码,绿色表示执行性能良好,红色表示代码因某种原因导致 CPU 流水线停滞。
If the Impact switch is enabled, the branch and cache columns will show how much impact the branch mispredictions and cache misses have. The way these statistics are calculated is described in the list above. In the other case, the columns will show the raw branch and cache miss rate ratios, isolated to their respective source and assembly lines and not relative to the whole symbol.
如果启用了 "影响 "开关,分支和高速缓存列将显示分支预测错误和高速缓存未命中的影响程度。这些统计数据的计算方法已在上面的列表中说明。在其他情况下,各列将显示原始的分支和高速缓存未命中率比率,这些比率与各自的源代码和组装线无关,而与整个符号无关。

Isolated values 隔离值

The percentage values shown when the Impact option is not selected do not take into account the relative count of events. For example, you may see a 100% cache miss rate when some instruction missed 10 out of 10 cache accesses. While not ideal, this is not as important as an instruction with a seemingly better 50% cache miss rate, which actually missed 1000 out of 2000 accesses. Therefore, you should always cross-check the presented information with the respective event counts. To help with this, Tracy will dim statistically unimportant values.
未选择 "影响 "选项时,百分比值不会考虑事件的相对计数。例如,当某些指令在 10 次高速缓存访问中错过了 10 次时,您可能会看到 高速缓存未命中率。虽然这并不理想,但与看似更好的 高速缓存未命中率指令(实际上在 2000 次访问中有 1000 次未命中)相比,这并不重要。因此,您应始终将所提供的信息与相应的事件计数进行交叉检查。为了帮助您做到这一点,Tracy 会将统计上不重要的值去掉。
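The difference between an isolated ratio and a count-aware score can be illustrated with the two instructions from the example above (10 misses out of 10 accesses versus 1000 misses out of 2000 accesses). In the sketch below, the impact function uses a geometric mean of accesses and misses purely as an illustrative assumption; it is a hypothetical helper and should not be read as the exact formula used by the profiler.

```python
import math


def miss_rate(misses: int, accesses: int) -> float:
    """Isolated ratio: ignores how many events actually took place."""
    return 0.0 if accesses == 0 else misses / accesses


def impact(misses: int, accesses: int) -> float:
    """Count-aware score (assumed form): grows with both the number of
    accesses and the number of misses."""
    return math.sqrt(accesses * misses)


rare = (10, 10)      # (misses, accesses) for a rarely executed instruction
hot = (1000, 2000)   # (misses, accesses) for a frequently executed one

print("rare:", f"{miss_rate(*rare):.0%}", f"impact {impact(*rare):.1f}")
print("hot: ", f"{miss_rate(*hot):.0%}", f"impact {impact(*hot):.1f}")
```

The rarely executed instruction has the worse miss rate, yet the frequently executed one scores far higher once event counts are considered, which is why the raw ratios should be cross-checked against the counts.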

5.17 Wait stacks window
5.17 等待堆栈窗口

If wait stack information has been captured (chapter 3.15.5.1), here you will be able to inspect the collected data. There are three different views available:
如果已捕获等待堆栈信息(3.15.5.1 章),则可在此处检查所收集的数据。有三种不同的视图可供选择:
  • Bottom-up tree - displays wait stacks in the form of a collapsible tree, which starts at the bottom of the call stack.
    自下而上的树形 - 以可折叠的树形显示等待堆栈,从调用堆栈的底部开始。
  • Top-down tree - displays wait stacks in the form of a collapsible tree, which starts at the top of the call stack.
    自上而下的树形 - 以可折叠的树形显示等待堆栈,从调用堆栈的顶部开始。
Displayed data may be narrowed down to a specific time range or to include only selected threads.
显示的数据可缩小到特定的时间范围,或只包括选定的线程。

5.18 Lock information window
5.18 锁定信息窗口

This window presents information and statistics about a lock. The lock events count represents the total number of collected wait, obtain, and release events. The announce, termination, and lock lifetime measure the time from the lockable construction until destruction.
该窗口显示有关锁的信息和统计数据。锁事件计数表示收集到的等待、获取和释放事件的总数。公告、终止和锁的生命周期则表示从锁创建到销毁的时间。

5.19 Frame image playback window
5.19 帧图像回放窗口

You may view a live replay of the profiled application screen captures (see section 3.3.3) using this window. Playback is controlled by the Play and Pause buttons, and the Frame image slider can be used to scrub to the desired timestamp. Alternatively, you may use the step buttons to move a single frame back or forward.
您可以使用此窗口查看剖析应用程序屏幕截图的实时回放(参见第 3.3.3 节)。播放由 "播放 "和 "暂停 "按钮控制,"帧图像 "滑块可用于切换到所需的时间戳。此外,您还可以使用 和按钮向前或向后切换单帧。
If the Sync timeline option is selected, the profiler will focus the timeline view on the frame corresponding to the currently displayed screenshot. The Zoom option enlarges the image for easier viewing.
如果选择了同步时间线选项,剖析器将把时间线视图聚焦在与当前显示的屏幕截图相对应的帧上。缩放 选项可放大图像以方便查看。
The following parameters also accompany each displayed frame image: timestamp, showing at which time the image was captured; frame, displaying the numerical value of the corresponding frame; and ratio, telling how well the in-memory lossless compression was able to reduce the image data size.
每个显示的帧图像还附有以下参数:时间戳,显示图像捕获的时间;帧,显示相应帧的数值;比率,说明内存无损压缩在缩小图像数据大小方面的效果。

5.20 CPU data window
5.20 CPU 数据窗口

Statistical data about all processes running on the system during the capture is available in this window if the profiler performed context switch capture (section 3.15.3).
如果剖析器执行了上下文切换捕获(第 3.15.3 节),则可在此窗口中获得捕获期间系统上运行的所有进程的统计数据。
Each running program has an assigned process identifier (PID), which is displayed in the first column. The profiler will also display a list of thread identifiers (TIDs) if a program entry is expanded.
每个运行中的程序都有一个分配的进程标识符(PID),显示在第一列。如果展开程序条目,剖析器还会显示线程标识符(TID)列表。
The running time column shows how much processor time was used by a process or thread. The percentage may be over 100%, as it is scaled to trace length, and multiple threads belonging to a single program may be executing simultaneously. The running regions column displays how many times a given entry was in the running state, and the CPU migrations column shows how many times an entry was moved from one CPU core to another when the system scheduler suspended an entry.
运行时间列显示进程或线程使用了多少处理器时间。百分比可能会超过 ,因为它是根据跟踪长度缩放的,而且属于一个程序的多个线程可能会同时执行。运行区域列显示给定条目处于运行状态的次数,CPU 迁移列显示当系统调度程序暂停条目时,条目从一个 CPU 内核转移到另一个 CPU 内核的次数。
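A short worked example may help explain why a process-level percentage can exceed the trace length. The sketch below simply sums the running time of all threads of a program and scales it to the trace length; the function name and the numbers are hypothetical.

```python
def running_time_percent(thread_running_ns: list[int], trace_length_ns: int) -> float:
    """Sum the running time of all threads and scale it to the trace length."""
    return 100.0 * sum(thread_running_ns) / trace_length_ns


# Three threads, each running for 0.8 s during a 1 s trace.
threads = [800_000_000, 800_000_000, 800_000_000]
percent = running_time_percent(threads, 1_000_000_000)
print(f"{percent:.0f}%")
```

Because the three threads run in parallel on different cores, their combined processor time is larger than the wall-clock duration of the capture.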
The profiled program is highlighted using green color. Furthermore, the yellow highlight indicates threads known to the profiler (that is, which sent events due to instrumentation).
被剖析的程序用绿色高亮显示。此外,黄色高亮表示剖析器已知的线程(即通过仪器发送事件的线程)。

5.21 Annotation settings window
5.21 注释设置窗口

In this window, you may modify how a timeline annotation (section 5.3.1) is presented by setting its text description or selecting region highlight color. If the note is no longer needed, you may also remove it here.
在此窗口中,您可以通过设置文本描述或选择区域高亮颜色来修改时间线注释(第 5.3.1 节)的显示方式。如果不再需要注释,也可以在此将其删除。

5.22 Annotation list window
5.22 注释列表窗口

This window lists all annotations marked on the timeline. Each annotation is presented as shown in figure 24. From left to right, the elements are:
该窗口列出时间轴上标记的所有注释。每个注释的显示方式如图 24 所示。从左到右依次为
  • Edit - Opens the annotation settings window (section 5.21).
    编辑-打开注释设置窗口(第 5.21 节)。
  • Zoom - Zooms timeline to the annotation extent.
    缩放 - 将时间线缩放至注释范围。
  • Remove - Removes the annotation. You must press the Ctrl key to enable this button.
    而删除 - 删除注释。必须按下 Ctrl 键才能启用此按钮。
  • Colored box - Color of the annotation.
    彩色框 - 注释的颜色。
  • Text description of the annotation.
    注释的文字说明。


Figure 24: Annotation list entry
图 24:注释列表条目
A new view-sized annotation can be added in this window by pressing the Add annotation button. This effectively saves your current viewport for further reference.
按下 添加注释按钮,即可在此窗口中添加新的视图大小的注释。这将有效保存当前视口,以供进一步参考。

5.23 Time range limits
5.23 时间范围限制

This window displays information about time range limits (section 5.3) for find zone (section 5.7), statistics (section 5.6), memory (section 5.9) and wait stacks (section 5.17) results. Each limit can be enabled or disabled and adjusted through the following options:
该窗口显示查找区域(第 5.7 节)、统计数据(第 5.6 节)、内存(第 5.9 节)和等待堆栈(第 5.17 节)结果的时间范围限制(第 5.3 节)信息。每个限制都可以启用或禁用,并通过以下选项进行调整:
  • Limit to view - Set the time range limit to current view.
    限制到视图 - 设置当前视图的时间范围限制。
  • Focus - Set the timeline view to the time range extent.
    聚焦 - 将时间线视图设置为时间范围。
  • Set from annotation - Allows using the annotation region for limiting purposes.
    从注释中设置 - 允许使用注释区域进行限制。
  • Copy from statistics - Copies the statistics time range limit.
    从统计数据复制 - 复制统计数据的时间范围限制。
  • Copy from find zone - Copies the find zone time range limit.
    Q 从查找区域复制 - 复制查找区域的时间范围限制。
  • Copy from wait stacks - Copies the wait stacks time range limit.
    从等待堆栈复制 - 复制等待堆栈的时间范围限制。
  • Copy from memory - Copies the memory time range limit.
Note that ranges displayed in the window have color hints that match the color of the striped regions on the timeline.
请注意,窗口中显示的范围有颜色提示,与时间线上条纹区域的颜色一致。

6 Exporting zone statistics to CSV
6 将区统计数据导出为 CSV

You can use a command-line utility in the csvexport directory to export primary zone statistics from a saved trace into a CSV format. The tool requires a single .tracy file as an argument and prints the result into the standard output (stdout), from where you can redirect it into a file or use it as an input into another tool. By default, the utility will list all zones with the following columns:
您可以使用 csvexport 目录中的命令行工具,将已保存跟踪的主区统计数据导出为 CSV 格式。该工具要求将单个 .tracy 文件作为参数,并将结果打印到标准输出 (stdout),您可以将其重定向到文件中,或作为其他工具的输入。默认情况下,该工具将列出所有区段,并包含以下列:
  • name - Zone name
    name - 区域名称
  • src_file - Source file where the zone was set
    src_file - 设置区域的源文件
  • src_line - Line in the source file where the zone was set
    src_line-源文件中设置区域的一行
  • total_ns - Total zone time in nanoseconds
    total_ns - 以纳秒为单位的总区域时间
  • total_perc - Total zone time as a percentage of the program's execution time
    total_perc - 总区域时间占程序执行时间的百分比
  • counts - Zone count
    计数 - 区计数
  • mean_ns - Mean zone time (equivalent to MPTC in the profiler GUI) in nanoseconds
    mean_ns - 平均区域时间(相当于剖析器图形用户界面中的 MPTC),单位为纳秒
  • min_ns - Minimum zone time in nanoseconds
    min_ns - 以纳秒为单位的最小区域时间
  • max_ns - Maximum zone time in nanoseconds
    max_ns - 以纳秒为单位的最大区域时间
  • std_ns - Standard deviation of the zone time in nanoseconds
    std_ns - 以纳秒为单位的区域时间标准偏差
You can customize the output with the following command line options:
您可以使用以下命令行选项自定义输出:
  • -h, --help - Display a help message
    -h、--help - 显示帮助信息
  • -f, --filter - Filter the zone names
    -f, --filter - 筛选区域名称
  • -c, --case - Make the name filtering case sensitive
    -c、--case - 设置名称筛选区分大小写
  • -s, --sep - Customize the CSV separator (default is ",")
    -s、--sep -自定义 CSV 分隔符(默认为",")。
  • -e, --self - Use self time (equivalent to the "Self time" toggle in the profiler GUI)
    -e,--self - 使用自身时间(相当于剖析器图形用户界面中的 "自身时间 "切换键)
  • -u, --unwrap - Report each zone individually; this will discard the statistics columns and instead report the timestamp and duration for each zone entry
    -u,--unwrap - 单独报告每个区段;这将放弃统计列,转而报告每个区段条目的时间戳和持续时间
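Once the output has been redirected into a file, it can be processed with any CSV-aware tool. The sketch below lists the zones with the highest total time. It assumes that the output starts with a header row using the column names listed above; the sample rows are made up for illustration, and slowest_zones is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample of the exported data (values are invented).
SAMPLE = """name,src_file,src_line,total_ns,total_perc,counts,mean_ns,min_ns,max_ns,std_ns
Render,render.cpp,57,6000000,30.0,300,20000,15000,70000,2500.0
Update,game.cpp,120,9000000,45.0,300,30000,21000,95000,4100.5
"""


def slowest_zones(csv_text: str, top: int = 5, sep: str = ","):
    """Return (name, total_ns, counts) for the zones with the largest total time."""
    rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=sep))
    rows.sort(key=lambda row: float(row["total_ns"]), reverse=True)
    return [
        (row["name"], float(row["total_ns"]), int(row["counts"]))
        for row in rows[:top]
    ]


for name, total_ns, counts in slowest_zones(SAMPLE):
    print(f"{name}: {total_ns / 1e6:.1f} ms over {counts} calls")
```

If a custom separator was requested with the -s option, pass the same character as the sep argument.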

7 Importing external profiling data
7 导入外部剖析数据

Tracy can import data generated by other profilers. This external data cannot be directly loaded but must be converted first. Currently, there's support for the following formats:
Tracy 可以导入其他分析仪生成的数据。这些外部数据不能直接加载,必须先进行转换。目前支持以下格式:
  • chrome:tracing data through the import-chrome utility. The trace files typically have a .json or .json.zst extension. To use this tool to process a file named mytracefile.json, assuming it's compiled, run:
    chrome:跟踪数据。跟踪文件的扩展名通常为 json 或 .json.zst。要使用该工具处理名为 mytracefile.json 的文件(假设已编译),请运行
$ import-chrome mytracefile.json mytracefile.tracy
$ tracy mytracefile.tracy
  • Fuchsia's tracing format data through the import-fuchsia utility. This format has many commonalities with the chrome:tracing format, but it uses a compact and efficient binary encoding that can help lower tracing overhead. The file extension is .fxt or .fxt.zst.
    Fuchsia 的跟踪格式 数据。这种格式与 chrome:tracing 格式有许多共同之处,但它使用了紧凑高效的二进制编码,有助于降低追踪开销。文件扩展名为 .fxt 或 .fxt.zst。
To use this tool, assuming it's compiled, run:
假设该工具已编译完成,请运行
$ import-fuchsia mytracefile.fxt mytracefile.tracy
$ tracy mytracefile.tracy

Compressed traces 压缩痕迹

Tracy can import traces compressed with the Zstandard algorithm (for example, using the zstd command-line utility). Traces ending with the .zst extension are assumed to be compressed. This applies to both chrome and fuchsia traces.
Tracy 可以导入使用 Zstandard 算法压缩的轨迹(例如,使用 zstd 命令行工具)。以 .zst 扩展名结尾的痕迹被认为是压缩痕迹。这适用于 Chrome 和 fuchsia 曲线。

Source locations 来源地点

The Chrome tracing format doesn't document a way to provide source location data. The import-chrome and import-fuchsia utilities will, however, recognize a custom loc tag in the root of zone begin events. You should format this data in the usual filename:line style, for example: hello.c:42. Providing the line number (including a colon) is optional but highly recommended.
Chrome 浏览器的跟踪格式中没有提供源位置数据的方法。不过,import-chrome 和 import-fuchsia 工具可以识别区域开始事件根中的自定义 loc 标记。您应该以通常的文件名:行样式来格式化这些数据,例如:hello.c:42。行号(包括冒号)可以不提供,但强烈建议提供。
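As an illustration, the following sketch writes a small chrome:tracing JSON document in which every zone begin event carries a loc tag. It relies on the commonly used JSON object form of the format (a traceEvents array with "B" and "E" phase events and timestamps in microseconds). Leaving out the pid field is an assumption based on the limitations described in this chapter, and make_trace is a hypothetical helper.

```python
import json


def make_trace(zones):
    """Build a chrome:tracing JSON string.

    zones: iterable of (name, begin_us, end_us, tid, loc), where loc is a
    'filename:line' string or None.
    """
    events = []
    for name, begin_us, end_us, tid, loc in zones:
        begin = {"name": name, "ph": "B", "ts": begin_us, "tid": tid}
        if loc:
            begin["loc"] = loc  # custom tag in the root of the begin event
        events.append(begin)
        events.append({"name": name, "ph": "E", "ts": end_us, "tid": tid})
    return json.dumps({"traceEvents": events}, indent=2)


trace_json = make_trace([
    ("main", 0, 1500, 1, "hello.c:42"),
    ("helper", 200, 700, 1, "hello.c"),  # the line number is optional
])
print(trace_json)
```

The resulting text can be saved with a .json extension and then converted with the import-chrome utility as shown earlier.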

Limitations 局限性

  • Tracy is a single-process profiler. Should the imported trace contain PID entries, each PID+TID pair will create a new pseudo-TID number, which the profiler will then decode into a PID+TID pair in thread labels. If you want to preserve the original TID numbers, your traces should omit PID entries.
    Tracy 是单进程剖析器。如果导入的跟踪包含 PID 条目,则每个 PID+TID 对都会创建一个新的伪 TID 号,然后剖析器会将其解码为线程标签中的 PID+TID 对。如果希望保留原始 TID 编号,则跟踪应省略 PID 条目。
  • The imported data may be severely limited, either by not mapping directly to the data structures used by Tracy or by following undocumented practices.
    导入的数据可能会受到严重限制,要么没有直接映射到 Tracy 使用的数据结构,要么采用了未注明的做法。

8 Configuration files 8 配置文件

While the client part doesn't read or write anything to the disk (except for accessing the /proc filesystem on Linux), the server part has to keep some persistent state. The naming conventions or internal data format of the files are not meant to be known by profiler users, but you may want to do a backup of the configuration or move it to another machine.
客户端部分不会向磁盘读写任何内容(除了访问 Linux 上的 /proc 文件系统),而服务器部分则必须保持一些持久状态。这些文件的命名规则或内部数据格式并不打算让 profiler 用户知道,但您可能想对配置进行备份或将其转移到另一台机器上。
On Windows, settings are stored in the %APPDATA%/tracy directory. All other platforms use the $XDG_CONFIG_HOME/tracy directory, or $HOME/.config/tracy if the XDG_CONFIG_HOME environment variable is not set.
在 Windows 中,设置存储在 %APPDATA%/tracy 目录中。所有其他平台都使用 $XDG_CONFIG_HOME/tracy 目录,如果未设置 XDG_CONFIG_HOME 环境变量,则使用 $HOME/.config/tracy 目录。
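If you want to script a backup of the configuration, the lookup rules above can be reproduced along these lines. This is only a sketch: the function name is hypothetical, and treating an empty XDG_CONFIG_HOME the same as an unset one is an assumption.

```python
import os
from pathlib import Path


def tracy_config_dir(env=None, windows=None) -> Path:
    """Resolve the profiler configuration directory from environment variables."""
    env = os.environ if env is None else env
    if windows is None:
        windows = os.name == "nt"
    if windows:
        return Path(env["APPDATA"]) / "tracy"
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:  # assumption: an empty value is treated like an unset variable
        return Path(xdg) / "tracy"
    return Path(env["HOME"]) / ".config" / "tracy"


print(tracy_config_dir({"HOME": "/home/user"}, windows=False))
print(tracy_config_dir({"HOME": "/home/user", "XDG_CONFIG_HOME": "/cfg"}, windows=False))
```

Calling tracy_config_dir() without arguments uses the environment of the current process.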

8.1 Root directory 8.1 根目录

Various files at the root configuration directory store common profiler state such as UI windows position, connections history, etc.
配置根目录下的各种文件存储了用户界面窗口位置、连接历史等常用剖析器状态。

8.2 Trace specific settings
8.2 特定跟踪设置

Trace files saved on disk are immutable and can't be changed. Still, it may be desirable to store additional per-trace information to be used by the profiler, for example, a custom description of the trace or the timeline view position used in the previous profiling session.
保存在磁盘上的跟踪文件不可更改。不过,可能还需要存储剖析器使用的每条跟踪的附加信息,例如跟踪的自定义描述或上一次剖析会话中使用的时间线视图位置。
This external data is stored in the user/[letter]/[program]/[week]/[epoch] directory, relative to the configuration's root directory. The program part is the name of the profiled application (for example program.exe). The letter part is the first letter of the profiled application's name. The week part is a count of weeks since the Unix epoch, and the epoch part is a count of seconds since the Unix epoch. This rather unusual convention prevents the creation of directories with hundreds of entries.
这些外部数据存储在用户/[字母]/[程序]/[周]/[时间]目录中,与配置根目录相对应。程序部分是被剖析应用程序的名称(例如 program.exe)。字母部分是剖析应用程序名称的第一个字母。星期部分是自 Unix 纪元以来的周计数,而纪元部分是自 Unix 纪元以来的秒计数。这种不同寻常的约定可以防止创建包含数百个条目的目录。
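The directory layout can be reproduced with a few lines of code, which may be handy when locating the settings of a particular trace. The sketch below assumes that the week count is the number of whole 7-day periods since the Unix epoch and that the letter is used exactly as it appears in the program name. Which timestamp of the trace supplies the epoch value is not stated here, so the caller has to provide it. The function name is hypothetical.

```python
from pathlib import PurePosixPath

SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def trace_settings_dir(program: str, epoch: int) -> PurePosixPath:
    """Build user/[letter]/[program]/[week]/[epoch], relative to the config root."""
    letter = program[0]
    week = epoch // SECONDS_PER_WEEK  # whole weeks since the Unix epoch
    return PurePosixPath("user") / letter / program / str(week) / str(epoch)


print(trace_settings_dir("program.exe", 1721161860))
```

Grouping by first letter, program, and week keeps each directory level small, which is the purpose of the convention described above.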
The profiler never prunes user settings.
剖析器从不删除用户设置。

Appendices 附录

A License 许可证

Tracy Profiler (https://github.com/wolfpld/tracy) is licensed under the 3-clause BSD license.
Tracy Profiler ( https://github.com/wolfpld/tracy) 采用 3 条款 BSD 许可。
Copyright (c) 2017-2024, Bartosz Taudul wolf@nereid.pl
All rights reserved. 保留所有权利。
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the <organization> nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

B Inventory of external libraries

The following libraries are included with and used by the Tracy Profiler. Entries marked with an icon are used in the client code.

  1. Direct support is provided for C, C++, and Lua integration. At the same time, third-party bindings to many other languages exist on the internet, such as Rust, Zig, C#, OCaml, Odin, etc.
    All major graphics APIs: OpenGL, Vulkan, Direct3D 11/12, OpenCL.
    See section 1.7 for a benchmark.
  2. In both 32-bit and 64-bit variants. On x86, Tracy requires a modern version of the rdtsc instruction (Sandy Bridge and later). Note that the resolution of Time Stamp Counter readings may depend on the hardware used and on its design decisions, such as how TSC synchronization is handled between different CPU sockets. On ARM-based systems, Tracy will try to use the timer register (40 ns resolution). If that fails (due to kernel configuration), Tracy falls back to the system-provided timer, which can range in resolution from 250 ns to
    Interestingly, std::chrono::high_resolution_clock is not really a high-resolution clock.
    This is a real optimization case. The values are median function run times and do not reflect the real execution time, which explains the discrepancy in the total reported time.
    A frame is used to describe a single image displayed on the screen by the game (or any other program), preferably 60 times per second to achieve smooth animation. You can also think about physics update frames, audio processing frames, etc.
    Frame usage is not required. See section 3.3 for more information.
  3. See section 2.3.3 for guidelines.
  4. This memory is never released, but the profiler reuses it for collection of other events.
    Additional configuration may be required to achieve full functionality, depending on your network layout. Read about UDP broadcasts for more information.
    You may also look at the library directory in the profiler source tree.
  5. For example, other programs may already be using it, or you may have overzealous firewall rules, or you may want to run two clients on the same IP address.
    A source location is a place in the code, identified by source file name and line number, for example, where you mark up a zone.
  6. Except low-cost ARM CPUs.
    And by saying 'reliable', you in reality mean: behaving in the way you expect.
  7. Not necessarily when the application is started, but also when, for example, a blocking mutex is released by another thread and then acquired.
    AMD processors are not affected by this issue.
  8. Technically, this is not a Tracy dependency, but rather a libstdc++ dependency, but it may still not be installed by default.
  9. To get the IntelliSense experience if you are using the MSVC compiler, you need to do some additional setup. First, you need to install Ninja (https://ninja-build.org/). The Meson installer (https://github.com/mesonbuild/meson/releases) is probably the most convenient way to do this. Then you need to set the cmake.generator option in the VS Code settings to Ninja. Once this is done, all you have to do is wipe the existing build directories and run the CMake configuration again.
  10. By default, the macros unwrap to __FUNCTION__, __FILE__ and __LINE__ respectively.
    You should add either public or public/tracy directory from the Tracy root to the include directories list in your project. Then you will be able to #include "tracy/Tracy.hpp" or #include "Tracy.hpp", respectively.
  11. For example, invalid memory accesses ('segmentation faults', 'null pointer exceptions'), divisions by zero, etc.
    With some small exceptions, see section 3.15.
  12. If you really must unload a module, manually allocating a char buffer, as described in section 3.1.2, will give you a persistent string in memory.
  13. [ISO12] §2.14.5.12: "Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined."
  14. Each frame starts immediately after the previous one has ended.
    Alpha value is ignored, but leaving it out wouldn't map well to the way graphics hardware works.
    For example, OpenGL flips images, but Vulkan does not.
    One uncompressed 1080p image takes 8 MB.
    One pixel is stored in a nibble (4 bits) instead of 32 bits.
  15. Yes, before. We are handling past screen captures here.
  16. A zone represents the lifetime of a special on-stack profiler variable. Typically it would exist for the duration of a whole scope of the profiled function, but you can also measure time spent in scopes of a for-loop or an if-branch.
    The last parameter is explained in section 3.4.3.
  17. Since std::shared_mutex was added in C++17, using std::shared_timed_mutex is the only way to have shared mutex functionality in C++14.
  18. It is considerably faster than OpenGL's TracyGpuCollect.
  19. And possibly other systems, if they decide to adapt the required tooling.
  20. While technically this name doesn't need to be constant, like in the ZoneScopedN macro, it should be, as it is used to group the zones. This grouping is then used to display various statistics in the profiler. You may still set the per-call name using the tracy.ZoneName method.
  21. GCC and Clang provide __attribute__((cleanup)), which can be used to run a function when a variable goes out of scope.
  22. It's not uncommon to see a pattern where a system function returns some allocated memory, which you then need to release.
  23. To make this easier, you can run MSVC with admin privileges, which will be inherited by your program when you start it from within the IDE.
  24. A context switch happens when any given CPU core stops executing one thread and starts running another one.
    Commonly known as Hyper-threading.
  25. The maximum sampling frequency is limited by the kernel.perf_event_max_sample_rate sysctl parameter.
  26. In practice, the hardware counters can be triggered only once per million or so events.
  27. You may need Windows 11 and the WSL preview from Microsoft Store for this to work.
  28. Let's say around 256 KB sounds reasonable.
  29. Note that a custom port may be provided here, for example by entering '127.0.0.1:1234'.
    Only on IPv4 network and only within the broadcast domain.
    Either as an IP address or as a hostname, if able to resolve.
  30. You should take this literally. If a live capture is in progress and a save is performed, some data may be missing from the capture and won't be saved.
    While requesting disconnect stops retrieval of any new events, the profiler will wait for any data that is still pending for the current set of events.
  31. The operating system can manage memory paging much better than Tracy would ever be able to.
  32. Or perform any action on the timeline view, apart from changing the zoom level.
  33. Visible only if frame instrumentation was included in the capture.
    See section 5.2.3.2 for another way to change the active frame set.
  34. Unless the view is zoomed out and multiple frames are merged into one column.
  35. By clicking on a thread name, you can temporarily disable the display of the zones in this thread.
  36. This region type is disabled by default and needs to be enabled in options (section 5.4).
  37. There is an assumption that drift is linear. Automated measurement calculates and removes change over time in delay-to-execution of GPU zones. The resulting value may still be incorrect.
    The normalization process removes the function const qualifier, some common return type declarations and all function parameters and template arguments.
  38. Note that if inclusive times are displayed, listed functions will be partially or completely coming from mid-stack frames, preventing or limiting the ability to display parent call stacks.
  39. Symbols larger than 128 KB are not captured.
  40. time, if the cumulate time option is enabled.
  41. More often than not you will find out that the application was just starting, or access to a cold file was required, and there's not much you can do to optimize that particular case.
  42. When comparing frame times you are presented with a list of available frame sets, without the search box.
  43. Memory span describes the address space consumed by the program. It is calculated as the difference between the maximum and minimum observed in-use memory address.
    While the allocation information window is opened, the address will be highlighted on the list.
    The actual allocation is typically a couple of functions deeper in the call stack.
  44. Due to the way call stacks work, there is no possibility for an entry to have both inclusive and exclusive counts, in an adequately instrumented program.
  45. See section 5.7 for a description of the histogram. Note that there are subtle differences in the available functionality.
    This does not affect source files cached during the profiling run.
  46. For example, when capture is ongoing and context switch information has not yet been received.
    Reconstruction is only possible if all zones have complete call stack capture data available. In the case where that's not available, an unknown frames entry will be present.
  47. Executable images are called modules by Microsoft.
    icon in case of call stack tooltips.
  48. For example, if all they do is zero-initialize a region of memory, as some constructors would do.
  49. This includes jumps, procedure calls, and returns. For example, in x86 assembly the respective instruction names can be: jmp, call, ret.
  50. Comparing sampled instruction counts with microarchitectural details only makes sense when this selection is properly matched.
    You can use this to gain insight into how the code may behave on other processors.
    This is actually a bit of a simplification. Run a pipeline simulator for a better analysis.
  51. You should remember that these are results of random sampling. Some function calls may be missing here.