NVIDIA Collective Communication Library (NCCL) Documentation
Contents:
- Overview of NCCL
- Setup
- Using NCCL
- Creating a Communicator
- Error handling and communicator abort
- Fault Tolerance
- Collective Operations
- Data Pointers
- CUDA Stream Semantics
- Group Calls
- Point-to-point communication
- Thread Safety
- In-place Operations
- Using NCCL with CUDA Graphs
- User Buffer Registration
- NCCL API
- Communicator Creation and Management Functions
- ncclGetLastError
- ncclGetErrorString
- ncclGetVersion
- ncclGetUniqueId
- ncclCommInitRank
- ncclCommInitAll
- ncclCommInitRankConfig
- ncclCommInitRankScalable
- ncclCommSplit
- ncclCommFinalize
- ncclCommDestroy
- ncclCommAbort
- ncclCommGetAsyncError
- ncclCommCount
- ncclCommCuDevice
- ncclCommUserRank
- ncclCommRegister
- ncclCommDeregister
- ncclMemAlloc
- ncclMemFree
- Collective Communication Functions
- Group Calls
- Point To Point Communication Functions
- Types
- User Defined Reduction Operators
- Migrating from NCCL 1 to NCCL 2
- Examples
- NCCL and MPI
- Environment Variables
- System configuration
- NCCL_SOCKET_IFNAME
- NCCL_SOCKET_FAMILY
- NCCL_SOCKET_RETRY_CNT
- NCCL_SOCKET_RETRY_SLEEP_MSEC
- NCCL_SOCKET_NTHREADS
- NCCL_NSOCKS_PERTHREAD
- NCCL_CROSS_NIC
- NCCL_IB_HCA
- NCCL_IB_TIMEOUT
- NCCL_IB_RETRY_CNT
- NCCL_IB_GID_INDEX
- NCCL_IB_ADDR_FAMILY
- NCCL_IB_ADDR_RANGE
- NCCL_IB_ROCE_VERSION_NUM
- NCCL_IB_SL
- NCCL_IB_TC
- NCCL_IB_FIFO_TC
- NCCL_IB_RETURN_ASYNC_EVENTS
- NCCL_OOB_NET_ENABLE
- NCCL_OOB_NET_IFNAME
- NCCL_UID_STAGGER_THRESHOLD
- NCCL_UID_STAGGER_RATE
- NCCL_NET
- NCCL_NET_PLUGIN
- NCCL_TUNER_PLUGIN
- NCCL_PROFILER_PLUGIN
- NCCL_IGNORE_CPU_AFFINITY
- NCCL_CONF_FILE
- NCCL_DEBUG
- NCCL_DEBUG_FILE
- NCCL_DEBUG_SUBSYS
- NCCL_COLLNET_ENABLE
- NCCL_COLLNET_NODE_THRESHOLD
- NCCL_TOPO_FILE
- NCCL_TOPO_DUMP_FILE
- NCCL_SET_THREAD_NAME
- Debugging
- NCCL_P2P_DISABLE
- NCCL_P2P_LEVEL
- NCCL_P2P_DIRECT_DISABLE
- NCCL_SHM_DISABLE
- NCCL_BUFFSIZE
- NCCL_NTHREADS
- NCCL_MAX_NCHANNELS
- NCCL_MIN_NCHANNELS
- NCCL_CHECKS_DISABLE
- NCCL_CHECK_POINTERS
- NCCL_LAUNCH_MODE
- NCCL_IB_DISABLE
- NCCL_IB_AR_THRESHOLD
- NCCL_IB_QPS_PER_CONNECTION
- NCCL_IB_SPLIT_DATA_ON_QPS
- NCCL_IB_CUDA_SUPPORT
- NCCL_IB_PCI_RELAXED_ORDERING
- NCCL_IB_ADAPTIVE_ROUTING
- NCCL_IB_ECE_ENABLE
- NCCL_MEM_SYNC_DOMAIN
- NCCL_CUMEM_ENABLE
- NCCL_CUMEM_HOST_ENABLE
- NCCL_NET_GDR_LEVEL (formerly NCCL_IB_GDR_LEVEL)
- NCCL_NET_GDR_READ
- NCCL_NET_SHARED_BUFFERS
- NCCL_NET_SHARED_COMMS
- NCCL_SINGLE_RING_THRESHOLD
- NCCL_LL_THRESHOLD
- NCCL_TREE_THRESHOLD
- NCCL_ALGO
- NCCL_PROTO
- NCCL_NVB_DISABLE
- NCCL_PXN_DISABLE
- NCCL_P2P_PXN_LEVEL
- NCCL_RUNTIME_CONNECT
- NCCL_GRAPH_REGISTER
- NCCL_LOCAL_REGISTER
- NCCL_LEGACY_CUDA_REGISTER
- NCCL_SET_STACK_SIZE
- NCCL_GRAPH_MIXING_SUPPORT
- NCCL_DMABUF_ENABLE
- NCCL_P2P_NET_CHUNKSIZE
- NCCL_P2P_LL_THRESHOLD
- NCCL_ALLOC_P2P_NET_LL_BUFFERS
- NCCL_COMM_BLOCKING
- NCCL_CGA_CLUSTER_SIZE
- NCCL_MAX_CTAS
- NCCL_MIN_CTAS
- NCCL_NVLS_ENABLE
- NCCL_IB_MERGE_NICS
- NCCL_MNNVL_ENABLE
- NCCL_RAS_ENABLE
- NCCL_RAS_ADDR
- NCCL_RAS_TIMEOUT_FACTOR
- Troubleshooting