ZooKeeper Administrator's Guide
A Guide to Deployment and Administration
- Deployment
- Administration
  - Designing a ZooKeeper Deployment
  - Provisioning
  - Things to Consider: ZooKeeper Strengths and Limitations
  - Administering
  - Maintenance
  - Supervision
  - Monitoring
  - Logging
  - Troubleshooting
  - Configuration Parameters
  - ZooKeeper Commands: The Four Letter Words
  - Data File Management
  - Things to Avoid
  - Best Practices
Deployment
This section contains information about deploying ZooKeeper and covers these topics:
- System Requirements
- Clustered (Multi-Server) Setup
- Single Server and Developer Setup
The first two sections assume you are interested in installing ZooKeeper in a production environment such as a datacenter. The final section covers situations in which you are setting up ZooKeeper on a limited basis - for evaluation, testing, or development - but not in a production environment.
System Requirements
Supported Platforms
ZooKeeper consists of multiple components. Some components are supported broadly, and other components are supported only on a smaller set of platforms.

- Client is the Java client library, used by applications to connect to a ZooKeeper ensemble.
- Server is the Java server that runs on the ZooKeeper ensemble nodes.
- Native Client is a client implemented in C, similar to the Java client, used by applications to connect to a ZooKeeper ensemble.
- Contrib refers to multiple optional add-on components.
The following matrix describes the level of support committed for running each component on different operating system platforms.
Support Matrix

Operating System | Client | Server | Native Client | Contrib
---|---|---|---|---
GNU/Linux | Development and Production | Development and Production | Development and Production | Development and Production
Solaris | Development and Production | Development and Production | Not Supported | Not Supported
FreeBSD | Development and Production | Development and Production | Not Supported | Not Supported
Windows | Development and Production | Development and Production | Not Supported | Not Supported
Mac OS X | Development Only | Development Only | Not Supported | Not Supported
For any operating system not explicitly mentioned as supported in the matrix, components may or may not work. The ZooKeeper community will fix obvious bugs that are reported for other platforms, but there is no full support.
Required Software
ZooKeeper runs in Java, release 1.7 or greater (JDK 7 or greater, FreeBSD support requires openjdk7). It runs as an ensemble of ZooKeeper servers. Three ZooKeeper servers is the minimum recommended size for an ensemble, and we also recommend that they run on separate machines. At Yahoo!, ZooKeeper is usually deployed on dedicated RHEL boxes, with dual-core processors, 2GB of RAM, and 80GB IDE hard drives.
Clustered (Multi-Server) Setup
For reliable ZooKeeper service, you should deploy ZooKeeper in a cluster known as an ensemble. As long as a majority of the ensemble are up, the service will be available. Because ZooKeeper requires a majority, it is best to use an odd number of machines. For example, with four machines ZooKeeper can only handle the failure of a single machine; if two machines fail, the remaining two machines do not constitute a majority. However, with five machines ZooKeeper can handle the failure of two machines.
Note
As mentioned in the ZooKeeper Getting Started Guide, a minimum of three servers are required for a fault tolerant clustered setup, and it is strongly recommended that you have an odd number of servers.

Usually three servers is more than enough for a production install, but for maximum reliability during maintenance, you may wish to install five servers. With three servers, if you perform maintenance on one of them, you are vulnerable to a failure on one of the other two servers during that maintenance. If you have five of them running, you can take one down for maintenance, and know that you're still OK if one of the other four suddenly fails.

Your redundancy considerations should include all aspects of your environment. If you have three ZooKeeper servers, but their network cables are all plugged into the same network switch, then the failure of that switch will take down your entire ensemble.
Here are the steps for setting up a server that will be part of an ensemble. These steps should be performed on every host in the ensemble:
Install the Java JDK. You can use the native packaging system for your system, or download the JDK from: http://java.sun.com/javase/downloads/index.jsp

Set the Java heap size. This is very important to avoid swapping, which will seriously degrade ZooKeeper performance. To determine the correct value, use load tests, and make sure you are well below the usage limit that would cause you to swap. Be conservative - use a maximum heap size of 3GB for a 4GB machine.

Install the ZooKeeper Server Package. It can be downloaded from: http://zookeeper.apache.org/releases.html

Create a configuration file. This file can be called anything. Use the following settings as a starting point:

tickTime=2000
dataDir=/var/lib/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

You can find the meanings of these and other configuration settings in the section Configuration Parameters. A word though about a few here: Every machine that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. You accomplish this with the series of lines of the form server.id=host:port:port. The parameters host and port are straightforward. You attribute the server id to each machine by creating a file named myid, one for each server, which resides in that server's data directory, as specified by the configuration file parameter dataDir.

The myid file consists of a single line containing only the text of that machine's id. So myid of server 1 would contain the text "1" and nothing else. The id must be unique within the ensemble and should have a value between 1 and 255. IMPORTANT: if you enable extended features such as TTL Nodes (see below) the id must be between 1 and 254 due to internal limitations.

If your configuration file is set up, you can start a ZooKeeper server:

$ java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.17.jar:conf \
    org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg
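The myid bookkeeping described above is easy to script. The sketch below is an illustrative helper (it is not part of the ZooKeeper distribution) that writes the single-line myid file into a server's data directory and enforces the documented id range:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MyidWriter {
    // Writes the single-line myid file into a server's dataDir.
    // Per the guide, the id must be unique in the ensemble and lie
    // between 1 and 255 (1 to 254 if extended features such as
    // TTL Nodes are enabled).
    static Path writeMyid(Path dataDir, int id) throws IOException {
        if (id < 1 || id > 255) {
            throw new IllegalArgumentException("server id out of range: " + id);
        }
        Files.createDirectories(dataDir);
        Path myid = dataDir.resolve("myid");
        Files.write(myid, Integer.toString(id).getBytes(StandardCharsets.UTF_8));
        return myid;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zk-data");
        Path myid = writeMyid(dir, 1);
        // The file contains only the id text, nothing else.
        System.out.println(new String(Files.readAllBytes(myid), StandardCharsets.UTF_8));
    }
}
```

In a real deployment you would run such a helper (or a one-line shell command) once per host, passing that host's unique id and the dataDir from its configuration file.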
QuorumPeerMain starts a ZooKeeper server. JMX management beans are also registered, which allows management through a JMX management console. The ZooKeeper JMX document contains details on managing ZooKeeper with JMX. See the script bin/zkServer.sh, which is included in the release, for an example of starting server instances.
Test your deployment by connecting to the hosts: In Java, you can run the following command to execute simple operations:

$ bin/zkCli.sh -server 127.0.0.1:2181
Single Server and Developer Setup
If you want to setup ZooKeeper for development purposes, you will probably want to setup a single server instance of ZooKeeper, and then install either the Java or C client-side libraries and bindings on your development machine.
The steps for setting up a single server instance are similar to the above, except the configuration file is simpler. You can find the complete instructions in the Installing and Running ZooKeeper in Single Server Mode section of the ZooKeeper Getting Started Guide.
For information on installing the client side libraries, refer to the Bindings section of the ZooKeeper Programmer's Guide.
Administration
This section contains information about running and maintaining ZooKeeper and covers these topics:
- Designing a ZooKeeper Deployment
- Provisioning
- Things to Consider: ZooKeeper Strengths and Limitations
- Administering
- Maintenance
- Supervision
- Monitoring
- Logging
- Troubleshooting
- Configuration Parameters
- ZooKeeper Commands: The Four Letter Words
- Data File Management
- Things to Avoid
- Best Practices
Designing a ZooKeeper Deployment
The reliability of ZooKeeper rests on two basic assumptions.
- Only a minority of servers in a deployment will fail. Failure in this context means a machine crash, or some error in the network that partitions a server off from the majority.
- Deployed machines operate correctly. To operate correctly means to execute code correctly, to have clocks that work properly, and to have storage and network components that perform consistently.
The sections below contain considerations for ZooKeeper administrators to maximize the probability for these assumptions to hold true. Some of these are cross-machine considerations, and others are things you should consider for each and every machine in your deployment.
Cross Machine Requirements
For the ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines. Thus, a deployment that consists of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures since three machines is not a majority. For this reason, ZooKeeper deployments are usually made up of an odd number of machines.
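The 2xF+1 arithmetic above can be sketched directly. This small illustrative helper (not a ZooKeeper API) computes how many simultaneous failures an ensemble of a given size tolerates:

```java
public class QuorumMath {
    // A deployment of 2F+1 servers tolerates F failures:
    // a strict majority (F+1) must remain up and connected.
    static int toleratedFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.println(n + " servers -> tolerates "
                    + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

Note how 5 and 6 servers both tolerate only two failures, which is why ensembles are usually sized with an odd number of machines.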
To achieve the highest probability of tolerating a failure you should try to make machine failures independent. For example, if most of the machines share the same switch, failure of that switch could cause a correlated failure and bring down the service. The same holds true of shared power circuits, cooling systems, etc.
Single Machine Requirements
If ZooKeeper has to contend with other applications for access to resources like storage media, CPU, network, or memory, its performance will suffer markedly. ZooKeeper has strong durability guarantees, which means it uses storage media to log changes before the operation responsible for the change is allowed to complete. You should be aware of this dependency then, and take great care if you want to ensure that ZooKeeper operations aren’t held up by your media. Here are some things you can do to minimize that sort of degradation:
- ZooKeeper's transaction log must be on a dedicated device. (A dedicated partition is not enough.) ZooKeeper writes the log sequentially, without seeking. Sharing your log device with other processes can cause seeks and contention, which in turn can cause multi-second delays.
- Do not put ZooKeeper in a situation that can cause a swap. In order for ZooKeeper to function with any sort of timeliness, it simply cannot be allowed to swap. Therefore, make certain that the maximum heap size given to ZooKeeper is not bigger than the amount of real memory available to ZooKeeper. For more on this, see Things to Avoid below.
Provisioning

Things to Consider: ZooKeeper Strengths and Limitations

Administering

Maintenance
Little long term maintenance is required for a ZooKeeper cluster; however, you must be aware of the following:
Ongoing Data Directory Cleanup
The ZooKeeper Data Directory contains files which are a persistent copy of the znodes stored by a particular serving ensemble. These are the snapshot and transactional log files. As changes are made to the znodes these changes are appended to a transaction log. Occasionally, when a log grows large, a snapshot of the current state of all znodes will be written to the filesystem and a new transaction log file is created for future transactions. During snapshotting, ZooKeeper may continue appending incoming transactions to the old log file. Therefore, some transactions which are newer than a snapshot may be found in the last transaction log preceding the snapshot.
A ZooKeeper server will not remove old snapshots and log files when using the default configuration (see autopurge below); this is the responsibility of the operator. Every serving environment is different and therefore the requirements of managing these files may differ from install to install (backup for example).
The PurgeTxnLog utility implements a simple retention policy that administrators can use. The API docs contain details on calling conventions (arguments, etc...).
In the following example the last count snapshots and their corresponding logs are retained and the others are deleted. The value of <count> should typically be greater than 3 (although not required, this provides 3 backups in the unlikely event a recent log has become corrupted). This can be run as a cron job on the ZooKeeper server machines to clean up the logs daily.

java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.17.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count>
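The retention policy behind PurgeTxnLog boils down to "keep the newest count snapshots, delete older ones." The following simplified sketch shows only the snapshot-selection step (it is illustrative, not the actual utility, which also handles the matching transaction logs); it assumes the standard snapshot.<zxid-in-hex> naming scheme:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class RetentionSketch {
    // Snapshot files are named snapshot.<zxid in hex>; higher zxids
    // are newer. Keep the newest 'count' snapshots, return the rest.
    static List<String> snapshotsToDelete(List<String> snapshots, int count) {
        return snapshots.stream()
                .sorted(Comparator.comparingLong(RetentionSketch::zxidOf).reversed())
                .skip(count)
                .collect(Collectors.toList());
    }

    static long zxidOf(String name) {
        // Parse the hex zxid after "snapshot."
        return Long.parseLong(name.substring(name.indexOf('.') + 1), 16);
    }

    public static void main(String[] args) {
        List<String> snaps = Arrays.asList(
                "snapshot.a", "snapshot.1f", "snapshot.c8", "snapshot.3e2", "snapshot.5");
        // With count = 3, the two oldest snapshots are selected for deletion.
        System.out.println(snapshotsToDelete(snaps, 3));
    }
}
```

This mirrors why the guide suggests a count greater than 3: the retained snapshots double as backups if the most recent log turns out to be corrupted.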
Automatic purging of the snapshots and corresponding transaction logs was introduced in version 3.4.0 and can be enabled via the following configuration parameters autopurge.snapRetainCount and autopurge.purgeInterval. For more on this, see Advanced Configuration below.
Debug Log Cleanup (log4j)

See the section on logging in this document. It is expected that you will set up a rolling file appender using the in-built log4j feature. The sample configuration file in the release tar's conf/log4j.properties provides an example of this.
Supervision
You will want to have a supervisory process that manages each of your ZooKeeper server processes (JVM). The ZK server is designed to be "fail fast", meaning that it will shut down (process exit) if an error occurs that it cannot recover from. As a ZooKeeper serving cluster is highly reliable, this means that while the server may go down, the cluster as a whole is still active and serving requests. Additionally, as the cluster is "self healing", the failed server, once restarted, will automatically rejoin the ensemble without any manual interaction.
Having a supervisory process such as daemontools or SMF (other supervisory options are also available; these are just two examples, and it's up to you which one you would like to use) managing your ZooKeeper server ensures that if the process does exit abnormally, it will automatically be restarted and will quickly rejoin the cluster.
Monitoring
The ZooKeeper service can be monitored in one of two primary ways: 1) the command port through the use of 4 letter words and 2) JMX. See the appropriate section for your environment/requirements.
Logging

ZooKeeper uses log4j version 1.2 as its logging infrastructure. The ZooKeeper default log4j.properties file resides in the conf directory. Log4j requires that log4j.properties either be in the working directory (the directory from which ZooKeeper is run) or be accessible from the classpath.
For more information, see Log4j Default Initialization Procedure of the log4j manual.
Troubleshooting
- Server not coming up because of file corruption: A server might not be able to read its database and fail to come up because of some file corruption in the transaction logs of the ZooKeeper server. You will see some IOException on loading the ZooKeeper database. In such a case, make sure all the other servers in your ensemble are up and working. Use the "stat" command on the command port to see if they are in good health. After you have verified that all the other servers of the ensemble are up, you can go ahead and clean the database of the corrupt server. Delete all the files in datadir/version-2 and datalogdir/version-2/. Restart the server.
Configuration Parameters
ZooKeeper's behavior is governed by the ZooKeeper configuration file. This file is designed so that the exact same file can be used by all the servers that make up a ZooKeeper server assuming the disk layouts are the same. If servers use different configuration files, care must be taken to ensure that the list of servers in all of the different configuration files match.
Minimum Configuration
Here are the minimum configuration keywords that must be defined in the configuration file:
clientPort : the port to listen for client connections; that is, the port that clients attempt to connect to.

dataDir : the location where ZooKeeper will store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.

Note

Be careful where you put the transaction log. A dedicated transaction log device is key to consistent good performance. Putting the log on a busy device will adversely affect performance.

tickTime : the length of a single tick, which is the basic time unit used by ZooKeeper, as measured in milliseconds. It is used to regulate heartbeats and timeouts. For example, the minimum session timeout will be two ticks.
Advanced Configuration
The configuration settings in this section are optional. You can use them to further fine tune the behaviour of your ZooKeeper servers. Some can also be set using Java system properties, generally of the form zookeeper.keyword. The exact system property, when available, is noted below.
dataLogDir : (No Java system property) This option will direct the machine to write the transaction log to the dataLogDir rather than the dataDir. This allows a dedicated log device to be used, and helps avoid competition between logging and snapshots.

Note

Having a dedicated log device has a large impact on throughput and stable latencies. It is highly recommended to dedicate a log device and set dataLogDir to point to a directory on that device, and then make sure to point dataDir to a directory not residing on that device.

globalOutstandingLimit : (Java system property: zookeeper.globalOutstandingLimit) Clients can submit requests faster than ZooKeeper can process them, especially if there are a lot of clients. To prevent ZooKeeper from running out of memory due to queued requests, ZooKeeper will throttle clients so that there are no more than globalOutstandingLimit outstanding requests in the system. The default limit is 1,000.

preAllocSize : (Java system property: zookeeper.preAllocSize) To avoid seeks ZooKeeper allocates space in the transaction log file in blocks of preAllocSize kilobytes. The default block size is 64M. One reason for changing the size of the blocks is to reduce the block size if snapshots are taken more often. (Also, see snapCount.)

snapCount : (Java system property: zookeeper.snapCount) ZooKeeper records its transactions using snapshots and a transaction log (think write-ahead log). The number of transactions recorded in the transaction log before a snapshot can be taken (and the transaction log rolled) is determined by snapCount. In order to prevent all of the machines in the quorum from taking a snapshot at the same time, each ZooKeeper server will take a snapshot when the number of transactions in the transaction log reaches a runtime generated random value in the [snapCount/2+1, snapCount] range. The default snapCount is 100,000.

maxClientCnxns : (No Java system property) Limits the number of concurrent connections (at the socket level) that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble. This is used to prevent certain classes of DoS attacks, including file descriptor exhaustion. The default is 60. Setting this to 0 entirely removes the limit on concurrent connections.

clientPortAddress : New in 3.3.0: the address (ipv4, ipv6 or hostname) to listen for client connections; that is, the address that clients attempt to connect to. This is optional; by default we bind in such a way that any connection to the clientPort for any address/interface/nic on the server will be accepted.

minSessionTimeout : (No Java system property) New in 3.3.0: the minimum session timeout in milliseconds that the server will allow the client to negotiate. Defaults to 2 times the tickTime.

maxSessionTimeout : (No Java system property) New in 3.3.0: the maximum session timeout in milliseconds that the server will allow the client to negotiate. Defaults to 20 times the tickTime.

fsync.warningthresholdms : (Java system property: zookeeper.fsync.warningthresholdms) New in 3.3.4: A warning message will be output to the log whenever an fsync in the Transactional Log (WAL) takes longer than this value. The value is specified in milliseconds and defaults to 1000. This value can only be set as a system property.

autopurge.snapRetainCount : (No Java system property) New in 3.4.0: When enabled, ZooKeeper's auto purge feature retains the autopurge.snapRetainCount most recent snapshots and the corresponding transaction logs in the dataDir and dataLogDir respectively and deletes the rest. Defaults to 3. Minimum value is 3.

autopurge.purgeInterval : (No Java system property) New in 3.4.0: The time interval in hours for which the purge task has to be triggered. Set to a positive integer (1 and above) to enable auto purging. Defaults to 0.

syncEnabled : (Java system property: zookeeper.observer.syncEnabled) New in 3.4.6: The observers now log transactions and write snapshots to disk by default, like the participants. This reduces the recovery time of the observers on restart. Set to "false" to disable this feature. Default is "true".
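The tickTime-derived session timeout bounds above can be made concrete. This illustrative sketch (not the server's actual code) shows how a client's requested session timeout would be clamped to the [minSessionTimeout, maxSessionTimeout] window, using the documented defaults of 2x and 20x tickTime:

```java
public class SessionTimeoutSketch {
    // Defaults per the configuration reference: min = 2 * tickTime,
    // max = 20 * tickTime, both in milliseconds.
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        int tickTime = 2000; // the starting-point value used earlier in this guide
        System.out.println(negotiate(1000, tickTime));  // below min: clamped up to 4000
        System.out.println(negotiate(30000, tickTime)); // within bounds: kept at 30000
        System.out.println(negotiate(90000, tickTime)); // above max: clamped down to 40000
    }
}
```

This is why a client asking for a very short session timeout can still end up with a multi-second session: the server never grants less than two ticks.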
Cluster Options
The options in this section are designed for use with an ensemble of servers -- that is, when deploying clusters of servers.
electionAlg : (No Java system property) Election implementation to use. A value of "1" corresponds to the non-authenticated UDP-based version of fast leader election, "2" corresponds to the authenticated UDP-based version of fast leader election, and "3" corresponds to TCP-based version of fast leader election. Currently, algorithm 3 is the default.
electionAlg :(无 Java 系统属性)要使用的选举实现。值“1”对应于未经身份验证的基于 UDP 的快速领导者选举版本,“2”对应于经过身份验证的基于 UDP 的快速领导者选举版本,“3”对应于基于 TCP 的快速领导者选举版本选举。目前,算法 3 是默认值。Note 笔记
The implementations of leader election 1, and 2 are now deprecated. We have the intention of removing them in the next release, at which point only the FastLeaderElection will be available.
领导人选举 1 和 2 的实现现已弃用。我们打算在下一个版本中删除它们,届时只有 FastLeaderElection 可用。initLimit : (No Java system property) Amount of time, in ticks (see tickTime), to allow followers to connect and sync to a leader. Increased this value as needed, if the amount of data managed by ZooKeeper is large.
initLimit :(无 Java 系统属性)允许追随者连接并同步到领导者的时间量(以刻度为单位)(请参阅tickTime )。如果 ZooKeeper 管理的数据量较大,请根据需要增大此值。leaderServes : (Java system property: zookeeper.**leaderServes**) Leader accepts client connections. Default value is "yes". The leader machine coordinates updates. For higher update throughput at the slight expense of read throughput the leader can be configured to not accept clients and focus on coordination. The default to this option is yes, which means that a leader will accept client connections.
LeaderServes :(Java 系统属性:zookeeper.**leaderServes**)Leader 接受客户端连接。默认值为“是”。领导机坐标更新。为了以少量读取吞吐量为代价获得更高的更新吞吐量,领导者可以配置为不接受客户端并专注于协调。该选项的默认值为 yes,这意味着领导者将接受客户端连接。Note 笔记
Turning on leader selection is highly recommended when you have more than three ZooKeeper servers in an ensemble.
server.x=[hostname]:nnnnn[:nnnnn], etc : (No Java system property) Servers making up the ZooKeeper ensemble. When the server starts up, it determines which server it is by looking for the file myid in the data directory. That file contains the server number, in ASCII, and it should match x in server.x in the left hand side of this setting. The list of ZooKeeper servers used by the clients must match the list of ZooKeeper servers that each ZooKeeper server has. There are two port numbers nnnnn: followers use the first to connect to the leader, and the second is for leader election. If you want to test multiple servers on a single machine, then different ports can be used for each server.

syncLimit : (No Java system property) Amount of time, in ticks (see tickTime), to allow followers to sync with ZooKeeper. If followers fall too far behind a leader, they will be dropped.

group.x=nnnnn[:nnnnn] : (No Java system property) Enables a hierarchical quorum construction. "x" is a group identifier and the numbers following the "=" sign correspond to server identifiers. The right-hand side of the assignment is a colon-separated list of server identifiers. Note that groups must be disjoint and the union of all groups must be the ZooKeeper ensemble.

weight.x=nnnnn : (No Java system property) Used along with "group", it assigns a weight to a server when forming quorums. Such a value corresponds to the weight of a server when voting. There are a few parts of ZooKeeper that require voting, such as leader election and the atomic broadcast protocol. By default the weight of a server is 1. If the configuration defines groups, but not weights, then a value of 1 will be assigned to all servers.

cnxTimeout : (Java system property: zookeeper.cnxTimeout) Sets the timeout value for opening connections for leader election notifications. Only applicable if you are using electionAlg 3.

Note
Default value is 5 seconds.
4lw.commands.whitelist : (Java system property: zookeeper.4lw.commands.whitelist) New in 3.5.3: A comma-separated list of Four Letter Words commands that the user wants to use. A valid Four Letter Words command must be put in this list, else the ZooKeeper server will not enable the command. By default the whitelist only contains the "srvr" command, which zkServer.sh uses. The rest of the four letter word commands are disabled by default. Here's an example of a configuration that enables the stat, ruok, conf, and isro commands while disabling the rest of the Four Letter Words commands:

4lw.commands.whitelist=stat, ruok, conf, isro
If you really need to enable all four letter word commands by default, you can use the asterisk option so you don't have to include every command one by one in the list. As an example, this will enable all four letter word commands:
4lw.commands.whitelist=*
ipReachableTimeout : (Java system property: zookeeper.ipReachableTimeout) New in 3.4.11: Set this timeout value for IP address reachability checking when the hostname is resolved, measured in milliseconds. By default, ZooKeeper will use the first IP address of the hostname (without any reachability checking). When zookeeper.ipReachableTimeout is set (larger than 0), ZooKeeper will try to pick the first IP address which is reachable. This is done by calling the Java API InetAddress.isReachable(long timeout), in which this timeout value is used. If no such reachable IP address can be found, the first IP address of the hostname will be used anyway.

tcpKeepAlive : (Java system property: zookeeper.tcpKeepAlive) New in 3.4.11: Setting this to true sets the TCP keepAlive flag on the sockets used by quorum members to perform elections. This will allow connections between quorum members to remain up when there is network infrastructure that may otherwise break them. Some NATs and firewalls may terminate or lose state for long running or idle connections. Enabling this option relies on OS level settings to work properly; check your operating system's options regarding TCP keepalive for more information. Defaults to false.
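To tie the ensemble-related options above together, here is a minimal sketch of a zoo.cfg for a five-server ensemble using hierarchical quorums. All host names, ports, paths, and weights are illustrative, not prescriptive:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# Ensemble members: first port is for follower connections to the leader,
# second port is for leader election.
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
server.4=zk4.example.com:2888:3888
server.5=zk5.example.com:2888:3888
# Optional hierarchical quorums: groups must be disjoint and together
# cover every server in the ensemble.
group.1=1:2:3
group.2=4:5
weight.1=1
weight.2=1
weight.3=1
weight.4=1
weight.5=1
```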
Authentication and Authorization Options
The options in this section allow control over encryption/authentication/authorization performed by the service.
DigestAuthenticationProvider.superDigest : (Java system property: zookeeper.DigestAuthenticationProvider.superDigest) By default this feature is disabled. New in 3.2: Enables a ZooKeeper ensemble administrator to access the znode hierarchy as a "super" user. In particular, no ACL checking occurs for a user authenticated as super. org.apache.zookeeper.server.auth.DigestAuthenticationProvider can be used to generate the superDigest; call it with one parameter of "super:<password>". Provide the generated "super:<data>" as the system property value when starting each server of the ensemble. When authenticating to a ZooKeeper server (from a ZooKeeper client) pass a scheme of "digest" and authdata of "super:<password>". Note that digest auth passes the authdata in plaintext to the server; it would be prudent to use this authentication method only on localhost (not over the network) or over an encrypted connection.

isro : New in 3.4.0: Tests if server is running in read-only mode. The server will respond with "ro" if in read-only mode or "rw" if not in read-only mode.
gtmk : Gets the current trace mask as a 64-bit signed long value in decimal format. See stmk for an explanation of the possible values.

stmk : Sets the current trace mask. The trace mask is 64 bits, where each bit enables or disables a specific category of trace logging on the server. Log4J must be configured to enable TRACE level first in order to see trace logging messages. The bits of the trace mask correspond to the following trace logging categories.

Trace Mask Bit Values
0b0000000000 : Unused, reserved for future use.
0b0000000010 : Logs client requests, excluding ping requests.
0b0000000100 : Unused, reserved for future use.
0b0000001000 : Logs client ping requests.
0b0000010000 : Logs packets received from the quorum peer that is the current leader, excluding ping requests.
0b0000100000 : Logs addition, removal and validation of client sessions.
0b0001000000 : Logs delivery of watch events to client sessions.
0b0010000000 : Logs ping packets received from the quorum peer that is the current leader.
0b0100000000 : Unused, reserved for future use.
0b1000000000 : Unused, reserved for future use.

All remaining bits in the 64-bit value are unused and reserved for future use. Multiple trace logging categories are specified by calculating the bitwise OR of the documented values. The default trace mask is 0b0100110010. Thus, by default, trace logging includes client requests, packets received from the leader and sessions. To set a different trace mask, send a request containing the stmk four-letter word followed by the trace mask represented as a 64-bit signed long value. This example uses the Perl pack function to construct a trace mask that enables all trace logging categories described above and convert it to a 64-bit signed long value with big-endian byte order. The result is appended to stmk and sent to the server using netcat. The server responds with the new trace mask in decimal format.
$ perl -e "print 'stmk', pack('q>', 0b0011111010)" | nc localhost 2181
250
Experimental Options/Features
New features that are currently considered experimental.
- Read Only Mode Server : (Java system property: readonlymode.enabled) New in 3.4.0: Setting this value to true enables Read Only Mode server support (disabled by default). ROM allows client sessions which requested ROM support to connect to the server even when the server might be partitioned from the quorum. In this mode ROM clients can still read values from the ZK service, but will be unable to write values or see changes from other clients. See ZOOKEEPER-784 for more details.
Unsafe Options
The following options can be useful, but be careful when you use them. The risk of each is explained along with what the variable does.
forceSync : (Java system property: zookeeper.forceSync) Requires updates to be synced to media of the transaction log before finishing processing the update. If this option is set to no, ZooKeeper will not require updates to be synced to the media.
jute.maxbuffer : (Java system property: jute.maxbuffer) This option can only be set as a Java system property; there is no zookeeper prefix on it. It specifies the maximum size of the data that can be stored in a znode. The default is 0xfffff, or just under 1M. If this option is changed, the system property must be set on all servers and clients, otherwise problems will arise. This is really a sanity check: ZooKeeper is designed to store data on the order of kilobytes in size.

skipACL : (Java system property: zookeeper.skipACL) Skips ACL checks. This results in a boost in throughput, but opens up full access to the data tree to everyone.

quorumListenOnAllIPs : When set to true the ZooKeeper server will listen for connections from its peers on all available IP addresses, and not only the address configured in the server list of the configuration file. It affects the connections handling the ZAB protocol and the Fast Leader Election protocol. Default value is false.
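Because jute.maxbuffer has no zookeeper prefix, it is passed as a raw JVM flag. A sketch of one way to do this, assuming the stock start scripts (which pick up SERVER_JVMFLAGS and CLIENT_JVMFLAGS, typically from conf/java.env); the 4 MB value is illustrative, and the flag must match on every server and client:

```shell
# Raise the znode size limit to 4 MB on both the server and the client side.
# These variables are read by zkServer.sh / zkCli.sh at startup.
export SERVER_JVMFLAGS="-Djute.maxbuffer=4194304"
export CLIENT_JVMFLAGS="-Djute.maxbuffer=4194304"
```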
Communication using the Netty framework
Netty is an NIO based client/server communication framework; it simplifies (over NIO being used directly) many of the complexities of network level communication for Java applications. Additionally, the Netty framework has built-in support for encryption (SSL) and authentication (certificates). These are optional features and can be turned on or off individually.
In versions 3.5+, a ZooKeeper server can use Netty instead of NIO (the default option) by setting the Java system property zookeeper.serverCnxnFactory to org.apache.zookeeper.server.NettyServerCnxnFactory; for the client, set zookeeper.clientCnxnSocket to org.apache.zookeeper.ClientCnxnSocketNetty.
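As a sketch, the two properties named above can be supplied as JVM flags through the SERVER_JVMFLAGS and CLIENT_JVMFLAGS variables read by the stock start scripts (the use of these variables, rather than editing the scripts directly, is one assumed convention):

```shell
# Switch the server's connection factory and the client's socket
# implementation from NIO to Netty.
export SERVER_JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory"
export CLIENT_JVMFLAGS="-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty"
```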
TBD - tuning options for Netty - currently there are none that are Netty specific but we should add some. Especially around a max bound on the number of reader worker threads Netty creates.
TBD - how to manage encryption
TBD - how to manage certificates
ZooKeeper Commands: The Four Letter Words
ZooKeeper responds to a small set of commands. Each command is composed of four letters. You issue the commands to ZooKeeper via telnet or nc, at the client port.
Three of the more interesting commands: "stat" gives some general information about the server and connected clients, while "srvr" and "cons" give extended details on server and connections respectively.
conf : New in 3.3.0: Print details about serving configuration.
cons : New in 3.3.0: List full connection/session details for all clients connected to this server. Includes information on numbers of packets received/sent, session id, operation latencies, last operation performed, etc.

crst : New in 3.3.0: Reset connection/session statistics for all connections.

dump : Lists the outstanding sessions and ephemeral nodes. This only works on the leader.

envi : Print details about the serving environment.

ruok : Tests if server is running in a non-error state. The server will respond with imok if it is running. Otherwise it will not respond at all. A response of "imok" does not necessarily indicate that the server has joined the quorum, just that the server process is active and bound to the specified client port. Use "stat" for details on state wrt quorum and client connection information.

srst : Reset server statistics.

srvr : New in 3.3.0: Lists full details for the server.

stat : Lists brief details for the server and connected clients.

wchs : New in 3.3.0: Lists brief information on watches for the server.

wchc : New in 3.3.0: Lists detailed information on watches for the server, by session. This outputs a list of sessions (connections) with associated watches (paths). Note, depending on the number of watches this operation may be expensive (i.e. impact server performance), use it carefully.

dirs : New in 3.5.1: Shows the total size of snapshot and log files in bytes.

wchp : New in 3.3.0: Lists detailed information on watches for the server, by path. This outputs a list of paths (znodes) with associated sessions. Note, depending on the number of watches this operation may be expensive (i.e. impact server performance), use it carefully.

mntr : New in 3.4.0: Outputs a list of variables that could be used for monitoring the health of the cluster.
$ echo mntr | nc localhost 2185
zk_version 3.4.0
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 70
zk_packets_sent 69
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 4
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 27
zk_followers 4 - only exposed by the Leader
zk_synced_followers 4 - only exposed by the Leader
zk_pending_syncs 0 - only exposed by the Leader
zk_open_file_descriptor_count 23 - only available on Unix platforms
zk_max_file_descriptor_count 1024 - only available on Unix platforms
zk_fsync_threshold_exceed_count 0
The output is compatible with java properties format and the content may change over time (new keys added). Your scripts should expect changes. ATTENTION: Some of the keys are platform specific and some of the keys are only exported by the Leader. The output contains multiple lines with the following format:
key \t value
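Since each line is a key, a tab, and a value, the output can be split with standard tools. A sketch that parses a captured sample (the sample values below are illustrative; in practice the input would come from `echo mntr | nc`):

```shell
# Extract one metric from mntr-style output; printf stands in here for
# the actual "echo mntr | nc host port" pipeline.
sample=$(printf 'zk_version\t3.4.0\nzk_server_state\tleader\n')
state=$(printf '%s\n' "$sample" | awk -F'\t' '$1 == "zk_server_state" { print $2 }')
echo "$state"   # prints "leader"
```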
Here's an example of the ruok command:
$ echo ruok | nc 127.0.0.1 5111
imok
Data File Management
ZooKeeper stores its data in a data directory and its transaction log in a transaction log directory. By default these two directories are the same. The server can (and should) be configured to store the transaction log files in a separate directory from the data files. Throughput increases and latency decreases when transaction logs reside on a dedicated log device.
The Data Directory
This directory has two or three files in it:
- myid - contains a single integer in human readable ASCII text that represents the server id.
- snapshot.<zxid> - holds the fuzzy snapshot of a data tree.
Each ZooKeeper server has a unique id. This id is used in two places: the myid file and the configuration file. The myid file identifies the server that corresponds to the given data directory. The configuration file lists the contact information for each server identified by its server id. When a ZooKeeper server instance starts, it reads its id from the myid file and then, using that id, reads from the configuration file, looking up the port on which it should listen.
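Creating the myid file is a one-line step on each server. A sketch, with an assumed local dataDir of ./zkdata; substitute the dataDir actually configured in your zoo.cfg:

```shell
# Write this server's id into dataDir/myid. On server.1 the file contains
# "1"; on server.2 it would contain "2", and so on.
DATADIR=./zkdata
mkdir -p "$DATADIR"
echo 1 > "$DATADIR/myid"
```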
The snapshot files stored in the data directory are fuzzy snapshots in the sense that during the time the ZooKeeper server is taking the snapshot, updates are occurring to the data tree. The suffix of the snapshot file names is the zxid, the ZooKeeper transaction id, of the last committed transaction at the start of the snapshot. Thus, the snapshot includes a subset of the updates to the data tree that occurred while the snapshot was in process. The snapshot, then, may not correspond to any data tree that actually existed, and for this reason we refer to it as a fuzzy snapshot. Still, ZooKeeper can recover using this snapshot because it takes advantage of the idempotent nature of its updates. By replaying the transaction log against fuzzy snapshots ZooKeeper gets the state of the system at the end of the log.
The Log Directory
The Log Directory contains the ZooKeeper transaction logs. Before any update takes place, ZooKeeper ensures that the transaction that represents the update is written to non-volatile storage. A new log file is started when the number of transactions written to the current log file reaches a (variable) threshold. The threshold is computed using the same parameter which influences the frequency of snapshotting (see snapCount above). The log file's suffix is the first zxid written to that log.
File Management
The format of snapshot and log files does not change between standalone ZooKeeper servers and different configurations of replicated ZooKeeper servers. Therefore, you can pull these files from a running replicated ZooKeeper server to a development machine with a stand-alone ZooKeeper server for troubleshooting.
Using older log and snapshot files, you can look at the previous state of ZooKeeper servers and even restore that state. The LogFormatter class allows an administrator to look at the transactions in a log.
The ZooKeeper server creates snapshot and log files, but never deletes them. The retention policy of the data and log files is implemented outside of the ZooKeeper server. The server itself only needs the latest complete fuzzy snapshot, all log files following it, and the last log file preceding it. The latter requirement is necessary to include updates which happened after this snapshot was started but went into the existing log file at that time. This is possible because snapshotting and rolling over of logs proceed somewhat independently in ZooKeeper. See the maintenance section in this document for more details on setting a retention policy and maintenance of ZooKeeper storage.
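The maintenance section covers retention policy in detail; as one sketch (available since 3.4), the autopurge options in zoo.cfg let the server prune old snapshots and transaction logs itself. The values below are illustrative:

```
# Keep the 3 most recent snapshots and their corresponding transaction
# logs; purge once every 24 hours. Setting purgeInterval to 0 (the
# default) disables automatic purging.
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
```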
Note
The data stored in these files is not encrypted. In the case of storing sensitive data in ZooKeeper, necessary measures need to be taken to prevent unauthorized access. Such measures are external to ZooKeeper (e.g., control access to the files) and depend on the individual settings in which it is being deployed.
Recovery - TxnLogToolkit
TxnLogToolkit is a command line tool shipped with ZooKeeper which is capable of recovering transaction log entries with broken CRC.
Running it without any command line parameters or with the -h,--help argument, it outputs the following help page:
$ bin/zkTxnLogToolkit.sh
usage: TxnLogToolkit [-dhrv] txn_log_file_name
-d,--dump Dump mode. Dump all entries of the log file. (this is the default)
-h,--help Print help message
-r,--recover Recovery mode. Re-calculate CRC for broken entries.
-v,--verbose Be verbose in recovery mode: print all entries, not just fixed ones.
-y,--yes Non-interactive mode: repair all CRC errors without asking
The default behaviour is safe: it dumps the entries of the given transaction log file to the screen (same as using the -d,--dump parameter):
$ bin/zkTxnLogToolkit.sh log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
4/5/18 2:15:58 PM CEST session 0x16295bafcc40000 cxid 0x0 zxid 0x100000001 createSession 30000
CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
4/5/18 2:16:12 PM CEST session 0x26295bafcc90000 cxid 0x0 zxid 0x100000003 createSession 30000
4/5/18 2:17:34 PM CEST session 0x26295bafcc90000 cxid 0x0 zxid 0x200000001 closeSession null
4/5/18 2:17:34 PM CEST session 0x16295bd23720000 cxid 0x0 zxid 0x200000002 createSession 30000
4/5/18 2:18:02 PM CEST session 0x16295bd23720000 cxid 0x2 zxid 0x200000003 create '/andor,#626262,v{s{31,s{'world,'anyone}}},F,1
EOF reached after 6 txns.
There's a CRC error in the 2nd entry of the above transaction log file. In dump mode, the toolkit only prints this information to the screen without touching the original file. In recovery mode (-r,--recover flag) the original file remains untouched and all transactions are copied over to a new txn log file with a ".fixed" suffix. It recalculates CRC values and copies the calculated value if it doesn't match the original txn entry. By default, the tool works interactively: it asks for confirmation whenever a CRC error is encountered.
$ bin/zkTxnLogToolkit.sh -r log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
Would you like to fix it (Yes/No/Abort) ?
Answering Yes means the newly calculated CRC value will be written to the new file. No means that the original CRC value will be copied over. Abort will abort the entire operation and exit. (In this case the ".fixed" file will not be deleted and is left in a half-complete state: it contains only entries which have already been processed, or only the header if the operation was aborted at the first entry.)
$ bin/zkTxnLogToolkit.sh -r log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
Would you like to fix it (Yes/No/Abort) ? y
EOF reached after 6 txns.
Recovery file log.100000001.fixed has been written with 1 fixed CRC error(s)
The default behaviour of recovery is to be silent: only entries with CRC errors get printed to the screen. One can turn on verbose mode with the -v,--verbose parameter to see all records. Interactive mode can be turned off with the -y,--yes parameter. In this case all CRC errors will be fixed in the new transaction file.
Things to Avoid
Here are some common problems you can avoid by configuring ZooKeeper correctly:
inconsistent lists of servers : The list of ZooKeeper servers used by the clients must match the list of ZooKeeper servers that each ZooKeeper server has. Things work okay if the client list is a subset of the real list, but things will really act strange if clients have a list of ZooKeeper servers that are in different ZooKeeper clusters. Also, the server lists in each Zookeeper server configuration file should be consistent with one another.
incorrect placement of transaction log : The most performance-critical part of ZooKeeper is the transaction log. ZooKeeper syncs transactions to media before it returns a response. A dedicated transaction log device is key to consistent good performance. Putting the log on a busy device will adversely affect performance. If you only have one storage device, put trace files on NFS and increase the snapshotCount; it doesn't eliminate the problem, but it should mitigate it.

incorrect Java heap size : You should take special care to set your Java max heap size correctly. In particular, you should not create a situation in which ZooKeeper swaps to disk. The disk is death to ZooKeeper. Everything is ordered, so if processing one request swaps to disk, all other queued requests will probably do the same. DON'T SWAP. Be conservative in your estimates: if you have 4G of RAM, do not set the Java max heap size to 6G or even 4G. For example, it is more likely you would use a 3G heap for a 4G machine, as the operating system and the cache also need memory. The best and only recommended practice for estimating the heap size your system needs is to run load tests, and then make sure you are well below the usage limit that would cause the system to swap.

Publicly accessible deployment : A ZooKeeper ensemble is expected to operate in a trusted computing environment. It is thus recommended to deploy ZooKeeper behind a firewall.
Best Practices
For best results, take note of the following list of good Zookeeper practices:
For multi-tenant installations see the section detailing ZooKeeper "chroot" support; this can be very useful when deploying many applications/services interfacing to a single ZooKeeper cluster.