ZooKeeper Getting Started Guide
Getting Started: Coordinating Distributed Applications with ZooKeeper
This document contains information to get you started quickly with ZooKeeper. It is aimed primarily at developers who want to try it out, and contains simple installation instructions for a single ZooKeeper server, a few commands to verify that it is running, and a simple programming example. Finally, as a convenience, there are a few sections on more complicated installations, for example running replicated deployments and optimizing the transaction log. However, for complete instructions for commercial deployments, please refer to the ZooKeeper Administrator's Guide.
Pre-requisites
See System Requirements in the Admin guide.
Download
To get a ZooKeeper distribution, download a recent stable release from one of the Apache Download Mirrors.
Standalone Operation
Setting up a ZooKeeper server in standalone mode is straightforward. The server is contained in a single JAR file, so installation consists of creating a configuration.
Once you've downloaded a stable ZooKeeper release, unpack it and cd to its root directory.
To start ZooKeeper you need a configuration file. Here is a sample; create it in conf/zoo.cfg:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
This file can be called anything, but for the sake of this discussion call it conf/zoo.cfg. Change the value of dataDir to specify an existing (empty to start with) directory. Here are the meanings for each of the fields:
- tickTime : the basic time unit in milliseconds used by ZooKeeper. It is used for heartbeats, and the minimum session timeout is twice the tickTime.
- dataDir : the location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.
- clientPort : the port to listen on for client connections.
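As a rough illustration of these fields, the following shell sketch writes the sample configuration into a scratch directory and derives the minimum session timeout from tickTime. The helper names cfg_value and min_session_timeout_ms and the scratch layout are assumptions made for this example; they are not part of the ZooKeeper distribution.

```shell
#!/bin/sh
# Sketch only: cfg_value and min_session_timeout_ms are hypothetical helpers,
# not tools shipped with ZooKeeper.

# Print the value of one key from a key=value config file.
# $1 = config file, $2 = key
cfg_value() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Print the minimum session timeout in milliseconds (twice the tickTime).
# $1 = config file
min_session_timeout_ms() {
    echo $(( $(cfg_value "$1" tickTime) * 2 ))
}

# Build the sample configuration in a scratch directory (an assumption, so
# the sketch does not touch /var/lib/zookeeper).
scratch=$(mktemp -d)
mkdir -p "$scratch/conf" "$scratch/data"
cat > "$scratch/conf/zoo.cfg" <<EOF
tickTime=2000
dataDir=$scratch/data
clientPort=2181
EOF

echo "minimum session timeout: $(min_session_timeout_ms "$scratch/conf/zoo.cfg") ms"
```

With tickTime=2000, the last line reports 4000 ms.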
Now that you have created the configuration file, you can start ZooKeeper:
bin/zkServer.sh start
ZooKeeper logs messages using logback; more detail is available in the Logging section of the Programmer's Guide. You will see log messages going to the console (default) and/or a log file, depending on the logback configuration.
The steps outlined here run ZooKeeper in standalone mode. There is no replication, so if the ZooKeeper process fails, the service will go down. This is fine for most development situations, but to run ZooKeeper in replicated mode, please see Running Replicated ZooKeeper.
Managing ZooKeeper Storage
For long-running production systems, ZooKeeper storage (dataDir and logs) must be managed externally. See the section on maintenance for more details.
Connecting to ZooKeeper
$ bin/zkCli.sh -server 127.0.0.1:2181
This lets you perform simple, file-like operations.
Once you have connected, you should see something like:
Connecting to localhost:2181
...
Welcome to ZooKeeper!
JLine support is enabled
[zkshell: 0]
From the shell, type help to get a listing of commands that can be executed from the client, as in:
[zkshell: 0] help
ZooKeeper -server host:port cmd args
addauth scheme auth
close
config [-c] [-w] [-s]
connect host:port
create [-s] [-e] [-c] [-t ttl] path [data] [acl]
delete [-v version] path
deleteall path
delquota [-n|-b] path
get [-s] [-w] path
getAcl [-s] path
getAllChildrenNumber path
getEphemerals path
history
listquota path
ls [-s] [-w] [-R] path
ls2 path [watch]
printwatches on|off
quit
reconfig [-s] [-v version] [[-file path] | [-members serverID=host:port1:port2;port3[,...]*]] | [-add serverId=host:port1:port2;port3[,...]]* [-remove serverId[,...]*]
redo cmdno
removewatches path [-c|-d|-a] [-l]
rmr path
set [-s] [-v version] path data
setAcl [-s] [-v version] [-R] path acl
setquota -n|-b val path
stat [-w] path
sync path
From here, you can try a few simple commands to get a feel for this command line interface. First, issue the list command, as in ls /, yielding:
[zkshell: 8] ls /
[zookeeper]
Next, create a new znode by running create /zk_test my_data. This creates a new znode and associates the string "my_data" with the node. You should see:
[zkshell: 9] create /zk_test my_data
Created /zk_test
Issue another ls / command to see what the directory looks like:
[zkshell: 11] ls /
[zookeeper, zk_test]
Notice that the zk_test directory has now been created.
Next, verify that the data was associated with the znode by running the get command, as in:
[zkshell: 12] get /zk_test
my_data
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 5
mtime = Fri Jun 05 13:57:06 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0
dataLength = 7
numChildren = 0
We can change the data associated with zk_test by issuing the set command, as in:
[zkshell: 14] set /zk_test junk
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 6
mtime = Fri Jun 05 14:01:52 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0
dataLength = 4
numChildren = 0
[zkshell: 15] get /zk_test
junk
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 6
mtime = Fri Jun 05 14:01:52 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0
dataLength = 4
numChildren = 0
Notice that we did a get after setting the data, and it did, indeed, change.
Finally, let's delete the node by issuing:
[zkshell: 16] delete /zk_test
[zkshell: 17] ls /
[zookeeper]
[zkshell: 18]
That's it for now. To explore more, see the ZooKeeper CLI.
Programming to ZooKeeper
ZooKeeper has Java bindings and C bindings. They are functionally equivalent. The C bindings exist in two variants: single-threaded and multi-threaded. These differ only in how the messaging loop is done. For sample code using the different APIs, see the Programming Examples in the ZooKeeper Programmer's Guide.
Running Replicated ZooKeeper
Running ZooKeeper in standalone mode is convenient for evaluation, some development, and testing. But in production, you should run ZooKeeper in replicated mode. A replicated group of servers in the same application is called a quorum, and in replicated mode, all servers in the quorum have copies of the same configuration file.
Note
For replicated mode, a minimum of three servers is required, and it is strongly recommended that you have an odd number of servers. If you have only two servers, then if one of them fails, there are not enough machines to form a majority quorum. Two servers are inherently less stable than a single server, because there are two single points of failure.
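The arithmetic behind this recommendation can be sketched in shell. A majority quorum needs more than half of the servers, so the helpers below (hypothetical names, introduced only for illustration) compute the majority size and the number of server failures an ensemble can survive.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of ZooKeeper.

# Smallest number of servers that forms a majority of an ensemble of size $1.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# Number of server failures an ensemble of size $1 can survive while a
# majority is still available.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "servers=$n majority=$(majority "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With two servers the majority is 2, so no failure is tolerated. Three servers tolerate one failure, and a fourth server adds no tolerance beyond that, which is why an odd number is recommended.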
The required conf/zoo.cfg file for replicated mode is similar to the one used in standalone mode, but with a few differences. Here is an example:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
The new entry, initLimit, is a timeout ZooKeeper uses to limit the length of time the ZooKeeper servers in the quorum have to connect to a leader. The entry syncLimit limits how far out of date a server can be from a leader.
With both of these timeouts, you specify the unit of time using tickTime. In this example, the timeout for initLimit is 5 ticks at 2000 milliseconds a tick, or 10 seconds.
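The same conversion applies to syncLimit. A small sketch, using the values from the example configuration (ticks_to_ms is a hypothetical helper name):

```shell
#!/bin/sh
# Convert a limit expressed in ticks to milliseconds.
# $1 = limit in ticks, $2 = tickTime in milliseconds
ticks_to_ms() {
    echo $(( $1 * $2 ))
}

# Values taken from the example zoo.cfg.
tickTime=2000
initLimit=5
syncLimit=2

echo "initLimit timeout: $(ticks_to_ms "$initLimit" "$tickTime") ms"   # 10000 ms, i.e. 10 seconds
echo "syncLimit timeout: $(ticks_to_ms "$syncLimit" "$tickTime") ms"   # 4000 ms, i.e. 4 seconds
```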
The entries of the form server.X list the servers that make up the ZooKeeper service. When a server starts up, it knows which server it is by looking for the file myid in the data directory. That file contains the server number, in ASCII.
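Creating the myid file can be sketched as follows. The write_myid helper and the scratch directory are assumptions for illustration; on a real server the file would live in the dataDir from zoo.cfg (for example /var/lib/zookeeper/myid), and the number must match that server's server.X entry.

```shell
#!/bin/sh
# Hypothetical helper: write the server number into <dataDir>/myid.
# $1 = data directory, $2 = server number (the X in server.X)
write_myid() {
    mkdir -p "$1" && printf '%s\n' "$2" > "$1/myid"
}

# Use a scratch directory here instead of /var/lib/zookeeper.
datadir=$(mktemp -d)
write_myid "$datadir" 1
cat "$datadir/myid"   # prints 1
```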
Finally, note the two port numbers after each server name: "2888" and "3888". Peers use the former port to connect to other peers. Such a connection is necessary so that peers can communicate, for example, to agree upon the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader. When a new leader arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses TCP, we currently require another port for leader election. This is the second port in the server entry.
Note
If you want to test multiple servers on a single machine, specify the servername as localhost with unique quorum and leader election ports (i.e. 2888:3888, 2889:3889, 2890:3890 in the example above) for each server.X in that server's config file. Of course, separate dataDirs and distinct clientPorts are also necessary (in the above replicated example, running on a single localhost, you would still have three config files).
Please be aware that setting up multiple servers on a single machine will not create any redundancy. If something were to happen which caused the machine to die, all of the ZooKeeper servers would be offline. Full redundancy requires that each server have its own machine. It must be a completely separate physical server. Multiple virtual machines on the same physical host are still vulnerable to the complete failure of that host.
If you have multiple network interfaces in your ZooKeeper machines, you can also instruct ZooKeeper to bind on all of your interfaces and automatically switch to a healthy interface in case of a network failure. For details, see the Configuration Parameters.
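For the single-machine test described in this note, the three config files could be generated along these lines. This is a sketch under assumptions: the scratch directory, the file names zoo1.cfg to zoo3.cfg, and the client ports 2181 to 2183 are choices made for the example, and each dataDir would additionally need its own myid file.

```shell
#!/bin/sh
# Sketch: write three config files for a three-server test ensemble on
# localhost. Directory layout and file names are assumptions.
base=$(mktemp -d)

# Write zoo<N>.cfg with its own dataDir and clientPort.
# $1 = server number (1..3), $2 = base directory
write_local_cfg() {
    id=$1
    dir=$2
    mkdir -p "$dir/data$id"
    cat > "$dir/zoo$id.cfg" <<EOF
tickTime=2000
dataDir=$dir/data$id
clientPort=$(( 2180 + id ))
initLimit=5
syncLimit=2
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
EOF
}

for n in 1 2 3; do
    write_local_cfg "$n" "$base"
done

# Show that every file got a distinct client port.
grep clientPort "$base"/zoo*.cfg
```

The server.X lines are identical in all three files; only dataDir and clientPort differ per server.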
Other Optimizations
There are a couple of other configuration parameters that can greatly increase performance:
- To get low latencies on updates it is important to have a dedicated transaction log directory. By default, transaction logs are put in the same directory as the data snapshots and myid file. The dataLogDir parameter indicates a different directory to use for the transaction logs.
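A minimal sketch of adding the parameter to an existing config file follows. The path /mnt/zk-txnlog is a placeholder for a directory on a dedicated device, and set_cfg is a hypothetical helper, not a ZooKeeper tool.

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a config file, replacing any
# existing line for that key.
# $1 = config file, $2 = key, $3 = value
set_cfg() {
    tmp=$(mktemp)
    grep -v "^$2=" "$1" > "$tmp"
    echo "$2=$3" >> "$tmp"
    mv "$tmp" "$1"
}

# Work on a scratch copy of the sample configuration.
cfg=$(mktemp)
printf 'tickTime=2000\ndataDir=/var/lib/zookeeper\nclientPort=2181\n' > "$cfg"

# /mnt/zk-txnlog is a placeholder path on a dedicated device.
set_cfg "$cfg" dataLogDir /mnt/zk-txnlog
cat "$cfg"
```

Snapshots and the myid file stay in dataDir; only the transaction logs move to the new directory.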