An open source server that reliably coordinates distributed processes
Apache ZooKeeper provides operational services for a Hadoop cluster. ZooKeeper provides a distributed configuration service, a synchronization service and a naming registry for distributed systems. Distributed applications use Zookeeper to store and mediate updates to important configuration information.
What ZooKeeper Does
ZooKeeper provides a very simple interface and services. ZooKeeper brings these key benefits:
- Fast. ZooKeeper is especially fast with workloads where reads to the data are more common than writes. The ideal read/write ratio is about 10:1.
- Reliable. ZooKeeper is replicated over a set of hosts (called an ensemble) and the servers are aware of each other. As long as a critical mass of servers is available, the ZooKeeper service will also be available. There is no single point of failure.
- Simple. ZooKeeper maintain a standard hierarchical name space, similar to files and directories.
- Ordered. The service maintains a record of all transactions, which can be used for higher-level abstractions, like synchronization primitives.
How ZooKeeper Works
ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers, known as znodes. Every znode is identified by a path, with path elements separated by a slash (“/”). Aside from the root, every znode has a parent, and a znode cannot be deleted if it has children.
This is much like a normal file system, but ZooKeeper provides superior reliability through redundant services. A service is replicated over a set of machines and each maintains an in-memory image of the the data tree and transaction logs. Clients connect to a single ZooKeeper server and maintains a TCP connection through which they send requests and receive responses.
This architecture allows ZooKeeper to provide high throughput and availability with low latency, but the size of the database that ZooKeeper can manage is limited by memory.