Apache ZooKeeper

An open source server that reliably coordinates distributed processes

Apache ZooKeeper provides operational services for a Hadoop cluster: a distributed configuration service, a synchronization service, and a naming registry for distributed systems. Distributed applications use ZooKeeper to store and mediate updates to important configuration information.

What ZooKeeper Does

ZooKeeper provides a very simple interface and services. ZooKeeper brings these key benefits:

  • Fast. ZooKeeper is especially fast with workloads where reads to the data are more common than writes. The ideal read/write ratio is about 10:1.
  • Reliable. ZooKeeper is replicated over a set of hosts (called an ensemble), and the servers are aware of each other. As long as a majority of the servers is available, the ZooKeeper service is available too. There is no single point of failure.
  • Simple. ZooKeeper maintains a standard hierarchical name space, similar to files and directories.
  • Ordered. The service maintains a record of all transactions, which can be used for higher-level abstractions, like synchronization primitives.
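
The availability rule in the "Reliable" point can be made concrete with a little arithmetic. The sketch below assumes the usual majority rule (more than half of the ensemble must be reachable); the function names are illustrative and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def is_available(ensemble_size: int, servers_up: int) -> bool:
    """True while the reachable servers still form a majority."""
    return servers_up >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail before the majority is lost."""
    return ensemble_size - quorum_size(ensemble_size)


# Columns: ensemble size, quorum size, tolerated failures.
for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

Under this rule a four-server ensemble tolerates no more failures than a three-server one, which is why ensembles are commonly sized with an odd number of servers.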

How ZooKeeper Works

ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers, known as znodes. Every znode is identified by a path, with path elements separated by a slash (“/”). Aside from the root, every znode has a parent, and a znode cannot be deleted if it has children.
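
The name space rules above (slash-separated paths, every non-root znode has a parent, no deleting a znode that has children) can be illustrated with a toy in-memory model. This is only a sketch: the `ZNodeTree` class and its methods are hypothetical and are not the ZooKeeper client API.

```python
class ZNodeTree:
    """Toy in-memory model of a znode name space (not the ZooKeeper API)."""

    def __init__(self):
        # Map each absolute path to its data; the root always exists.
        self.nodes = {"/": b""}

    @staticmethod
    def parent(path):
        # "/app/config" -> "/app", "/app" -> "/"
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def children(self, path):
        return sorted(p for p in self.nodes
                      if p != "/" and self.parent(p) == path)

    def create(self, path, data=b""):
        if not path.startswith("/") or path == "/" or path.endswith("/"):
            raise ValueError(f"invalid path: {path!r}")
        if path in self.nodes:
            raise KeyError(f"node already exists: {path}")
        if self.parent(path) not in self.nodes:
            raise KeyError(f"parent does not exist for: {path}")
        self.nodes[path] = data

    def delete(self, path):
        if path == "/":
            raise ValueError("the root cannot be deleted")
        if path not in self.nodes:
            raise KeyError(f"no such node: {path}")
        if self.children(path):
            raise RuntimeError(f"node has children: {path}")
        del self.nodes[path]


tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"timeout=30")

try:
    tree.delete("/app")          # refused: "/app/config" still exists
except RuntimeError as err:
    print(err)

tree.delete("/app/config")       # remove the child first
tree.delete("/app")              # now the parent can go
```

A flat dictionary keyed by full path keeps the sketch short; a real server would use a tree structure and track additional metadata per znode.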

This is much like a normal file system, but ZooKeeper provides superior reliability through redundant services. The service is replicated over a set of machines, and each machine maintains an in-memory image of the data tree along with transaction logs. Each client connects to a single ZooKeeper server and maintains a TCP connection through which it sends requests and receives responses.

This architecture allows ZooKeeper to provide high throughput and availability with low latency, but the size of the database that ZooKeeper can manage is limited by memory.
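
The replicated, in-memory design described above can be sketched as a set of replicas that each apply the same ordered list of transactions. Everything here is a simplified assumption for illustration: the `Replica` class, the tuple layout of a transaction, and the sample paths are hypothetical, and the sketch leaves out how real servers agree on the order in the first place.

```python
class Replica:
    """Toy stand-in for one server in an ensemble (not ZooKeeper code)."""

    def __init__(self, name):
        self.name = name
        self.log = []    # every transaction applied so far, in order
        self.tree = {}   # in-memory image of the data: path -> bytes

    def apply(self, txn):
        seq, op, path, data = txn
        # Refuse gaps or reordering so every replica sees the same history.
        expected = len(self.log) + 1
        if seq != expected:
            raise ValueError(f"{self.name}: expected txn {expected}, got {seq}")
        if op == "set":
            self.tree[path] = data
        elif op == "delete":
            self.tree.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        self.log.append(txn)


transactions = [
    (1, "set", "/app/config", b"timeout=30"),
    (2, "set", "/app/owner", b"host-a"),
    (3, "set", "/app/config", b"timeout=60"),
    (4, "delete", "/app/owner", None),
]

ensemble = [Replica(f"server-{i}") for i in range(1, 4)]
for txn in transactions:
    for replica in ensemble:
        replica.apply(txn)

# Every replica ends with the same in-memory image.
images = [replica.tree for replica in ensemble]
print(images[0])  # {'/app/config': b'timeout=60'}
```

Because each replica holds the whole image in memory, reads can be answered locally by whichever server a client is connected to, and the total data size is bounded by what fits in one server's memory.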

Apache Top-Level Project Since: January 2011
Hortonworks Committers: 2