Hadoop Cluster Size-O-Tron

Thinking about your data challenges and planning your Hadoop cluster just got a little more fun. Select your expected data payload and this guide will provide an indication of your hardware and infrastructure requirements. Go deeper by downloading our Hadoop Cluster Sizing and Configuration Guide.

This model makes some assumptions about node storage size (approx. 30 TB usable) and the size of data payloads (between 1 and 4 KB). Actual cluster sizing and configuration is a little more complex than this tool, so we’ve limited the estimate to a minimum of 8 nodes and a maximum of 400 nodes. Most likely, your initial use case falls between those limits, and we encourage you to download the guide and talk to us about your specific needs.
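To make the stated limits concrete, here is a minimal Python sketch of how a node count could be derived from a total data volume. The tool's actual formula is not published here, so the simple division is an assumption; only the 30 TB per node figure and the 8 and 400 node limits come from the text. The function name `estimate_nodes` is hypothetical.

```python
import math

# Figures taken from the text above.
NODE_USABLE_TB = 30   # approx. usable storage per node
MIN_NODES = 8         # lower limit of the tool
MAX_NODES = 400       # upper limit of the tool


def estimate_nodes(total_data_tb: float) -> int:
    """Rough node count for a given data volume in TB.

    Assumption: nodes = ceil(volume / usable storage per node),
    then clamped to the tool's 8..400 range. Replication, growth,
    and processing headroom are deliberately ignored here.
    """
    if total_data_tb < 0:
        raise ValueError("total_data_tb must be non-negative")
    raw = math.ceil(total_data_tb / NODE_USABLE_TB)
    return max(MIN_NODES, min(MAX_NODES, raw))
```

For example, `estimate_nodes(3000)` gives 100 nodes, while any volume under 240 TB falls back to the 8-node minimum. A real sizing exercise would also account for the factors covered in the guide.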

How are you planning on using Hadoop?

Deliver New Business Value
Hadoop can deliver new business value by letting you take advantage of new types of data, or data that was previously uneconomic to work with.

Learn more about delivering new business value in the Hortonworks Cluster Configuration Guide.

Deliver Data Center Efficiency (Integrate Hadoop into the Data Center)
Hadoop is designed to interoperate alongside existing datacenter infrastructure, combining with existing ETL operations to provide processing at a larger scale than ever before.

Learn more about interoperating alongside existing datacenter infrastructure in the Hortonworks Cluster Configuration Guide.

What industry are you designing for?

Retail/Web, Telco, Government, Finance, Energy, Manufacturing, Healthcare

What type of data are you handling?

Clickstream
A virtual trail that a user leaves behind while visiting a website. A clickstream is a record of a user's activity on the Internet. For users not logged in to the site, this data may be captured using cookies.
Examples: pages visited, length of visit per page, flows between pages, interaction with web forms, bounce rates
Inputs: # of daily page views, # of interactions daily

Sentiment
Data on opinions, emotions, and attitudes contained in social media, blogs, news, product reviews, and enterprise feedback streams.
Examples: Twitter, Facebook, LinkedIn, website comments
Inputs: # of fans/followers, # of interactions daily

Sensor/Machine
Data that is automatically created from a computer process, application, or other machine without the intervention of a human.
Examples: smart electric meters, network event logs, manufacturing QA
Inputs: # of machines, # of sensors per machine

Geo Tracking
Location data from connected devices whose position is determined using GPS or by triangulation from cell towers.
Examples: oil and gas exploration, first responders, defense
Inputs: # of connected devices, # of polls per hour

System Logs (Server Logs)
Computer-generated information that captures data on the operations of a computer network.
Examples: servers, routers, network traces, security logs
Inputs: entry size (KB), entries per week

Text
Unstructured or semi-structured text in forms and documents.
Examples: leases, contracts, applications, patents, proposals
Inputs: document size (KB), documents per week
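As a rough illustration of how the input fields for each data type might translate into a data volume, the Python sketch below covers two cases: size-and-count inputs (system logs, text) and device polling (geo tracking). The arithmetic is an assumption rather than the tool's actual model. The function names are hypothetical, and the default payload of 4 KB per poll is simply the upper end of the 1 to 4 KB range mentioned in the introduction.

```python
KB = 1024  # bytes per KB; assumes binary kilobytes

HOURS_PER_WEEK = 24 * 7


def weekly_bytes_from_entries(entry_size_kb, entries_per_week):
    """Weekly volume for system logs or text documents.

    Uses the two inputs directly: size per entry (KB) and
    number of entries (or documents) per week.
    """
    return entry_size_kb * KB * entries_per_week


def weekly_bytes_from_polls(connected_devices, polls_per_hour, payload_kb=4):
    """Weekly volume for geo tracking.

    Assumption: every poll from every device produces one payload
    of payload_kb kilobytes, around the clock.
    """
    polls_per_week = connected_devices * polls_per_hour * HOURS_PER_WEEK
    return polls_per_week * payload_kb * KB
```

For instance, 10 devices polled 6 times per hour produce 10,080 polls per week, so the weekly volume is 10,080 payloads multiplied by the assumed payload size.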

How often do you intend to process your data?

Continuously: Data access is likely to be continuous and perhaps support mission-critical operations.
Frequently: Data access is likely to be repeated processes run as batches to support existing operations.
Historically: Data access is likely to be more of an analytical exercise on an as-needed basis.
Regulatorily: Data access is likely to be a standard, scheduled, infrequent task.

How quickly is your data growing?

Regularly, Exponentially
Hardware estimate
Master Nodes
Slave Nodes
Racks
Rows

Infrastructure estimate
Floor Space (sq ft)
Cooling (BTU/month)
Power (kWh/month)
Network Bandwidth (Mb/s)

Physical Isolation (Recommended or Optional)
Your cluster will likely benefit from network isolation and physical isolation from other assets in the data center.

Learn more about important considerations related to isolation in the Hortonworks Cluster Configuration Guide.

Network Isolation (Recommended or Optional)
Since all Hadoop jobs demand maximum performance, shared resources will become saturated and result in a performance bottleneck.

Learn more about shared nothing architecture in the Hortonworks Cluster Configuration Guide.