The Hortonworks Blog

Posts categorized by: Hardware

Today we announced the expansion of our strategic relationship with HP, enabling HP to resell Hortonworks Data Platform (HDP). As data volumes grow and new data sources emerge, it is important for enterprises to have access to production-ready enterprise Apache Hadoop to meet their big data needs.

With HDP, HP customers can now seamlessly incorporate Hadoop into their modern data architectures to power a variety of new applications and to support existing ones with additional data sources.…

This post’s principal author is Ming Ma, Software Development Manager, eBay, with contributions from Mayank Bansal (eBay), Devaraj Das (Hortonworks), Nicolas Liochon (Scaled Risk), Michael Weng (eBay), Ted Yu (Hortonworks), and John Zhao (eBay).

eBay runs Apache Hadoop at extreme scale, with tens of petabytes of data. Hadoop was created for computing challenges like ours, and eBay runs some of the largest Hadoop clusters in existence.

Our business uses Apache HBase to deliver value to our customers in real time, and we are sensitive to any failures because prolonged recovery times significantly degrade site performance and result in material loss of revenue. …

It’s not an easy task to find the right hardware configuration for Hadoop. Thanks to our partner Dell, we’ve detailed a configuration for Hortonworks Data Platform (HDP) on the Dell PowerEdge R720XD. This reference configuration describes a server set-up that can run HDP and is intended for organizations looking to configure Apache Hadoop clusters within their information technology environment for big data analytics.

Download the reference here.

How big is big anyway? What sort of size and shape does a Hadoop cluster take?

These are great questions as you begin to plan a Hadoop implementation. Designing and sizing a cluster is complex, and something our technical teams spend a lot of time working on with customers: from storage size to growth rates, from compression rates to cooling, there are many factors to take into account.

To make that a little more fun, we’ve built a cluster-size-o-tron which performs a simplified calculation, based on some assumptions about node sizes and data payloads, to give an indication of how big your particular big is.…
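
For a flavor of the arithmetic involved, here is a minimal back-of-the-envelope sizer in Python. Every parameter and default below (growth rate, compression ratio, disk per node, usable-disk fraction) is an illustrative assumption of ours, not the cluster-size-o-tron’s actual formula:

    import math

    # Back-of-the-envelope HDFS cluster sizer. Every default below is an
    # illustrative assumption, not the cluster-size-o-tron's formula.
    def nodes_needed(raw_tb, growth_per_year=0.5, years=1,
                     compression=0.6, replication=3,
                     disk_per_node_tb=12.0, hdfs_usable=0.7):
        """Estimate the worker-node count for a given data volume.

        raw_tb           -- initial raw data volume in TB
        growth_per_year  -- fractional yearly growth (0.5 = +50% per year)
        compression      -- stored size as a fraction of raw size
        replication      -- HDFS replication factor (3 is the usual default)
        disk_per_node_tb -- raw disk per worker node (e.g. 6 x 2TB drives)
        hdfs_usable      -- fraction of disk left for HDFS after the OS,
                            logs and MapReduce intermediate output
        """
        future_raw = raw_tb * (1 + growth_per_year) ** years
        stored = future_raw * compression * replication
        usable_per_node = disk_per_node_tb * hdfs_usable
        return math.ceil(stored / usable_per_node)

    # 100 TB of raw data today, one year of 50% growth -> 33 worker nodes
    print(nodes_needed(raw_tb=100))

Under these assumptions, 100 TB of raw data and a year of 50% growth works out to roughly 33 worker nodes; in practice you would also sanity-check the result against compute and memory requirements, not just storage.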

Implementing and integrating Hadoop to complement existing EDW, RDBMS and discovery systems is all part of realizing a Modern Data Architecture for a business, unlocking the opportunities that big data provides for new insight and competitive edge.

That is why we were excited to take part in Cisco and NetApp’s joint announcement of their FlexPod Portfolio because it brings new engineered offerings to the market for enterprises looking to take advantage of Hadoop.…

To deploy, configure, manage and scale Hadoop clusters in a way that optimizes performance and resource utilization, there is a lot to consider. Here are 6 key things to think about as part of your planning:

  • Operating system: Using a 64-bit operating system helps to avoid constraining the amount of memory that can be used on worker nodes. For example, 64-bit Red Hat Enterprise Linux 6.1 or later is often preferred, due to better ecosystem support and more comprehensive functionality for components such as RAID controllers. A quick way to check a node is sketched below.
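
As a quick illustration of that first point, here is a minimal, Linux-specific Python sketch (a hypothetical helper of ours, not part of HDP) that reports whether a worker node runs a 64-bit OS and how much RAM it has:

    import os
    import platform

    # Hypothetical helper: check whether this worker node runs a 64-bit OS
    # and report its physical memory. Linux/Unix only (uses POSIX sysconf).
    def check_worker_node():
        arch = platform.machine()            # e.g. "x86_64" on 64-bit Linux
        is_64bit = arch in ("x86_64", "amd64", "aarch64")
        ram_gb = (os.sysconf("SC_PAGE_SIZE")
                  * os.sysconf("SC_PHYS_PAGES")) / 1024 ** 3
        print("arch=%s 64-bit=%s ram=%.1f GB" % (arch, is_64bit, ram_gb))
        if not is_64bit:
            print("Warning: a 32-bit OS caps usable memory on this node")

    check_worker_node()
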
When the term scientific computing comes up in a conversation, it’s usually just the occasional science geek who shows signs of recognition. But although most people have little or no knowledge of the field’s existence, it has been around since the second half of the twentieth century and has played an increasingly important role in many technological and scientific developments. Internet search engines, DNA analysis, weather forecasting, seismic analysis, renewable energy, and aircraft modeling are just a few examples where scientific computing is nowadays indispensable.…

We get asked a lot of questions about how to select Apache Hadoop worker node hardware. During my time at Yahoo!, we bought a lot of nodes with 6*2TB SATA drives, 24GB RAM and 8 cores in a dual-socket configuration. This has proven to be a pretty good configuration. This year, I’ve seen systems with 12*2TB SATA drives, 48GB RAM and 8 cores in a dual-socket configuration. We will see a move to 3TB drives this year.…
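
To put those two configurations side by side, here is a small Python sketch; the 25% overhead reserve and the 3x replication factor are illustrative assumptions rather than published figures:

    # Side-by-side comparison of the two worker-node configurations
    # mentioned above. OVERHEAD and REPLICATION are assumed values.
    REPLICATION = 3
    OVERHEAD = 0.25   # fraction reserved for OS, logs and shuffle space

    configs = {
        "6 x 2TB node":  {"drives": 6,  "drive_tb": 2, "ram_gb": 24, "cores": 8},
        "12 x 2TB node": {"drives": 12, "drive_tb": 2, "ram_gb": 48, "cores": 8},
    }

    for name, c in configs.items():
        raw_tb = c["drives"] * c["drive_tb"]
        effective_tb = raw_tb * (1 - OVERHEAD) / REPLICATION
        print("%s: %d TB raw, ~%.1f TB effective after %dx replication, "
              "%d GB RAM, %d cores" % (name, raw_tb, effective_tb,
                                       REPLICATION, c["ram_gb"], c["cores"]))

Doubling the spindles roughly doubles effective HDFS capacity per node, which helps explain the move toward denser drive configurations paired with more RAM.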