It’s our pleasure to host Ryan Peterson, Chief Solution Strategist at EMC, as a guest blogger to expand upon another great step in our partnership to deliver compelling customer solutions through joint engineering efforts. Follow Ryan @BigDataRyan.
Object storage isn’t a new concept and EMC’s been innovating around it since the beginning. Take our Centera and Atmos products as key examples. The first Centera was created around the idea that objects could store much higher quantities of data than a file system in a single store while the other aspect of Centera was a rich set of security and compliancy features file systems had not been able to achieve. Data shredding for example was a feature required by governments and law firms. We all know some politicians who need a Centera system 😉 Atmos on the other hand was designed with a completely different base requirement. The goal was to support a geo-parity environment mostly seen in large enterprise customers and with service providers. In the Atmos design, data written to one location would be protected by other locations and yet share a common namespace. The design inspired many large internet-scale companies you likely use today and some of them are even backed by an Atmos system.
But when you are innovating from scratch, you make design decisions that leave things out and you learn from the 25,000 current EMC object storage customers. So we started with a new baseline of code and added in many of the components of Centera and Atmos to create something new, exciting, and dare I say revolutionary. Enter Elastic Cloud Storage (ECS) which can scale from one rack in one data center to many racks in many data centers thus encompassing the design requirements of both Atmos and Centera and with new data protection features that increase performance from the original design such as local replica process, erasure coding for high performance, and geo-protection using XOR to reduce overhead. ECS changes the game!
But with the advent of new technologies in the world such as is near and dear to my heart, Hadoop, the design needed to include the capability to analyze the data in the entire global namespace and do it efficiently.
ECS includes a mapping to Hadoop using the Hadoop Compatible File System (HCFS) guidelines the same way you might see a Lustre or Gluster connect. The metadata controllers in ECS provide the namespace context and allow Hadoop to be able to see the data on that system the same way it would if it were looking at HDFS. In fact, it’s as simple as using a different URI string to connect and you don’t have to remove your HDFS DAS if you don’t want to. Simply take your existing Hadoop cluster and point to viprfs://accesspoint.yourcompany.com like shown below. Hadoop will automatically open a series of connections to access the data at the fastest possible rate.
Now before we wanted to go out and tell the world about this solution, we really wanted to enlist the support of the Hadoop distributions and we wanted to test it thoroughly. The picture below is a setup of 10 racks of ECS running the Hortonworks and Pivotal distributions of Hadoop. This is one of others like it that seek to simplify the implementation process, validate all things are functional, and provides us a place to test scenarios our customers bring to us.
Our friends at Hortonworks really did an amazing job going through all of the features of Hadoop and validating each and every line of Apache code works on ECS. Click here to see all of the certifications that have already been completed with our geo-scale object platform and Hortonworks.
So what? What does this mean to you? Let’s get serious and clear. Never before has there been an opportunity to purchase your own Analytics-Ready-Cloud-in-a-Box. So who are the customers that might care?
If you have a need for data to be spread across geographies such as Americas, Europe, and Asia; or even New York, Chicago, and Los Angeles, then relying on a single name space to support that environment while keeping the data in a state that can be quickly accessed and analyzed should be top of mind. Thus far, we’ve seen customers in the following segments (to name a few and not exhaustive):
Let’s compare this with the existing technologies used in Public Cloud providers not using ECS. Data is collected in multi-tenant object systems, is copied to another platform for analysis (a Cloud Data Lake so to speak) and the results pushed back into your primary system. Amazon’s S3 and EMR are a good example of that type of legacy cloud architecture. With ECS, we remove the need to move data by allowing analysis to happen against the data set where it sits. Now that’s Revolutionary!
If you have requirements that you believe are met with ECS, whether you want to host the equipment yourself or are looking for an ECS-enabled Public Cloud Service Provider, reach out to your EMC representative or discuss with our friends at Hortonworks. We can meet your needs with this rEvolutionary architecture.
For more information, you can watch this video of my colleagues Nikhil & Priya discussing the internals of the platform and how it works with Hadoop.
You can also download our Hadoop on ECS White Paper.