Our guest blogger today is Sean Anderson, Manager of Data Service at Rackspace, the managed cloud company. Sean will share with us all the work Rackspace is doing with Hortonworks Data Platform (HDP) for an enterprise-ready Hadoop solution.
Rackspace is excited to be joining the open source data platform community for Hadoop Summit 2015 hosted by Hortonworks and Yahoo. We partnered with Hortonworks in 2013 to build two platforms—one that delivers enterprise-ready Hadoop on-demand in the cloud, and another that delivers customizable and secure dedicated servers backed by fanatical support and expertise. Since the inception of both we have seen the rapid adoption of the Hortonworks Data Platform along with many users transitioning their projects into high-functioning production systems.
One of the distinct advantages of working with a partner like Rackspace is that we help customers map their use case to the best-fitting Hadoop ecosystem tools and corresponding technologies, from architecture guidance to ongoing optimization efforts. As Melanie Posey of IDC points out, “As enterprises develop next-generation applications that leverage mobile, social, and analytics technologies, IT departments must match the use case to the broad array of open-source database options”. Rackspace and Hortonworks are committed to helping users grasp the advances in Hadoop technology and adopt new tools with little disruption to the business.
The robust nature of the ecosystem and the forward momentum in this space are best highlighted at focal industry events like Hadoop Summit in San Jose, California. We will join distribution providers, analytics vendors, data transformation vendors, and Hadoop practitioners to share best practices and stories from the front lines in the fast-moving world of big data.
Rackspace will be there to answer questions and engage with the community. We want to hear how we can better meet the needs of this rapidly changing ecosystem, and to understand how best to deliver enterprise-ready Hadoop with all the tools and features users need to meet the demands of their use case. This year, Rackspace is proud to launch support for the Apache projects Storm and Kafka, both Hortonworks platform tools, on our Cloud Big Data Platform, OnMetal Big Data Platform, and Managed Big Data Platforms.
The combination of Apache Storm and Apache Kafka provides the tools needed to implement popular capabilities like in-memory processing, along with a robust pub-sub service to feed that processing. In addition, Storm and Kafka combine to make workloads like the processing of incoming streaming data sources not only possible but easy, by providing a single destination for the incoming data feed.
Apache Storm™ is a real-time processing engine for Hadoop that runs on resources allocated by YARN. Where MapReduce is batch-oriented, Storm applies a near-real-time computation model to meet the demands of a high-velocity data source. Storm originated as a project at BackType, a company that Twitter purchased in 2011. The fit was natural, as Twitter is a leading producer of high-velocity sentiment streams. Twitter open-sourced the project and it moved into Apache incubation. Apache Storm is often compared with the in-memory technology Spark Streaming, which is also a supported Hadoop ecosystem tool. There are many great articles that compare and contrast Spark and Storm and point out when you would want to use one vs. the other. You can read the InfoWorld article here. As a quick synopsis: both are extremely fast at processing, support multiple languages, and scale very well, but Storm presents some unique advantages from being purpose-built for streaming, including the use of topology mapping. Storm uses a concept of spouts (the sources of a stream) and bolts (the processing steps applied to it), and many of the popular spouts are pre-configured and available out-of-the-box. Its tie-ins to Apache Kafka also help make the streaming endpoint simple, which allows for an easy workflow between streamed sources and a distributed system like Hadoop.
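To make the spout-and-bolt idea more concrete, here is a minimal conceptual sketch in plain Python. It does not use the Storm API; the function names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) are hypothetical and exist only for illustration. It shows a source feeding tuples through two chained processing steps, which is roughly the shape of a simple Storm topology.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Conceptual sketch only: this is NOT the Storm API.
# All function names below are hypothetical.


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Spout: the source of the stream; emits one sentence at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt: consumes sentences and emits individual lowercase words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Bolt: keeps a running count per word and emits (word, count) updates."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(sentences: Iterable[str]) -> Dict[str, int]:
    """Wire spout -> split bolt -> count bolt and collect the latest counts."""
    latest: Dict[str, int] = {}
    for word, count in count_bolt(split_bolt(sentence_spout(sentences))):
        latest[word] = count
    return latest


if __name__ == "__main__":
    print(run_topology(["storm and kafka", "kafka and hadoop"]))
```

In a real deployment the spout would read from a live source and each bolt could run in parallel across a cluster; the generators here only illustrate how data flows from one stage to the next.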
Apache Kafka™ is a highly scalable publish-subscribe messaging system. It works with Storm to handle incoming data streams and ensure that data is captured, processed, and removed in a way that accommodates continual operation. Kafka is one of the most commonly requested additions to our popular Cloud Big Data services, and it enables persistent ingestion and processing of data from social platforms, machine sources, and other data streams.
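The publish-subscribe pattern described above can be sketched in a few lines of plain Python. This is an in-memory illustration, not the Kafka API: the `TopicLog` class and its `publish` and `poll` methods are hypothetical names. It assumes a simplified model in which messages are appended to a log and each subscriber tracks its own read position, so several consumers can read the same stream independently.

```python
from collections import defaultdict
from typing import Dict, List


class TopicLog:
    """Hypothetical in-memory stand-in for a single topic; not the Kafka API."""

    def __init__(self) -> None:
        self._messages: List[str] = []                    # append-only message log
        self._offsets: Dict[str, int] = defaultdict(int)  # next read position per subscriber

    def publish(self, message: str) -> int:
        """Append a message and return its position (offset) in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, subscriber: str, max_messages: int = 10) -> List[str]:
        """Return the next unread messages for this subscriber and advance its offset."""
        start = self._offsets[subscriber]
        batch = self._messages[start:start + max_messages]
        self._offsets[subscriber] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = TopicLog()
    for event in ["click", "view", "purchase"]:
        log.publish(event)
    print(log.poll("analytics", max_messages=2))  # first two events
    print(log.poll("archiver"))                   # all three events, read independently
```

Because each subscriber keeps its own offset, a slow consumer does not block a fast one, which is one reason a log-based design suits continuous ingestion.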
At Rackspace, we manage Hadoop for our users to help them focus on their data and business requirements. Our partnership with Hortonworks ensures that the level of expertise brought to the management layer is second to none. We manage and monitor not only the hosted infrastructure but also many components of the Hadoop application. As we continue to bring on projects, we aim to help our users adopt the new tools and understand how they affect their business goals. Sometimes these tool additions shift the ideal underlying infrastructure. That is why it is essential that users have flexibility and a partner that can help them plan, architect, and upgrade to the new versions and tooling that arise. To learn more about big data solutions powered by Hortonworks, please visit our website.