Since the partnership between Hortonworks and SAS began, we have created some awesome assets (e.g., the SAS Data Loader sandbox tutorial, educational webinars, and an array of blogs) that have given Hadoop and Big Data enthusiasts hands-on training with Apache Hadoop and SAS’ powerful analytics solutions. You can find more details about our partnership and resources here: https://hortonworks.com/partner/sas
To continue the momentum, Paul Kent, Vice President of Big Data at SAS, shares his insights on the value of YARN and the benefits it brings to SAS and its users, this time around SAS Grid and YARN.
On my travels and in the SAS Executive Briefing Center, it has become more obvious that many folks have grabbed on to the idea that Hadoop will allow them two things:
As they get closer to this goal, they realize what a valuable resource the data lake has become. They need an effective means to “share nicely” – it’s not likely that every department is going to have the resources to establish its own data lake, and even if they do, you’ll be back to arguing about which version of the truth is the correct one.
YARN is the component in the Hadoop ecosystem that helps folks share the value gained from building a shared pool of the organization’s data.
As data volumes and velocities grow, it has become important to find a strategy that minimizes the number of hard (permanent) copies of data (and the inherent reconciliation and governance). YARN allows Hadoop to become “the Operating System for your data” – a tool that manages and mediates access to the shared pool of data, as well as the resources to manipulate the pool.
YARN allows the various patterns of work destined for your cluster to form orderly and rational queues, so that you can set the policy for what is urgent, what is important, what is routine, and what should be allowed to soak up resources so long as no one else requires them at the moment.
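To make the queueing idea concrete, here is a toy sketch in Python. It is not YARN's scheduler and does not use any YARN API; the `Queue` class, the `allocate` function, the queue names, and all the percentages are made up for illustration. It assumes each queue has a guaranteed share of the cluster and a ceiling, and that the guaranteed shares add up to no more than 100 percent. Spare capacity is lent out, in listed order, to queues that still want more.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    guaranteed: int  # percent of the cluster this queue can always claim
    maximum: int     # percent ceiling, even when the cluster is idle
    demand: int      # percent of the cluster the queue is asking for now


def allocate(queues, total=100):
    """Toy policy: satisfy guarantees first, then lend out what is left."""
    # Step 1: each queue gets what it asks for, up to its guaranteed share.
    alloc = {q.name: min(q.demand, q.guaranteed) for q in queues}
    spare = total - sum(alloc.values())

    # Step 2: lend spare capacity, in listed order, to queues that still
    # want more, never exceeding a queue's ceiling.
    for q in queues:
        want = min(q.demand, q.maximum) - alloc[q.name]
        extra = max(0, min(want, spare))
        alloc[q.name] += extra
        spare -= extra
    return alloc


# Hypothetical workload mix: the "scavenger" queue may soak up idle capacity.
queues = [
    Queue("urgent", guaranteed=40, maximum=60, demand=10),
    Queue("important", guaranteed=30, maximum=50, demand=30),
    Queue("routine", guaranteed=20, maximum=40, demand=20),
    Queue("scavenger", guaranteed=10, maximum=100, demand=100),
]
result = allocate(queues)
print(result)
```

In this made-up scenario the urgent queue is mostly idle, so the scavenger queue borrows the unused 30 percent on top of its own 10 percent guarantee. If urgent work arrived, a real scheduler would hand that capacity back; this sketch only computes a single snapshot.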
Disruptive technologies like Hadoop are often deployed “at the fringes” of an organization (perhaps in an Innovation Lab). Initial ROI is often found by attacking new ground – problems the organization had not attempted to handle (or handle at scale) before. When these early projects succeed, I’ve seen many customers ask themselves, “Well, that worked OK; is there some way to consolidate the older ways of doing things into this new world?” – simplifying and modernizing their Analytics Landscape as a delightful side effect!
In reality, the blue box for “SAS” above represents a few distinct patterns of work for the Hadoop cluster.
The first two patterns above are examples of new-world Distributed Computing. The third is an example of using the newer infrastructure to replace (at a lower cost) the hardware used for a previous-generation Analytics Landscape. Also, SAS Grid Manager is the only product to provide horizontal scaling of an application where some parts of the application need to operate on all of the data, such as a Monte Carlo simulation. The “cherry on top” is that you can combine these technologies, so that a single SAS Grid job running on a Hadoop data node could kick off an HPA job that would distribute vertically to send processing to each node to operate on the local data.
I asked Cheryl Doninger, who leads the development for SAS Grid Manager, why customers should be excited about this new flavor of SAS Grid Manager and she said – “SAS Grid Manager for Hadoop is a perfect fit for our customers who have, or plan to implement in the near future, a multi-application data operating system, as described by Arun. Now they can co-locate all of their SAS Grid jobs on the Hadoop cluster and manage them with YARN along with any other analysis being done on the cluster. The SAS Grid jobs can leverage any of the SAS integration points to Hadoop to maximize the value of this shared pool of data and all through direct integration with YARN or by leveraging other components of the Hadoop ecosystem that are natively managed by YARN.”
All this effort was the result of a tightly integrated joint engineering collaboration with Hortonworks and the Apache YARN team, including committer Arun Murthy.
To learn more about SAS Grid Manager for Hadoop, visit the SAS support site: https://support.sas.com/rnd/scalability/grid/hadoop/index.html