Week in Review: Hadoop Architecture, Icons and Tutorials

If you’re heading back to work today after a long, hot summer, here are some notes on last week here at Hortonworks.

Building a modern data architecture. We kicked off the week with a discussion of what it means to implement Hadoop alongside existing data architecture components. Jim covered three essential requirements: integration with existing systems, reuse of existing skills, and enterprise requirements such as reliability and availability. We also held the first webinar in our series on implementing Hadoop in the enterprise; this one was with Teradata. You can play it back here.

Moving to Hadoop 2. With the announcement that Hadoop 2.x is now in Beta, Vinod, Zhijie and Jian covered the steps to ensure compatibility of existing apps that use previous MapReduce APIs and early YARN APIs, followed by the full set of YARN API changes as they’ve stabilized for the Beta.

A Fistful of Icons. If you are implementing Hadoop, you’ll undoubtedly need solid documentation, references and so on. But you may also need a few icons to adorn your technical diagrams. We released a very simple set of Hadoop-related icons this week. As a bonus, we’ve now included stencils for OmniGraffle and Visio in the set. Download them here.

Real World Sandbox Tutorials. Cheryle covered two tutorials that are available in Sandbox on working with clickstream data and server log data. The latter contains a ton of useful information about the use of Flume. Finally, if you’re using Tableau, here are some great resources for using Tableau with HDP.

Time to dive into this week.
