From the Dev Team

Follow the latest developments from our technical team

Hadoop Summit Content Curation

Although the Hadoop Summit San Jose 2014 has come and gone, the invaluable content—keynotes, sessions, and tracks—is available here. I’ve selected a few sessions below for Hadoop system administrators and dev-ops, curating them under a general Hadoop operations theme.

Dev-ops engineers and system administrators know best that ease of operations and deployments can make or break a large Hadoop production cluster, which is why they care about all of the following:

  • how rapidly they can create or replicate a cluster;
  • how efficiently they can manage or monitor at scale;
  • how easily and programmatically they can extend or customize their operational scripts; and
  • how accurately they can foresee, forestall, or forecast resource starvation or capacity stipulation.

The Apache Storm community recently announced the release of Apache Storm 0.9.2, which includes improvements to Storm’s user interface and an overhaul of its netty-based transport.

We thank all who have contributed to Storm – whether through direct code contributions, documentation, bug reports, or helping other users on the mailing lists. Together, we resolved 112 JIRA issues.

Here are summaries of this version’s important fixes and improvements.

New Feature Highlights Netty Transport Overhaul

Storm’s Netty-based transport has been overhauled to significantly improve performance through better utilization of thread, CPU, and network resources, particularly in cases where message sizes are small.…

Last Thursday we hosted the last of our seven Discover HDP 2.1 webinars, Using Apache Ambari to Manage Hadoop Clusters. Over 140 people attended and joined in the conversation.

The speakers gave an overview of Apache Ambari, discussed new features, and showed an end-to-end demo.

Thanks to our presenters Justin Sears (Hortonworks’ Product Marketing Manager), Jeff Sposetti (Hortonworks’ Senior Director of Product Management), and Mahadev Konar (Hortonworks’ Co-founder, Committer, and PMC Member for Apache Hadoop, Apache Ambari, and Apache Zookeeper) who presented the webinar.…

This week we hosted a webinar entitled HDP Advanced Security: Comprehensive Security for Enterprise Hadoop. Over 135 people attended, prompting an informative discourse and a series of questions.

The speakers outlined the HDP Advanced Security features and benefits in Hortonworks Data Platform and gave a demo. Thanks to our presenters Justin Sears (Hortonworks’ Product Marketing Manager), Balaji Ganesan (Hortonworks’ Senior Director, Enterprise Security Strategy), and Don Bosco Durai (Hortonworks’ Enterprise Security Architect).…

Today, we announce certification of Apache Spark as YARN Ready. This certification ensures memory and CPU intensive Spark-based applications can co-exist within a single Hadoop cluster with all the other workloads you have deployed. Together, they allow you to use a single cluster with a single set of data for multiple purposes rather than silo your Spark workloads into a separate cluster.

Two months ago, we announced the acquisition of XA Secure. and at that time we stated that the software would be generally available by the end of June. We are happy to announce that we have delivered as promised and the solution is available for download for everyone today. Also, if you are an HDP Enterprise Plus Subscription customer, additional support for these new functions is now provided.

HDP Advanced Security expands on the solid security features already found in HDP to provide central administration and coordinated enforcement of enterprise security policy for a Hadoop cluster.…

We recently hosted the sixth of our seven Discover HDP 2.1 webinars, entitled Apache Storm for Stream Data Processing in Hadoop. Over 200 people attended the webinar and joined in the conversation.

Thanks to our presenters Justin Sears (Hortonworks’ Product Marketing Manager), Himanshu Bari (Hortonworks’ Senior Product Manager for Storm), and Taylor Goetz (Hortonworks’ Software Engineer and Apache Storm Committer) who presented the webinar. The speakers covered:

  • Why use Apache Storm?

It has been an exciting  few weeks for the XA Secure team. We formally joined Hortonworks on 5/15 and have received a warm  welcome from our new peers. Even more exciting are the numerous discussions we have had with current customers and prospects on how we can bring together a comprehensive and holistic security capabilities to HDP.  We now begin the journey to incubate our XA Secure functionality as a completely open source project governed by the Apache Software Foundation.…

We recently hosted the fifth of our seven Discover HDP 2.1 webinars, entitled Apache Solr for Hadoop Search. Over 200 people attended the webinar, prompting an informative discourse.

The speakers outlined the Apache Solr overview and features, followed by a practical demo of how to process, index, search, and visualize server log data.

Thanks to our presenters Justin Sears (Hortonworks’ Product Marketing Manager), Rohit Bakhshi (Hortonworks’ senior product manager), and Paul Codding (Hortonworks’ Solution Engineer) who presented the webinar.…

Enterprises are using Apache Hadoop powered by YARN as a Data Operating System to run multiple workloads and use cases instead of using it just as a single purpose cluster.

A multi-purpose enterprise wide data platform often referred to as a data lake gives rise to the need for a comprehensive approach to security across the Hadoop platform and the workloads. Few weeks back Hortonworks acquired XA Secure to further execute on our vision to bring a holistic security framework to the Hadoop community irrespective of the workload.…

Apache YARN, Apache Slider, and Docker

Join us June 19 at 6 pm at the Hilton Fort Worth, Texas for an educational workshop hosted by Hortonworks and Sendero Business Services. The topic is “The Key To Success is Consistently Making Good Decisions & The Key To Good Decisions is Good Information.” The speaker is Don Hilborn, Solutions Engineer at Hortonworks.

Don will introduce the paradigm of

  • Efficiency – double processing in Hadoop on the same hardware while providing predictable performance and quality of service; and
  • Resource sharing – providing a stable common set of shared resources across multiple, coordinated workloads in Hadoop.

This is the second in the series of blogs exploring how to write data-driven applications in Java using the Cascading SDK. The series are:

  • WordCount
  • Log Parsing
  • Historically, programming languages and software frameworks have evolved in a singular direction, with a singular purpose: to achieve simplicity, hide complexity, improve developer productivity, and make coding easier. And in the process, foster innovation to the degree we have seen today—and benefited from.

    Anyone among you is “young” enough to admit writing code in microcode and assembly language?…

    With the release of Apache Hadoop YARN in October of last year, organizations are moving from single-application Hadoop clusters to a versatile, integrated Hadoop 2 data platform hosting multiple applications — eliminating silos, maximizing resources and bringing true multi-workload capabilities to Hadoop.  Many enterprises have adopted YARN as the architectural center of a set of integrated technologies and capabilities that form the blueprint for enterprise Hadoop.

    YARN Enabling the Ecosystem Technologies

    Hortonworks is making it easier to develop YARN applications through a number of technologies. …

    Introduced in 2008, Apache Hive has been the de-facto SQL solution in Hadoop. By 2012, SQL had become a key battleground for Hadoop and many vendors started to publish benchmarks showing massive performance advantages their solutions had over Hive. Each of these vendors predicted that Hive would eventually be supplanted by the proprietary solution they were pushing.

    The concerns about Hive’s performance were real. Hadoop in 2012 was a purely batch platform and no work had ever been done within Hive to address low-latency or interactive workloads.…

    A significant reason for the increased adoption of the Hortonworks Data Platform by customers and partners has been Apache Hadoop YARN. This major advance, introduced last October in HDP2, allows Hadoop to move from many single-purpose clusters to a versatile, integrated data platform that hosts multiple business applications.

    YARN has become the architectural center of Hadoop. We intend to make it easier for applications to work in a YARN environment, and benefit from the integrated capabilities and technologies that form the blueprint for enterprise Hadoop.…

    Go to page:« First...23456...10...Last »