The Hortonworks Blog

It’s been a busy year for Apache Ambari. Keeping up with the rapid innovation in the open community certainly is exciting. We’ve already seen six releases this year to maintain a steady drumbeat of new features and usability guardrails. We have also seen some exciting announcements of new folks jumping into the Ambari community.

With all these releases and community activities, let’s take a break to talk about how the broader Hadoop community is affecting Ambari and how this is influencing what you will see from Ambari in the future.…

Apache Hadoop has come along a long way. From its early days as a platform to index the web, it has evolved to its current interactive, real-time, and batch processing capabilities spanning gigabytes to petabytes of content. A key stepping stone in this evolution has been Apache Hadoop YARN. YARN has enabled enterprises to onboard “fit for purpose” processing engines to its Hadoop Data Lake. This has opened the Data Lake to rapid and unbridled innovation by the ISV community and delivered differentiated insight to the enterprise.…

SequenceIQ provides an API and platform to build predictive applications and turn data into tangible assets. In this guest blog, SequenceIQ Co-founder and CTO Janos Matyas (@sequenceiq), explains why his team chose Apache Ambari for provisioning Hadoop clusters and how they contributed to the Ambari project.

At SequenceIQ, we frequently provision Hadoop clusters on different environments. For a long time, we searched for the right provisioning and management tool.…

StackIQ, a Hortonworks technology partner, offers a comprehensive software suite that automates the deployment, provisioning, and management of Big Infrastructure. In this guest blog, Anoop Rajendra (@anoop_r), a Senior Software Developer at StackIQ, gives instructions for using StackIQ Cluster Manager to deploy Apache Ambari on a cluster running Hortonworks Data Platform (HDP).

Provisioning, managing and monitoring an Apache™ Hadoop cluster can be challenging. With this in mind, the engineers at Hortonworks introduced the Apache Ambari project into the Apache Software Foundation.…

Apache Hadoop clusters grow and change with use. Maybe you used Apache Ambari to build your initial cluster with a base set of Hadoop services targeting known use cases and now you want to add other services for new use cases. Or you may just need to expand the storage and processing capacity of the cluster.

Ambari can help in both scenarios. In this blog, we’ll cover a few different ways that Ambari can help you expand your cluster.…

Earlier this month, the Apache Ambari community released Apache Ambari 1.6.1, which includes multiple improvements for performance and usability. The momentum in and around the Ambari community is unstoppable. Today we saw the Pivotal team lean in to Ambari, and this is the sixth release of this critical component in 2014, proving again that open source is the fastest path to innovation.

Many thanks to the wealth of contribution from the broad Ambari community that resulted in 585 JIRA issues being resolved in this release.…

There are many projects that have been contributed to the Apache Software Foundation (ASF) by both vendors and users alike that greatly expand Apache Hadoop’s capabilities as an enterprise data platform.

While Hadoop – with YARN at its architectural center – provides the foundational capabilities for managing and accessing data at scale, a broader blueprint for Enterprise Hadoop has emerged that specifies how this array of Apache projects fit across five distinct pillars to form a complete enterprise data platform: data access, data management, security, operations and governance.…

Today we are excited to announce a deepening of our strategic partnership with HP . This news builds on the reseller partnership that we established in 2013 enabling HP to resell the Hortonworks Data Platform. It also allows us to build on the HP AllianceOne ConvergedSystems Partner of the Year Award that we received at the recent HP Discover 2014 conference for our strategic partnership.

Given the rapid adoption of Enterprise Hadoop as a core component of a modern data architecture combined with the fact that HP is the world’s leading server vendor in terms of shipments AND revenues according to IDC – meaning a significant number of those Hadoop nodes are being deployed with HP technologies – it’s hardly surprising that we’ve been collaborating closely.…

Although the Hadoop Summit San Jose 2014 has come and gone, the invaluable content—keynotes, sessions, and tracks—is available here. We ’ve selected a few sessions for Hadoop developers, practitioners, and architects, curating them under Apache Hadoop YARN, the architectural center and the data operating system.

In most of the keynotes and tracks three themes resonated:

  • Enterprises are transitioning from traditional Hadoop to modern Hadoop 2.
  • YARN is an enabler, the central orchestrator that facilitates multiple workloads, runs multiple data engines, and supports multiple access patterns—batch, interactive, streaming, and real-time—in Apache Hadoop 2.
  • Last week, Apache Tez graduated to become a top level project within the Apache Software Foundation (ASF). This represents a major step forward for the project and is representative of its momentum that has been built by a broad community of developers from not only Hortonworks but Cloudera, Facebook, LinkedIn, Microsoft, NASA JPL, Twitter, and Yahoo as well.

    What is Apache Tez and why is it useful?

    Apache™ Tez is an extensible framework for building YARN based, high performance batch and interactive data processing applications in Hadoop that need to handle TB to PB scale datasets.…

    The Apache Pig community released Pig 0.13. earlier this month. Pig uses a simple scripting language to perform complex transformations on data stored in Apache Hadoop. The Pig community has been working diligently to prepare Pig to take advantage of the DAG processing capabilities in Apache Tez. We also improved usability and performance.

    This blog post summarizes the progress we’ve made.

    Support for Backends Other Than MapReduce

    We made the Pig 0.13 architecture more general to support multiple backends beyond just MapReduce, while maintaining backward compatibility.…

    As part of our YARN Ready program, we are hosting a series of technical webinars highlighting the technologies and resources available to developers for creating YARN applications. In our first webinar, “Introduction to YARN Ready,” we presented an overview of the YARN Ready program.

    To extend your technical knowledge, please join us for our first in-depth YARN Ready technology webinar, “Integrating Applications Natively to YARN” on Thursday July 24 at 9am Pacific Time.…

    Incremental Updates

    Hadoop and Hive are quickly evolving to outgrow previous limitations for integration and data access. On the near-term development roadmap, we expect to see Hive supporting full CRUD operations (Insert, Select, Update, Delete). As we wait for these advancements, there is still a need to work with the current options—OVERWRITE or APPEND— for Hive table integration.

    The OVERWRITE option requires moving the complete record set from source to Hadoop.…

    Hadoop is a business-critical data platform at many of the world’s largest enterprises. These corporations require a layered security model focusing on four aspects of security: authentication, authorization, auditing, and data protection. Hortonworks continues to innovate in each of these areas, along with other members of the Apache open source community. In this blog, we will look at the authentication layer and how we can enforce strong authentication in HDP via Kerberos.…

    Tresata, a Hortonworks Certified Technology Partner, is a next-generation predictive analytics software company that helps enterprises monetize big data™they have moved to Hadoop . In this blog, Tresata’s Director of Marketing, Katie Levans, (@katie_levans) describes the value of HDP 2.1 certification and the benefit of their solution. 

    Last month Tresata announced the release of the third generation of their hugely successful software application TREE 3.3 and its subsequent certification on HDP 2.1.…

    Go to page:« First...45678...203040...Last »