The Hortonworks Blog

Posts categorized by : Apache Hadoop

The Apache Ambari community is happy to announce last week’s release of Apache Ambari 1.6.0, which includes exciting new capabilities and resolves 288 JIRA issues.  

Many thanks to all of the contributors in the Apache Ambari community for the collaboration to deliver 1.6.0, especially with Blueprints, a crucial feature that enables rapid instantiation and replication of clusters.

Each release of Ambari makes substantial strides in providing functionality to simplify the lives of system administrators and dev-ops engineers to deploy, manage, and monitor large Hadoop clusters, including those running on Hortonworks Data Platform 2.1 (HDP).…

On Wednesday May 21, Himanshu Bari (Hortonworks’ senior product manager), Venkatesh Seetharam (committer to Apache Falcon), and Justin Sears ( Hortonworks’ Product Marketing Manager), hosted the third of our seven Discover HDP 2.1 webinars. Himanshu and Venkatesh discussed data governance in Hadoop through Apache Falcon that is included in HDP 2.1. As most of you know, ingesting data into Hadoop is one thing; having data governed, by dictating and defining data-pipeline policies, is another thing—a necessity in the enterprise.…

According to New York Observer, there were couple of major social reasons that spurred the genesis and growth of Meetup.com. First, it was Robert Putman’s book Bowling Alone, in which he talks about the collapse of communities in America. And the second was an event that not only changed the world but changed New York: it was the aftermath of September 11, where strangers cared about greeting, meeting, and talking.…

On May 15, Owen O’Malley and Carter Shanklin hosted the second of our seven Discover HDP 2.1 webinars. Owen and Carter discussed the Stinger Initiative and the improvements to Apache Hive that are included in HDP 2.1:

  • Faster queries with Hive on Tez, vectorized query execution and a cost-based optimizer
  • New SQL semantics and datatypes
  • SQL-standard authorization
  • The Hive job visualizer in Apache Ambari
  • And many more

Here is the complete recording of the webinar.…

There is no doubt that Hadoop has proven value for many companies via more efficient use of resources or through new business value derived from new sets of data. However, the limited availability of trained personnel that have the necessary skills to develop and integrate with Hadoop has proven difficult for many organizations to overcome.

Please join Talend and Hortonworks on this webinar where we present an end-to-end use case across data load, processing and delivery of results for analysis of machine/sensor data without writing a line of code.…

Last week Vinay Shukla and Kevin Minder hosted the first of our seven Discover HDP 2.1 webinars. Vinay and Kevin covered three important topics related to new Apache Hadoop security features in HDP 2.1:

  • REST API security with Apache Knox Gateway
  • HDFS security with Access Control Lists (ACLs)
  • SQL security and next-generation Hive authorization

Here is the complete recording of the webinar.

Here are the presentation slides: http://www.slideshare.net/hortonworks/discoverhdp21security

Attend our next Discover HDP 2.1 webinar tomorrow, Thursday, May 15 at 10am Pacific Time: Interactive SQL Query in Hadoop with Apache Hive

We’re grateful to the many participants who joined and asked excellent questions.…

Rainstor is a Hortonworks Certified Technology Partner and provides an efficient database that reduces the cost, complexity and compliance risk of managing enterprise data. RainStor’s patented technology enables customers to cut infrastructure costs and scales anywhere; on-premise or in the cloud and natively on Hadoop. RainStor’s customers are 20 of the world’s largest communications providers and 10 of the biggest banks and financial services organizations. 

Rainstor’s Mark Cusack, Chief Architect, writes about the benefits of certification on HDP 2.1.…

Syncsort is a Hortonworks Certified Technology Partner and has over 40 years of experience helping organizations integrate big data…smarter. Keith Kohl, Director of Product Management, Syncsort, is our guest blogger. Below he talks about the importance of certification and how it benefits Syncsort’s customers and prospects interested in Hadoop.

Back in January, Syncsort announced our partnership with Hortonworks and the certification of DMX-h on HDP 2.0. I was also given the opportunity to write a guest BLOG on the Hortonworks site about HDP 2 and the GA of YARN (thanks Hortonworks!).…

I’m a pretty heavy Unix user and I tend to prefer doing things the Unix Way™, which is to say, composing many small command line oriented utilities. With composability comes power and with specialization comes simplicity. Although, sometimes if two utilities are used all the time, sometimes it makes sense for either:

  • A utility that specializes in a very common use-case
  • One utility to provide basic functionality from another utility

For example, one thing that I find myself doing a lot of is searching a directory recursively for files that contain an expression:

Despite the fact that you can do this, specialized utilities, such as ack have come up to simplify this style of querying.…

Hadoop 2 and its YARN-based architecture has increased the interest in new engines to be run on Hadoop and one such workload is in-memory computing for machine learning and data science use cases. Apache Spark has emerged as an attractive option for this type of processing and today, we announce availability of our HDP 2.1 Tech Preview Component of Apache Spark.  This is a key addition to the platform and brings another workload supported by YARN on HDP.…

The first use of the term BoF session was used at the Digital Equipment Users’ Society (DECUS) conference in the 1960s. Its essence was to bring together like minds and thought leaders—just as birds of the feather flock together— to share and exchange computing ideas, in an informal yet spirited way. Since then, the organizers and sponsors of most computing conferences have been loyal to its essence and spirit.

For ideas and innovation happen in collaboration—not in isolation. …

This is the second in our series on the motivations and architecture for improvements to the Apache Hadoop YARN’s Resource Manager Restart resiliency. Other in the series are:

Introduction: Phase I – Preserve Application-queues

In the introductory blog, we previewed what RM Restart Phase I entails. In essence, we preserve the application-queue state into a persistent store and reread it upon RM restart, eliminating the need for users to resubmit their applications.…

This is the first post in our series on the motivations and architecture for improvements to the Apache Hadoop YARN’s Resource Manager Restart resiliency. Other in the series are:

Resource Manager (RM) is the central authority of Apache Hadoop YARN for resource management and scheduling. It is responsible for allocation of resources to applications like Hadoop MapReduce jobs, Apache TEZ DAGs, and other applications running atop YARN.…

The Apache Knox Gateway team is pleased to announce Knox’s first release as an Apache top-level project: Apache Knox Gateway 0.4.0. The team resolved approximately 100 JIRAs for this release and Knox Gateway is now better positioned to provide complete security for REST API access to a Hadoop cluster.

The new features in Knox Gateway 0.4.0 are the features that enterprise security officers expect in a gateway solution:

  • Perimeter security for a Hadoop cluster
  • Support for enterprise group lookup
  • Audit log of all gateway activity
  • Command line tooling for CMF provisioning
  • Protection for web application vulnerabilities
  • Pre-authentication via SSO token
  • And many more…

As a top-level project, Apache Knox Gateway is fully endorsed by the Apache Software Foundation, and this improves coordination between development of Knox and the other core Hadoop projects with which it interacts.…

Yesterday the Apache Ambari community proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work of over 30 individuals over 5 months and, combined with the Ambari 1.5.0 release, resolves more than 1,000 JIRAs.

This version of Ambari makes huge strides in simplifying the deployment, management and monitoring of large Hadoop clusters, including those running Hortonworks Data Platform 2.1.…

Go to page:« First...23456...1020...Last »