The Hortonworks Blog

Hadoop 2 and its YARN-based architecture have increased interest in new engines that run on Hadoop, and one such workload is in-memory computing for machine learning and data science use cases. Apache Spark has emerged as an attractive option for this type of processing, and today we announce availability of our HDP 2.1 Tech Preview Component of Apache Spark. This is a key addition to the platform and brings another workload supported by YARN on HDP.…

The term BoF session was first used at the Digital Equipment Computer Users’ Society (DECUS) conference in the 1960s. Its essence was to bring together like minds and thought leaders, just as birds of a feather flock together, to share and exchange computing ideas in an informal yet spirited way. Since then, the organizers and sponsors of most computing conferences have been loyal to its essence and spirit.

For ideas and innovation happen in collaboration—not in isolation. …

This is the second in our series on the motivations and architecture for improvements to Apache Hadoop YARN’s Resource Manager Restart resiliency. Others in the series are:

Introduction: Phase I – Preserve Application-queues

In the introductory blog, we previewed what RM Restart Phase I entails. In essence, we preserve the application-queue state into a persistent store and reread it upon RM restart, eliminating the need for users to resubmit their applications.…

This is the first post in our series on the motivations and architecture for improvements to Apache Hadoop YARN’s Resource Manager Restart resiliency. Others in the series are:

The Resource Manager (RM) is the central authority of Apache Hadoop YARN for resource management and scheduling. It is responsible for allocating resources to applications such as Hadoop MapReduce jobs, Apache Tez DAGs, and other applications running atop YARN.…

Last week’s release of HDP 2.1 was packed with countless new features for enterprise Hadoop. These range from new processing capabilities with Tez and Hive on YARN, Solr and Storm, to operations with Ambari, governance with Falcon and security with Knox.

To guide you through these capabilities, Hortonworks is hosting a new series of webinars beginning on May 8 and running to June 26.

You can join any or all of the webinars listed below, and we’ve provided a simple way of signing up for all 7.…

Hortonworks Data Platform 2.1 for Windows is the 100% open source data management platform based on Apache Hadoop and available for the Microsoft Windows Server platform. I have built a helper tool that automates the process of deploying a multi-node Hadoop cluster – utilizing the MSI available in HDP 2.1 for Windows.

Download HDP 2.1 for Windows

HDP on Windows MSI Overview

The HDP on Windows installation package comes in MSI format. Microsoft’s MSI format utilizes the installation and configuration service provided with Windows called Windows Installer.…

The Apache Knox Gateway team is pleased to announce Knox’s first release as an Apache top-level project: Apache Knox Gateway 0.4.0. The team resolved approximately 100 JIRAs for this release and Knox Gateway is now better positioned to provide complete security for REST API access to a Hadoop cluster.

The new features in Knox Gateway 0.4.0 are the features that enterprise security officers expect in a gateway solution:

  • Perimeter security for a Hadoop cluster
  • Support for enterprise group lookup
  • Audit log of all gateway activity
  • Command line tooling for CMF provisioning
  • Protection against web application vulnerabilities
  • Pre-authentication via SSO token
  • And many more…

As a top-level project, Apache Knox Gateway is fully endorsed by the Apache Software Foundation, and this improves coordination between development of Knox and the other core Hadoop projects with which it interacts.…

Yesterday the Apache Ambari community proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work of over 30 individuals over 5 months and, combined with the Ambari 1.5.0 release, resolves more than 1,000 JIRAs.

This version of Ambari makes huge strides in simplifying the deployment, management and monitoring of large Hadoop clusters, including those running Hortonworks Data Platform 2.1.…

Three weeks ago, we announced availability of the technical preview of Hortonworks Data Platform (HDP) version 2.1, and since then we have had thousands of downloads of this preview. We also promised delivery of GA bits on April 22nd, and we are delighted to deliver as stated. HDP 2.1, which includes countless new features across seven new components, is available today from our download page.

YARN unlocks the Data Lake

YARN, the resource management layer of Hadoop 2, is delivering value as it has unlocked the data lake vision for many.…

The Apache Hive community has voted on and released version 0.13 today. This is a significant release that represents a major effort from over 70 members who worked diligently to close out over 1080 JIRA tickets.

Hive 0.13 also delivers the third and final phase of the Stinger Initiative, a broad community-based initiative to drive the future of Apache Hive, delivering 100x performance improvements at petabyte scale with familiar SQL semantics.…

The power of a well-crafted speech is indisputable, for words matter: they inspire action. And so is the power of a well-designed Software Development Kit (SDK), for high-level abstractions and logical constructs in a programming language matter: they simplify writing code.

In 2007, when Chris Wensel, the author of the Cascading Java API, was evaluating Hadoop, he had a couple of prescient insights. First, he observed that finding Java developers to write Enterprise Big Data applications in MapReduce would be difficult, and that convincing developers to write directly to the MapReduce API was a potential blocker.…

As enterprises build new applications with the data they cost-effectively capture and process with Apache Hadoop, it is important for the platform to facilitate the app dev process. That’s why we are excited to announce that we’ve expanded our partnership with Concurrent, Inc. to simplify and accelerate application development on Hadoop.

There are two components to this expanded partnership.

The Internet of Things (IoT) is in its infancy. You can buy wireless bathroom scales to upload data to monitoring tools helping you manage your weight. You can buy a connected refrigerator that keeps track of the inventory to remind you what you need to buy. It’s fascinating to think about the future of possibilities. In a recent podcast on the SAP Future of Business with Game-Changers Radio, panelist Matt Healey (Analyst at Technology Business Research) commented that he wasn’t ready for the day when his scale and refrigerator talked.…

LOOK Innovative is a new consulting partner of Hortonworks specializing in business applications of Hadoop for the retail vertical market.

LOOK Innovative concentrates on delivering the complete Omni-Channel digital experience to retailers, which is the evolution of multi-channel retailing. Omni-Channel is a seamless approach for the consumer through all available shopping channels, including mobile internet devices, computers, bricks-and-mortar, television, radio, direct mail, catalog and so on. It means that consumers make buying decisions based on information from many sources and may purchase through any of those sources – they might research online but buy at the local store and may research at the store but buy online.…

The third HBaseCon, THE community event for Apache HBase, takes place on May 5th this year in San Francisco. As in previous years, this year’s agenda is quite exciting.

There will be 4 tracks: Operations, Features and Internals, Ecosystem, and Case Studies. The keynotes will include speakers from Cloudera, which is the event host; the Google BigTable team, as a follow-up to their ‘06 BigTable paper; Salesforce, on their experience with HBase operations and use cases; and Facebook, on their strongly consistent multi-data-center replication scheme.…
