The Hortonworks Blog

Posts categorized by : Apache Hadoop

With the release of Apache Hadoop YARN in October of last year, more and more solution providers are moving from single-application Hadoop clusters to a versatile, integrated Hadoop 2 data platform. This allows them to host multiple applications — eliminating silos, maximizing resources and bringing true multi-workload capabilities to Hadoop. 

That is why we’re  extremely excited to have Paul Kent, Vice President of Big Data at SAS, share his insights on the value of Apache Hadoop YARN and the benefits it brings to SAS and its users. …

Few industries depend as heavily on data as financial services. Insurance companies, retail and investment banks aggregate, price and distribute capital with the aim of increasing their return on assets with an acceptable level of risk.

To do that, financial decision-makers need data. Apache Hadoop helps them store new data sources, then process the larger combined dataset for batch, interactive and real-time analysis. More data and better analysis improves bottom-line results.…

This week we continue our YARN webinar series with detailed introduction and a developer overview of Apache Tez.  Designed to express fit-to-purpose data processing logic, Tez enables batch and interactive data processing applications spanning TB to PB scale datasets.  Tez offers a customizable execution architecture that allows developers to express complex computations as dataflow graphs and allows for dynamic performance optimizations based on real information about the data and the resources required to process it.…

We are in the midst of a data revolution. Hadoop, powered by Apache Hadoop YARN, enables enterprises to store, process, and innovate around data at a scale never seen before making security a critical consideration. Enterprises are looking for a comprehensive approach to security for their data to realize the full potential of the Hadoop platform unleashed by YARN, the architectural center and the data operating system of Hadoop 2.

Hortonworks and the open community continue to work tirelessly to enhance security in Hadoop.…

The world’s top telecommunications firms adopt Hadoop to gain competitive advantage and to respond to technology-driven changes like increases in both network traffic and the telemetry data captured by network sensors.

The majority of North America’s and Europe’s telcos have chosen Hortonworks Data Platform (HDP) to meet these challenges. Read the new Hortonworks white paper for a detailed discussion of twenty-one common telco and cable company use cases.

Download the White Paper

With their Modern Data Architectures based on HDP, these firms improve efficiency and capture opportunities in some of these ways:

  • Analyze call detail records (CDRs).

ScaleOut joined the Hortonworks Technology Partner Program and has recently achieved Hortonworks Certified status for ScaleOut hServer. ScaleOut Software is a pioneer in in-memory data grid software and the ScaleOut hServer can be installed directly on Hadoop nodes and runs in-memory. In this guest blog, William Bain, Founder and CEO, talks about certification and a use case.

Recently, ScaleOut Software announced technical certification of its ScaleOut hServer® product on Hortonworks Data Platform 2.1.…

This is a guest blog from Protegrity, a Hortonworks certified partner.

As Hadoop transitions to take on a more mission critical role within the data center, so the top IT imperatives of process innovation, operational efficiency, and data security naturally follow. One such imperative in particular now tops the requirement list for Hadoop consideration within the enterprise: a well-developed framework to secure data.

The open source community has responded. Work is underway to build out a comprehensive and coordinated security framework for Hadoop that can work well with existing IT security investments.…

Introduction

HDP 2.1 ships with Apache Knox 0.4.0. This release of Apache Knox supports WebHDFS, WebHCAT, Oozie, Hive, and HBase REST APIs.

Hive is a popular component used for SQL access to Hadoop, and the Hive Server 2 with Thrift supports JDBC access over HTTP. The following steps show the configuration to enable a JDBC client to talk to Hive Server 2 via Knox (Beeline > JDBC over HTTPS > Knox > HTTP > Hive Server2).…

In May, Hortonworks acquired XA Secure and made a promise to contribute this technology to the Apache Software Foundation.  In June, we made it available for all to download and use from our website and today we are proud to announce this technology officially lives on as Apache Argus, an incubator project within the ASF.

This podling has been formed and now the process of graduating Argus to a top-level project (TLP) has begun.…

Hortonworks Software Engineers Vinod Kumar Vavilapalli (Apache Hadoop YARN committer) and Jian He (Apache YARN Hadoop committer) discuss Apache Hadoop YARN’s Resource Manager resiliency upon restart in this blog.This is their third blog post in our series on motivations and architecture for improvements to the Apache Hadoop YARN’s Resource Manager (RM) resiliency. Others in the series are:

Introduction Phase II – Preserving work-in-progress of running applications

ResourceManager-restart is a critical feature that allows YARN applications to be able to continue functioning even when the ResourceManager (RM) crash-reboots due to various reasons.…

HP and Hortonworks recently announced a strategic partnership that included a $50 million equity investment by HP. While the investment is important, there is an equally important joint commitment to help accelerate the adoption of Enterprise Apache Hadoop by deeply integrating the Hortonworks Data Platform (HDP) with the HP HAVEn big data platform.

Below are some thoughts on our joint work from the HP OMi Team…

The first area of joint engineering strategy between our companies will be to integrate Apache Ambari with HP Operations Manager i (OMi) which provides tools and APIs to provision, manage and monitor Hadoop clusters.  …

“Data is to information society what fuel was to the industrial economy: the critical resource powering the innovations that people rely on,” write Victor Mayer-Schönberger and Kenneth Cukier, in Big Data. Today, big data fuels and engenders innovation of new products and services, according to Forrester.

Just as countries’ fuel repositories need protection and security because they can come under attack, so do companies’ big data repositories. “Companies, markets, and countries are increasingly under attack from cyber-criminals.…

It’s been a busy year for Apache Ambari. Keeping up with the rapid innovation in the open community certainly is exciting. We’ve already seen six releases this year to maintain a steady drumbeat of new features and usability guardrails. We have also seen some exciting announcements of new folks jumping into the Ambari community.

With all these releases and community activities, let’s take a break to talk about how the broader Hadoop community is affecting Ambari and how this is influencing what you will see from Ambari in the future.…

Apache Hadoop has come along a long way. From its early days as a platform to index the web, it has evolved to its current interactive, real-time, and batch processing capabilities spanning gigabytes to petabytes of content. A key stepping stone in this evolution has been Apache Hadoop YARN. YARN has enabled enterprises to onboard “fit for purpose” processing engines to its Hadoop Data Lake. This has opened the Data Lake to rapid and unbridled innovation by the ISV community and delivered differentiated insight to the enterprise.…

SequenceIQ provides an API and platform to build predictive applications and turn data into tangible assets. In this guest blog, SequenceIQ Co-founder and CTO Janos Matyas (@sequenceiq), explains why his team chose Apache Ambari for provisioning Hadoop clusters and how they contributed to the Ambari project.

At SequenceIQ, we frequently provision Hadoop clusters on different environments. For a long time, we searched for the right provisioning and management tool.…

Go to page:« First...23456...102030...Last »