The Hortonworks Blog

Posts categorized by: Data Management

Haohui Mai is a member of technical staff at Hortonworks in the HDFS group and a core Hadoop committer. In this blog, he explains how to set up HTTPS for HDFS in a Hadoop cluster.

1. Introduction

HTTP is one of the most widely used protocols on the Internet. Today, Hadoop clusters exchange internal data such as file system images, quorum journals, and user data over HTTP.…

This summer, Hortonworks presented the Discover HDP 2.1 Webinar series. Our developers and product managers highlighted the latest innovations in Apache Hadoop and related Apache projects.

We’re grateful to the more than 1,000 attendees whose questions added rich interaction to the pre-planned presentations and demos.

For those of you who missed one of the 30-minute webinars (or who want to review one you joined live), you can find recordings of all sessions on our What’s New in 2.1 page.…

Zettaset is a Hortonworks partner. In this guest blog, John Armstrong, VP of Marketing at Zettaset Inc., shares Zettaset’s security features and explains why data encryption is vital for data in the Hadoop infrastructure.

Comprehensive Security Across the Hadoop Infrastructure

As big data technologies like Hadoop become widely deployed in production environments, the expectation is that they will meet the enterprise requirements in data governance, operations and security while integrating with existing data center infrastructure. …

The key to monetization of Big Data is not only the ability to capture and process information quickly but also to analyze the data to derive meaningful insights. Big Data can be complex, and often the expertise of a programmer is needed to create focused and targeted queries.

0xdata, a provider of open source machine learning and predictive analytics for Big Data, helps to facilitate the manipulation and extraction of data with the use of its H2O prediction engine for statisticians. …

The Journey

Almost two years ago to the day, the Apache Hadoop community voted to make YARN a sub-project of Apache Hadoop, followed by the GA release last fall, nearly a year ago.

Since then, it has become plainly obvious that Apache Hadoop 2.x, powered by YARN as its architectural center, is the best platform for workloads such as Apache Hadoop MapReduce, Apache Pig, and Apache Hive, which were designed to process data on Apache Hadoop HDFS.…

ScaleOut has joined the Hortonworks Technology Partner Program and recently achieved Hortonworks Certified status for ScaleOut hServer. ScaleOut Software is a pioneer in in-memory data grid software; ScaleOut hServer installs directly on Hadoop nodes and runs in memory. In this guest blog, William Bain, Founder and CEO, talks about certification and a use case.

Recently, ScaleOut Software announced technical certification of its ScaleOut hServer® product on Hortonworks Data Platform 2.1.…

This is a guest blog from Protegrity, a Hortonworks certified partner.

As Hadoop takes on a more mission-critical role within the data center, the top IT imperatives of process innovation, operational efficiency, and data security naturally follow. One such imperative in particular now tops the requirement list for Hadoop consideration within the enterprise: a well-developed framework to secure data.

The open source community has responded. Work is underway to build out a comprehensive and coordinated security framework for Hadoop that can work well with existing IT security investments.…

This is a guest post from Hortonworks partner Dataguise. Dataguise is an HDP 2.1 certified technology partner providing sensitive data discovery, protection, and reporting in Hadoop.

According to a 2013 Global Data Breach study by the Ponemon Institute, the average cost of data loss exceeds $5.4 million per breach, and the average per-person cost of lost data approaches $200 per record in the United States. No industry is spared from this threat, and all of our data systems, including Hadoop, need to address the security concern.…

“Data is to information society what fuel was to the industrial economy: the critical resource powering the innovations that people rely on,” write Viktor Mayer-Schönberger and Kenneth Cukier in Big Data. Today, big data fuels and engenders innovation of new products and services, according to Forrester.

Just as countries’ fuel repositories need protection and security because they can come under attack, so do companies’ big data repositories. “Companies, markets, and countries are increasingly under attack from cyber-criminals.…

Apache Hadoop has come a long way. From its early days as a platform to index the web, it has evolved to its current interactive, real-time, and batch processing capabilities spanning gigabytes to petabytes of content. A key stepping stone in this evolution has been Apache Hadoop YARN. YARN has enabled enterprises to onboard “fit for purpose” processing engines to their Hadoop Data Lake. This has opened the Data Lake to rapid and unbridled innovation by the ISV community and delivered differentiated insight to the enterprise.…

Although Hadoop Summit San Jose 2014 has come and gone, the invaluable content (keynotes, sessions, and tracks) is available here. We’ve selected a few sessions for Hadoop developers, practitioners, and architects, curating them under Apache Hadoop YARN, the architectural center and the data operating system.

In most of the keynotes and tracks three themes resonated:

  • Enterprises are transitioning from traditional Hadoop to modern Hadoop 2.
  • YARN is an enabler, the central orchestrator that facilitates multiple workloads, runs multiple data engines, and supports multiple access patterns (batch, interactive, streaming, and real-time) in Apache Hadoop 2.

Tresata, a Hortonworks Certified Technology Partner, is a next-generation predictive analytics software company that helps enterprises monetize big data they have moved to Hadoop. In this blog, Tresata’s Director of Marketing, Katie Levans (@katie_levans), describes the value of HDP 2.1 certification and the benefit of their solution.

Last month Tresata announced the release of the third generation of their hugely successful software application, TREE 3.3, and its subsequent certification on HDP 2.1.…

Apache Cassandra is an open source NoSQL distributed database management system designed to handle large amounts of data. It offers a scalable, real-time solution that allows users to create online applications that are “always-on, no matter what.” DataStax is the company behind Cassandra and a new Technology Partner of Hortonworks.

Lynn Walitch leads Partner Management for DataStax and is our guest blogger today. Lynn discusses the importance of the partnership and certification with Hortonworks.…

The Apache Storm community recently announced the release of Apache Storm 0.9.2, which includes improvements to Storm’s user interface and an overhaul of its Netty-based transport.

We thank all who have contributed to Storm, whether through direct code contributions, documentation, bug reports, or helping other users on the mailing lists. Together, we resolved 112 JIRA issues.

Here are summaries of this version’s important fixes and improvements.

New Feature Highlights

Netty Transport Overhaul

Storm’s Netty-based transport has been overhauled to significantly improve performance through better utilization of thread, CPU, and network resources, particularly in cases where message sizes are small.…

IBM InfoSphere Guardium has been certified on HDP 2.1. The Hortonworks Certified Technology Program simplifies big data planning by providing pre-built and validated integrations between leading enterprise technologies and HDP.

Kathryn Zeidenstein, InfoSphere Guardium Evangelist, is our guest blogger and describes security, Hadoop, and the Guardium solution.

Those of us in the data security and privacy space tend to worry a lot. With each new breaking story on the latest data breach, and with the subsequent fallout, people higher and higher up the food chain are also worrying a lot.…
