The Hortonworks Blog

Posts categorized by : Apache Hadoop

This blog post originally appeared here and is reproduced in its entirety here. Part 1 can be found here.

The HBase BlockCache is an important structure for enabling low latency reads. As of HBase 0.96.0, there are no less than three different BlockCache implementations to choose from. But how to know when to use one over the other? There’s a little bit of guidance floating around out there, but nothing concrete.…

The Apache Tez community has voted to release 0.3 of the software.

Apache™ Tez is a replacement of MapReduce that provides a powerful framework for executing a complex topology of tasks. Tez 0.3.0 is an important release towards making the software ready for wider adoption by focussing on fundamentals and ironing out several key functions. The major action areas in this release were

  • Security. Apache Tez now works on secure Hadoop 2.x clusters using the built-in security mechanisms of the Hadoop ecosystem.
  • The Apache Software Foundation (ASF) provides valuable stewardship and guide-rails for projects interested in attracting the broadest community of involvement as possible, especially across a wide range of vendors and end users. While the ASF’s role is not about guaranteeing wild success for every project, they do a great job of providing a place where the broadest community of people, ideas, and code can come together and raise an elephant, so to speak.…

    Elasticsearch’s engine integrates with Hortonworks Data Platform 2.0 and YARN to provide real-time search and access to information in Hadoop.

    See it in action:  register for the Hortonworks and Elasticsearch webinar on March 5th 2014 at 10 am PST/1pm EST to see the demo and an outline for best practices when integrating Elasticsearch and HDP 2.0 to extract maximum insights from your data.  Click here to register for this exciting and informative webinar!…

    It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.3.0!

    hadoop-2.3.0 is the first release for the year 2014, and brings a number of enhancements to the core platform, in particular to HDFS.

    With this release, there are two significant enhancements to HDFS:

    • Support for Heterogeneous Storage Hierarchy in HDFS (HDFS-2832)
    • In-memory Cache for data resident in HDFS via Datanodes (HDFS-4949)

    With support for heterogeneous storage classes in HDFS, we now can take advantage of different storage types on the same Hadoop clusters.…

    This blog post originally appeared here and is reproduced in its entirety here.

    HBase is a distributed database built around the core concepts of an ordered write log and a log-structured merge tree. As with any database, optimized I/O is a critical concern to HBase. When possible, the priority is to not perform any I/O at all. This means that memory utilization and caching structures are of utmost importance. To this end, HBase maintains two cache structures: the “memory store” and the “block cache”.…

    With over 230 JIRA tickets resolved, the Apache HBase community released 0.98.0 yesterday which is the next major version after 0.96.x series.

    HBase 0.98.0 comes with an exciting set of new features with keeping the same stability improvements and features on top of 0.96. Additional to usual bug fixes, some of the major improvements include:

    • Reverse Scans (HBASE-4811): for use cases where both forward and reverse iteration is required, HBase now allows to perform scans in reverse mode.

    Hadoop can be a great complement to existing data warehouse platforms, such as Teradata, as it naturally helps to address two key storage challenges:

    The purpose of this article is to detail some of the key integration points and to show how data can be easily exchanged for enrichment between the two platforms.

    As a data integrator who is familiar with RDBMS systems and is new to the Hadoop platform, I was looking for a simple way (i.e.…

    Earlier this week Microsoft announced via their blog that a new version of Windows Azure HDInsight is available in public preview.

    Microsoft recognizes the importance of the technical innovation in and around YARN as well as Hortonworks leadership in this area and we have worked collaboratively to bring Hadoop 2.2 to Azure via our Hortonworks Data Platform 2.0 for Windows release.

    Apache Hadoop YARN is the data operating system for Hadoop and greatly expands the applications possible of this emerging technology by allowing multiple processing frameworks such as streaming or graph processing to plug in natively.…

    In this post, we will explore how to quickly and easily spin up our own VM with Vagrant and Apache Ambari. Vagrant is very popular with developers as it lets one mirror the production environment in a VM while staying with all the IDEs and tools in the comfort of the host OS.

    If you’re just looking to get started with Hadoop in a VM, then you can simply download the Hortonworks Sandbox.…

    This guest post from Steve Ratay, Viewpoint Architect, Teradata Corporation

    Teradata’s Unified Data Architecture is a powerful combination of the Teradata Enterprise Data Warehouse, the Aster Discovery Platform, Apache Hadoop (via the Hortonworks Data Platform) and Teradata Enterprise Management tools in a single architecture. 

    If you are Teradata user managing an Enterprise Data Warehouse or Data Discovery platform, chances are that you are using Teradata Viewpoint, a monitoring and management platform for Teradata Systems. …

    I recently sat down with Mahadev Konar and Jeff Sposetti to discuss Apache Ambari v1.4.1. Ambari 1.4.1 is a single framework to provision, manage and monitor clusters based on the Hadoop 2 stack, with YARN and NameNode HA on HDFS.

    Mahadev is one of the original architects of Apache Hadoop, a co-founder of Hortonworks, and a committer on Apache Ambari and Apache ZooKeeper. Jeff is the Hortonworks product manager focused on Apache Ambari and Apache Falcon.…

    In this post, we’ll walk through the process of deploying an Apache Hadoop 2 cluster on the EC2 cloud service offered by Amazon Web Services (AWS), using Hortonworks Data Platform.

    Both EC2 and HDP offer many knobs and buttons to cater to your specific, performance, security, cost, data size, data protection and other requirements. I will not discuss most of these options in this blog as the goal is to walk through one particular path of deployment to get started.…

    Apache Accumulo is gaining momentum in markets such as government, financial services and health care for its enhanced security and performance. Hortonworks has a long history with this technology and has multiple committers to the Accumulo project on staff – at least one of whom literally helped to write the book on Accumulo. This has enabled Hortonworks to provide enterprise support for Accumulo within the Hortonworks Data Platform for some time now.…

    On Feb 8th and 9th, Hortonworks, Microsoft and Elastacloud will be hosting a hackathon at the Microsoft Campus in Mountain View, CA. Whether you’re a newbie or ninja, developer or scientist, we’d love to see you there. Register here.

    The focus of the hackathon will be city datasets. For instance, we’ll be drawing on datasets from San Francisco that will measure things like:

    • Pedestrian safety: where accidents occur, how they occur and who has caused them.
    Go to page:« First...7891011...2030...Last »