The Hortonworks Blog

More from Carter Shanklin

The need to address Business Continuity and Disaster Recovery (BCDR) concerns is well known to anyone who runs production systems. This blog introduces HBase’s new backup and restore capabilities, which give HBase the ability to perform full and incremental backups across clusters and into the cloud. When combined with real-time replication, this new incremental backup […]

The most significant new feature in Apache Hive 2, to be included in the upcoming HDP 2.5 release, is a technical preview of LLAP (Live Long and Process). LLAP enables SQL analytics on Hadoop with response times as fast as sub-second by intelligently caching data in memory with persistent servers that instantly process SQL queries. Since LLAP is […]

Are you heading to HBaseCon this year on May 24? This year HBaseCon just had too much great content to fit it all into one day, and thanks to the kind sponsorship of Salesforce we’re happy to announce that PhoenixCon, the first-ever Apache Phoenix user conference, will be held on the next day, May […]

Apache Ambari 2.0 User Views introduce two functional tools to help you understand and optimize your cluster resources to get the best performance in a multitenant Hadoop environment. Tez View: Understand and Optimize Jobs in your Cluster The Tez View gives you visibility into all the jobs on your cluster, allowing you to quickly identify […]

Summary: This blog covers how recent developments have made it easy to use ORCFile from Cascading or Apache Crunch, and how doing so can accelerate data processing by more than 5x. Code samples are provided so that you can start integrating ORCFile into your Cascading or Crunch projects today. What are Cascading and Apache Crunch? Cascading […]

Introduced in 2008, Apache Hive has been the de facto SQL solution in Hadoop. By 2012, SQL had become a key battleground for Hadoop and many vendors started to publish benchmarks showing massive performance advantages their solutions had over Hive. Each of these vendors predicted that Hive would eventually be supplanted by the proprietary solution they […]

Whether you were busy finishing up last-minute Christmas shopping or just taking time off for the holidays, you might have missed that Hortonworks released the Stinger Phase 3 Technical Preview back in December. The Stinger Initiative is Hortonworks’ open roadmap to making Hive 100x faster while adding standard SQL. Here we’ll discuss 3 great […]

Update! – The final phase of improvements from the Stinger Initiative was released as part of Hive 0.13 on Apr 21, 2014 – Read the announcement. While just a preview by moniker, the release marks a significant milestone in the transformation of Hadoop from a batch-oriented system to a data platform capable of interactive data […]

Security is one of the biggest topics in Hadoop right now. Historically Hadoop has been a back-end system accessed only by a few specialists, but the clear trend is for companies to put data from Hadoop clusters in the hands of analysts, marketers, product managers or call center employees whose numbers could be in the […]

In this post we’ll cover some new scheduling options available via Apache Oozie in HDP 2. You can try out these capabilities in HDP 2 Beta and HDP 2 Beta Sandbox. What Is Oozie Again? Apache Oozie is a workflow engine and scheduler for Hadoop. Oozie allows you to run jobs in Hadoop at pre-defined […]

The upcoming Hive 0.12 is set to bring some great new advancements in the storage layer in the form of higher compression and better query performance. Higher Compression ORCFile was introduced in Hive 0.11 and offered excellent compression, delivered through a number of techniques including run-length encoding, dictionary encoding for strings and bitmap encoding. This […]

The Stinger Initiative is Hortonworks’ community-facing roadmap laying out the investments Hortonworks is making to improve Hive performance 100x and evolve Hive to SQL compliance to simplify migrating SQL workloads to Hive. We launched the Stinger Initiative along with Apache Tez to evolve Hadoop beyond its MapReduce roots into a data processing platform that satisfies […]

The Hortonworks Sandbox is a great tool not only for learning Hadoop, but also for experimentation and application development. Deployment in a type 2 hypervisor such as Oracle VirtualBox or VMware Workstation is straightforward and serves the need for a single user. Sandbox can also be deployed to IaaS environments, and in this case, we […]

One of the big opportunities that Hadoop provides is the processing power to unlock value in big datasets of varying types, from the ‘old’, such as web clickstream and server logs, to the new, such as sensor data and geolocation data. The explosion of smartphones in the consumer space (and smart devices of all […]