The Hortonworks Blog

Posts categorized by: HDP 1.x

This is the second of two posts examining the use of Hive for interaction with HBase tables. This is a hands-on exploration so the first post isn’t required reading for consuming this one. Still, it might be good context.

“Nick!” you exclaim, “that first post had too many words and I don’t care about JIRA tickets. Show me how I use this thing!”

This post is exactly that: a concrete, end-to-end example of consuming HBase over Hive.…

Now that Hortonworks Data Platform 2.0 is GA, you may be looking to migrate your Hadoop stack from another version to take advantage of Hadoop 2’s YARN-based architecture. Fortunately, our Professional Services & Support teams are getting a lot of practice at migration from other distributions as more and more customers turn to 100% enterprise-hardened Apache Hadoop for their big data platform.

While any specific migration may have a few gotchas from a vendor lock-in or business integration perspective, this high-level process overview is battle-tested on large-scale production clusters, and we hope it helps you plan for your own migration.…

With HDP 1.3 and HDP 2.0 Beta, we introduced the ability to create snapshots to protect important enterprise data sets from user or application errors.

HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system and are:

  • Performant and Reliable: Snapshot creation is atomic and instantaneous, no matter the size or depth of the directory subtree
  • Scalable: Snapshots do not create extra copies of blocks on the file system.
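To make the snapshot workflow a little more concrete, here is a minimal sketch, assuming the Hadoop 2 style `hdfs` command line (`hdfs dfsadmin -allowSnapshot` and `hdfs dfs -createSnapshot`) and the read-only `.snapshot` directory that HDFS exposes under a snapshottable path. The helper names, the directory `/data/sales` and the snapshot name `before-cleanup` are hypothetical, and the script only builds and prints the commands rather than running them against a cluster.

```python
import posixpath


def snapshot_commands(directory, name):
    """Return the CLI invocations (as argument lists) for snapshotting a directory.

    Assumption: the Hadoop 2 style `hdfs` CLI is on the PATH. The first command
    must be run by an HDFS administrator; the second by a user who owns the directory.
    """
    return [
        # Mark the directory as snapshottable.
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],
        # Take a named, read-only point-in-time snapshot of the subtree.
        ["hdfs", "dfs", "-createSnapshot", directory, name],
    ]


def snapshot_path(directory, name):
    """Return the read-only path under which the snapshot contents are visible."""
    return posixpath.join(directory, ".snapshot", name)


if __name__ == "__main__":
    # Hypothetical directory and snapshot name, for illustration only.
    for cmd in snapshot_commands("/data/sales", "before-cleanup"):
        print(" ".join(cmd))
    print(snapshot_path("/data/sales", "before-cleanup"))
```

Recovering from a user error then amounts to copying files back out of the snapshot path into the live directory.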

Syncsort, a technology partner with Hortonworks, helps organizations propel Hadoop projects with a tool that makes it easy to “Collect, Process and Distribute” data with Hadoop. This process, often called ETL (Extract, Transform, Load), is one of the key drivers for Hadoop initiatives; but why is this technology a key enabler of Hadoop? To find out, we talked with Syncsort’s Director of Strategy, Steve Totman, a 15-year veteran of data integration and warehousing, who provided his perspective on Data Warehouse Staging Areas.…

The Stinger Initiative is Hortonworks’ community-facing roadmap laying out the investments Hortonworks is making to improve Hive performance 100x and evolve Hive to SQL compliance to simplify migrating SQL workloads to Hive.

We launched the Stinger Initiative along with Apache Tez to evolve Hadoop beyond its MapReduce roots into a data processing platform that satisfies the need for both interactive query AND petabyte scale processing. We believe it’s more feasible to evolve Hadoop to cover interactive needs rather than move traditional architectures into the era of big data.…

This guest post is from John Haddad, Director of Product Marketing at Informatica Corporation. He has over 25 years’ experience designing, building, integrating and marketing enterprise applications. His current focus is helping organizations get the most business value from Big Data by delivering timely, trusted, and relevant data across the extended enterprise.

Why is it so important for companies today to adopt a modern data architecture and why is next generation data integration on Apache Hadoop such a critical component?…

Historical data is now an essential tool for businesses as they struggle to meet increasingly stringent regulatory requirements, manage risk and perform predictive analytics that help improve business decisions. And while recent data may be available from an enterprise data warehouse, the traditional practice of archiving old data offsite on tape makes business analytics challenging, if not impossible, because the historical information needed is simply unavailable.

Fortunately, the modern approach to data storage and business analytics utilizes technologies like virtualization and big data Hadoop clusters to enable partitioned access to historical data.…

This guest post is from Sofia Parfenovich, Data Scientist at Altoros Systems, a big data specialist and a Hortonworks System Integrator partner. Sofia explains how she optimized a customer’s trading solution by using Hadoop (Hortonworks Data Platform) and by clustering stock data.

Automated trading solutions are widely used by investors, banks, funds, and other stock market players. These systems are based on complex mathematical algorithms and can take into account hundreds of factors.…

The Hadoop goodness just keeps on flowing as we’ve delivered new releases and new content in the past 10 days. Let’s recap.

HDP 1.3 Release. This milestone release takes advantage of improved performance in Hive 0.11 along with delivery on a series of enterprise requirements including NFS access to HDFS, improved MTTR for HBase, business continuity through HDFS and HBase snapshots, optimized connectors to Oracle and Netezza and the latest release of Ambari for management and operations.…

HDP 1.3 release delivers on community-driven innovation in Hadoop with SQL-IN-Hadoop, and continued ease of enterprise integration and business continuity features.

Almost one year ago (50 weeks to be exact) we released Hortonworks Data Platform 1.0, the first 100% open source Hadoop platform into the marketplace.  The past year has been dynamic to say the least!  However, one thing has remained constant: the steady, predictable cadence of HDP releases.  In September 2012 we released 1.1, this February gave us 1.2 and today we’re delighted to release HDP 1.3.…

We are excited to release the Hortonworks Data Platform 1.1 for Windows as a Generally Available product. In this blog post, I’m going to outline how to get started with HDP 1.1 for Windows.

With HDP for Windows, you can deploy Apache Hadoop and the HDP stack of components natively on a Windows Server cluster. The HDP for Windows download includes an MSI and remote installation scripts. With these artifacts, you can set up a multi-node Hadoop cluster in either a Workgroup or Active Directory Domain networking configuration.…

Hortonworks Data Platform 1.2 is now available for download at: http://hortonworks.com/products/hortonworksdataplatform/.

Hortonworks Data Platform (HDP) 1.2, the industry’s only complete 100-percent open source platform powered by Apache Hadoop, is available today. The enterprise-grade Hortonworks Data Platform includes the latest version of Apache Ambari for comprehensive management, monitoring and provisioning of Apache Hadoop clusters. By also introducing additional new capabilities for improving security and ease of use, HDP delivers an enterprise-class distribution of Apache Hadoop that is endorsed and adopted by some of the largest vendors in the IT ecosystem.…

Hortonworks Data Platform 1.1 Brings Expanded High Availability and Streaming Data Capture, Easier Integration with Existing Tools to Improve Enterprise Reliability and Performance of Apache Hadoop

It is exactly three months to the day that Hortonworks Data Platform version 1.0 was announced. A lot has happened since that day…

  • Our distribution has been downloaded by thousands and is delivering big value to organizations throughout the world,
  • Hadoop Summit gathered over 2200 Hadoop enthusiasts into the San Jose Convention Center,
  • And, our Hortonworks team grew by leaps and bounds!

If you haven’t yet noticed, we have made Hortonworks Data Platform v1.0 available for download from our website. Previously, Hortonworks Data Platform was only available for evaluation for members of the Technology Preview Program or via our Virtual Sandbox (hosted on Amazon Web Services). Moving forward and effective immediately, Hortonworks Data Platform is available to the general public.

Hortonworks Data Platform is a 100% open source data management platform, built on Apache Hadoop.…

I wanted to take this opportunity to share some important news. Today, Hortonworks announced version 1.0 of the Hortonworks Data Platform, a 100% open source data management platform based on Apache Hadoop. We believe strongly that Apache Hadoop, and therefore, Hortonworks Data Platform, will become the foundation for the next generation enterprise data architecture, helping companies to load, store, process, manage and ultimately benefit from the growing volume and variety of data entering into, and flowing throughout their organizations.…