The Hortonworks Blog

Posts categorized by: Apache Hadoop

By now, you’re probably well aware of what Hadoop does: low-cost processing of huge amounts of data. But more importantly, what can Hadoop do for you?

We work with customers across many industries, each with its own specific data challenges. Talking to so many of them lets us see patterns emerge around certain types of data and the value that data could bring to a business.

We love to share these kinds of insights, so we built a series of video tutorials covering some of those scenarios:

Some more detailed discussion of these types of data is in our ‘Business Value of Hadoop’ whitepaper.…

We are excited to announce today that Hortonworks is bringing Windows-based Hadoop Operational Management functionality via Management Packs for System Center. These management packs will enable users to deploy, manage and monitor Hortonworks Data Platform (HDP) for both Windows and Linux deployments. The new management packs for System Center will provide management and monitoring of Hadoop from a single System Center Operations Manager console, enabling customers to streamline operations and ensure quality of service levels.…

Four years ago, Arun Murthy entered a JIRA ticket (MAPREDUCE-279) that outlined a re-architecture of the original MapReduce. In the ticket, he described a set of capabilities that allowed processes to better share resources and an architecture that would allow Hadoop to extend beyond batch data processing.

It turned out that this ticket anticipated true enterprise requirements for Hadoop. As enterprise adoption accelerated, it became even clearer that supporting multiple processing models – moving beyond batch – was critical for Hadoop to broaden its applicability for mainstream usage in the modern enterprise architecture.…

This post is from Steve Loughran, Devaraj Das & Eric Baldeschwieler.

In the last few weeks, we have been getting together a prototype, Hoya, running HBase On YARN. This is driven by a few top level use cases that we have been trying to address. Some of them are:

  • Be able to create on-demand HBase clusters easily, by and/or in apps
    • With different versions of HBase potentially (for testing etc.)
  • Be able to configure different HBase instances differently
    • For example, different configs for read/write workload instances
  • Better isolation
    • Run arbitrary co-processors in user’s private cluster
    • User will own the data that the HBase daemons create
  • MR jobs should find it simple to create (transient) HBase clusters
    • For Map-side joins where table data is all in HBase, for example
  • Elasticity of clusters for analytic / batch workload processing
    • Stop / Suspend / Resume clusters as needed
    • Expand / shrink clusters as needed
  • Be able to utilize cluster resources better
    • Run MR jobs while maintaining HBase’s low latency SLAs

The Hoya tool is a Java tool, and is currently CLI driven.…

In case you haven’t heard, Hadoop 2.0 is on the way! There are loads more new features than I can begin to enumerate, including lots of interesting enhancements to HDFS for online applications like HBase. One of the most anticipated new features is YARN, an entirely new way to think about deploying applications across your Hadoop cluster. It’s easy to think of YARN as the infrastructure necessary to turn Hadoop into a cloud-like runtime for deploying and scaling data-centric applications.…

Today Concurrent announced that we have certified the Hortonworks Data Platform against the Cascading application framework. As Hadoop adoption continues to grow, more organizations are looking to take advantage of new data types and build new applications for the enterprise. By combining our enterprise-grade data platform and unparalleled growing ecosystem with the power, maturity and broad platform support of Concurrent’s Cascading application framework, we have now closed the modeling, development and production loop for all data-oriented applications.…

Over the past year, customers have told us they want to store all their data in one place and interact with it in multiple ways… they want to use Hadoop, but in order to do so, it needs to extend beyond batch.  It also needs to be interactive and real-time (among others).

This is the entire principle behind YARN, which Arun Murthy and the team at Hortonworks, together with others in the community, have been working on for more than 5 years! …

There are plenty of server and storage options for the wave of data that is being collected and analyzed.  New platforms such as Apache™ Hadoop® provide the opportunity to make all the new data types being collected useful.  However, like any other platform, performance varies depending on the underlying servers being used.  There is great promise in what Hadoop can deliver in terms of business value, and the ecosystem is continuously growing with companies making strides to make Hadoop easier to deploy and manage.…

This week we’re at the Red Hat Summit along with many others enjoying the great discussions within the community. As part of the summit, we are delighted to announce extended collaboration with Red Hat to continue to advance open source big data community projects.

Some details on the three areas of collaboration forming the announcement:

  • Enhancing Apache Ambari to support the management of Hadoop-compatible file systems, such as GlusterFS. With this integration, users will be able to provision, deploy, monitor and manage alternative file systems with Ambari, further cementing Ambari’s position as the standard for Hadoop management.

Successful social advertising campaigns today take a special blend of data intelligence and automation – enabling businesses to link fluctuations in media and tactics to sales and revenues. Those with better data than their competitors will be positioned to outperform their peers tactically and, if the data is used effectively, strategically. At one of the fastest growing Advertising Technology startups, harnessing Big Data made big sense in a highly competitive business environment.

The Advertising Technology startup sells Social Ad Campaign management software and wanted its in-house engineering team to focus on its core product and to outsource certain areas of its non-core technology needs.…

Talend Open Studio for Big Data provides an intuitive set of tools that make dealing with data in the Hadoop world (and Hortonworks Data Platform in particular) a lot easier. We often use the tools to speed delivery of a proof of concept or to operationalize movement of data from sources like web logs and machine sensors into HDFS. It is simple to use and typically takes only minutes to perform something that once took hours in a script.…

The Hadoop goodness just keeps on flowing as we’ve delivered new releases and new content in the past 10 days. Let’s recap.

HDP 1.3 Release. This milestone release takes advantage of improved performance in Hive 0.11 along with delivery on a series of enterprise requirements including NFS access to HDFS, improved MTTR for HBase, business continuity through HDFS and HBase snapshots, optimized connectors to Oracle and Netezza and the latest release of Ambari for management and operations.…

HDP 1.3 release delivers on community-driven innovation in Hadoop with SQL-IN-Hadoop, and continued ease of enterprise integration and business continuity features.

Almost one year ago (50 weeks to be exact) we released Hortonworks Data Platform 1.0, the first 100% open source Hadoop platform into the marketplace.  The past year has been dynamic to say the least!  However, one thing has remained constant: the steady, predictable cadence of HDP releases.  In September 2012 we released 1.1, this February gave us 1.2 and today we’re delighted to release HDP 1.3.…

One of the goals of the Hortonworks Sandbox is to showcase end-to-end use cases for Hadoop. In the most current release of Hadoop tutorials, you’ll find two specific use cases highlighted, both built around clickstream data. There are 6 new tutorials for you to walk through: Tutorials 6 – 11.

(Update: if your version of Sandbox does not have “Enable Ambari” on the introductory page, you will need to download the latest version of the Sandbox in order to have access to these tutorials.)

Clickstream Analysis – Website User Behavior

 

Hadoop Tutorials in Hortonworks Sandbox

Tutorials 6-10 are extensive, step-by-step lessons that walk you through connecting the Sandbox to Excel 2013 via the Hortonworks ODBC driver to access and analyze semi-structured data (like Omniture logs).…

We are excited to release the Hortonworks Data Platform 1.1 for Windows as a Generally Available product. In this blog post, I’m going to outline how to get started with HDP 1.1 for Windows.

With HDP for Windows, you can deploy Apache Hadoop and the HDP stack of components natively on a Windows Server cluster. The HDP for Windows download includes an MSI and remote installation scripts. With these artifacts, you can set up a multi-node Hadoop cluster in either a Workgroup or Active Directory Domain networking configuration.…
