Load & Refine Your Big Data


Big Data Refinery

Hortonworks Data Platform leverages the power of Apache Hadoop to act as your big data refinery. This is a new approach to analytics: you can cost-effectively store, aggregate and transform a wide range of multi-structured data sources into usable formats that let you ask and answer new questions and fuel new business insights.

To get started, you will need a Hadoop cluster. Refer to the Getting Started for Developers page to deploy a cluster on your own, download the Hortonworks Sandbox, or ask your IT Operations group for help. Once your Hadoop cluster is deployed, you can begin to experiment with the steps below.

  1. Download & Read the Big Data Refinery White Paper

    The Big Data Refinery white paper describes how Apache Hadoop can be utilized as a big data refinery within your organization. It is targeted at technical executives and practitioners who are new to big data and who want to understand how Apache Hadoop can impact business analytics.

  2. Load Your Cluster with Data & Deploy a Process with Minimal Coding

    Hortonworks Data Platform includes a powerful set of tools for integrating legacy data and systems with Hadoop. Most notably, Talend Open Studio for Big Data, which is tightly integrated with Hortonworks Data Platform, provides a graphical environment for importing data into Hadoop and exporting data from it. You can then quickly create Pig, Hive, HBase and HCatalog functions without having to write a single line of code.

  3. Use Apache HCatalog to Easily Integrate with Other Data Systems

    HCatalog provides a metadata service for your Hadoop cluster. Whether you use Pig, Hive, HBase or interact directly with HDFS, HCatalog can store metadata in a Hadoop cluster via SQL (or REST). It allows you to treat Hadoop as if it were a traditional database, so you can access Hadoop data using analytical tools that have familiar interfaces.

    You can learn more about the value of HCatalog from this video, presented by Alan Gates of Hortonworks. There are also many useful articles in the knowledgebase that can help you to get started with HCatalog.

  4. Explore Training Courses

    Hortonworks University offers a wide range of classes that can help you learn more about Hadoop. Public, private and online courses are available to teach you how to develop solutions and manage Hadoop clusters. Courses consist of an effective mix of interactive lecture and extensive hands-on lab exercises.

    For a complete list of our training options, please visit Hortonworks University.