November 18, 2014

Starting Small and Scaling Big with Apache Hadoop: Sandbox to Production

Big data continues to dominate the discussion as businesses both big and small seek to make sense of what exactly it is, and more importantly, what they should do about it. The three biggest challenges associated with big data investments are determining how to get value from data, defining the big data strategy, and obtaining the skills and capabilities needed to make sense of it in a meaningful way.

Join our webinar on Thursday, Nov. 20, to learn how to scale your infrastructure – today, tomorrow, and for the future – and drive invaluable business impact. Topics include:

  • How to grow a pilot into production
  • How to scale-out architecture & systems affordably
  • How to leverage the flexibility of Hadoop to optimize your data integration processes

The webinar focuses on scaling Hadoop projects from small to big, but many organizations are just beginning their journey. The best way to get started with Hadoop is the Hortonworks Sandbox, designed to help users better understand Hadoop, build a proof of concept, and test new functionality before heading into production.

Talend Big Data Sandbox

Talend has extended the Sandbox to include their development tools, as well as some polished pre-canned demos that visually bring well-known big data use cases to life. The Talend Big Data Sandbox is a pre-configured virtual environment designed to quickly and easily launch big data projects through practical use cases and interactive learning tools. It is delivered as a virtual machine (VM) that includes the Hortonworks Sandbox along with the Talend Platform for Big Data, configured and ready to run. Together, these components let you discover the power of Hadoop, the Hortonworks Data Platform, and Talend's graphical Eclipse-based data integration tools—so you can process, manage, and analyze your data, with optimized code generated for you instead of complex hand-written programming.

What can you do in the Talend Big Data Sandbox?

Once in the Talend Big Data Sandbox, you have a number of ways you can explore and test data integration and analysis capabilities—all of which will help to dramatically accelerate your learning curve. Several real-world use cases, in conjunction with valuable documentation and video tutorials, allow you to easily walk through examples related to:

  • ETL offloading. If you’re tired of dealing with the time-consuming challenges inherent in processing large volumes of third-party data, you’ll appreciate how the Talend Big Data Sandbox can help you streamline the task. You’ll learn how to offload ETL overhead to Hadoop and HDFS, so you can optimize your data warehouse and realize business value faster.
  • Clickstream analysis. Clickstream data lets you better understand user behavior on your website, such as which products people are browsing, and how they navigate to and from product pages. You can use the sandbox to practice loading this data to HDFS and then use a Talend MapReduce job to calculate results for a Google Chart, Tableau Report, or similar Hive-connected analytic tool.
  • Twitter hashtag sentiment analysis. Social media can provide tremendous insight into customer behavior. In the Talend Big Data Sandbox, you can focus on tweets using a particular #hashtag value for a set period of time and then analyze them based on their positive or negative sentiments and geolocations. This gives you valuable practice with ingesting, formatting, standardizing, enriching, and analyzing tweets within Hadoop (HDFS or other storage, plus MapReduce computation).
  • Apache weblog analysis. Analyzing web traffic is critical to understanding user patterns—but it can be tedious and difficult to build the necessary processes to capture billions of records. The sandbox shows you how to simplify this endeavor by filtering log records first to remove the “noise” before processing and aggregating data.
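By way of illustration only—this is not part of the Talend demos, and the helper names are hypothetical—the filter-before-aggregate idea from the weblog use case can be sketched in plain Python. In a real deployment the same logic would run as a MapReduce or Talend job over HDFS, but the shape of the work is the same: parse each log line, drop the “noise” (static assets, errors, malformed records), then aggregate.

```python
import re
from collections import Counter

# Apache Common Log Format: host ident user [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# "Noise" to drop before aggregation: static assets and crawler probes
# (an illustrative list, not Talend's actual filter rules)
NOISE_SUFFIXES = ('.css', '.js', '.png', '.gif', '.ico', 'robots.txt')

def parse(line):
    """Return a dict of log fields, or None if the line is malformed."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

def filter_noise(records):
    """Keep only successful page requests, discarding asset/error hits."""
    for rec in records:
        if rec is None:
            continue                               # malformed line
        if rec['path'].endswith(NOISE_SUFFIXES):
            continue                               # static asset / crawler
        if rec['status'] != '200':
            continue                               # failed request
        yield rec

def top_pages(lines, n=10):
    """Aggregate hit counts per path after filtering out the noise."""
    records = (parse(line) for line in lines)
    return Counter(r['path'] for r in filter_noise(records)).most_common(n)
```

Filtering first keeps the expensive aggregation step working on a much smaller record set—the same reason the sandbox demo removes noise before handing data to MapReduce.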

Learn More

