August 15, 2017

Get Started Quickly with Data Science in the Cloud

Data Science

So you might ask: why run data science in the cloud?

Just as Apache Hadoop allows data scientists to work with any type of data, Hadoop and the cloud together make it possible to work with any amount of data by providing ready access to virtually unlimited storage and compute capacity. Hadoop allows the use of commodity hardware, and the cloud provides agility, allowing the data scientist to scale both compute and storage as needed.

No longer constrained by their own personal environment, data scientists can now train and refine predictive models using all the data available to the enterprise, exactly when they need it. As big data reshapes businesses and industries, the ability to build, train and work with accurate predictive models has become a primary driver of competitive advantage.

The cloud offers more than just scalability. Previously, data scientists had to ask IT to build a data science infrastructure. Now they can self-provision all the storage and compute they need, define their own parameters, get the environment up and running quickly, and then shut it all down once they’re finished.

The ability to easily spin up and spin down capacity gives researchers the freedom to perform experiments and test hypotheses at will, without depending on IT, which accelerates their work. By masking backend complexity, the cloud reduces the friction of setting up a Hadoop cluster with the necessary tools. Multiple versions of Apache Spark and other machine learning frameworks can be added just in time, then released when work is complete. A pay-as-you-go model allows the organization to shift costs from capital to operating budgets, providing a clearer picture of big data ROI.

The Hortonworks Solution For Hadoop And Spark In The Cloud

To help data scientists take full advantage of the ideal stack for data science, Hortonworks provides a solution that brings together big data and virtually unlimited compute to make it simple to launch a cluster in the cloud. HDCloud (Hortonworks Data Cloud for AWS) delivers the most popular capabilities of Hortonworks Data Platform, including Hadoop, Spark, Apache Hive 2 with LLAP, and Apache Zeppelin, in an easy-to-use product available on demand within an organization’s existing AWS account.

Designed for ease of use by developers and data scientists, HDCloud eliminates the need to sort through countless configuration options by providing a set of prescriptive cluster types optimized and pre-tuned for ephemeral workloads. The ability to spin up workload clusters for the most common use cases, including data science and exploration, lets users start modeling and analyzing data sets in minutes without the need for IT assistance.

How can you get started quickly? We built a toolkit to help you get started with data science in the cloud. You will find guides, whitepapers, webinar replays, and more, all aimed at helping you become an expert. Click here to get started.

Comments

  • Thank you for sharing this toolkit on taking big data to the cloud. I got a lot of useful info about this topic from your whitepapers. I can’t wait to try it on my own. Please keep sharing more helpful posts on big data in the future.

  • Indeed, HDInsight is simple to set up. A PoC cluster can be set up in 30 minutes or less. I did it in early 2016. My impression was that it was more of a data processing platform than something specific to data science. Roni, would you tell us about the latest developments in data science on HDInsight, please? Does it support PySpark and ML libraries?

    In 2016, I was impressed with Azure’s Machine Learning offering and the ease and speed of starting data science in the cloud. I believe Amazon has a similar offering called AML.
