Just as Apache Hadoop allows data scientists to work with any type of data, both Hadoop and the cloud make it possible to work with any amount of data by providing ready access to virtually unlimited storage and compute capacity. Hadoop allows the use of commodity hardware, and the cloud provides agility, letting data scientists scale both compute and storage as needed.
No longer constrained by their own personal environment, data scientists can now train and refine predictive models using all the data available to the enterprise—and just in time. As big data reshapes businesses and industries, the ability to build, train and work with accurate predictive models has become a primary driver of competitive advantage.
The cloud offers more than just scalability. Data scientists can self-provision all the storage and compute they need, define their own parameters, get the environment up and running quickly, and then shut it all down once they're finished. Previously, they had to ask IT to build a data science infrastructure for them.
The ability to easily spin up and spin down capacity gives researchers the freedom to perform experiments and test hypotheses at will, without depending on IT, accelerating their work. By masking backend complexity, the cloud reduces the friction of setting up a Hadoop cluster with the necessary tools. Multiple versions of Apache Spark and other machine learning frameworks can be added just in time, then released when work is complete. A pay-as-you-go model allows the organization to shift costs from capital to operating budgets, providing a clearer picture of big data ROI.
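To make the capital-versus-operating cost shift concrete, here is a minimal back-of-the-envelope sketch. All node counts, hourly rates, and usage hours are hypothetical assumptions for illustration only, not real AWS or HDCloud pricing:

```python
# Hypothetical illustration of the capex-vs-opex shift described above.
# All prices and hours are assumed values, not real AWS or HDCloud rates.

def always_on_cost(nodes, hourly_rate, hours_in_month=730):
    """Cost of keeping a fixed cluster running all month (capex-style)."""
    return nodes * hourly_rate * hours_in_month

def ephemeral_cost(nodes, hourly_rate, hours_used):
    """Pay-as-you-go cost: pay only for the hours the cluster exists."""
    return nodes * hourly_rate * hours_used

# Assumed scenario: a 10-node cluster at $0.50 per node-hour,
# actually needed for about 80 hours of experiments per month.
fixed = always_on_cost(10, 0.50)          # 3650.0
on_demand = ephemeral_cost(10, 0.50, 80)  # 400.0

print(f"Always-on: ${fixed:,.2f}/month")
print(f"Ephemeral: ${on_demand:,.2f}/month")
```

Under these assumed numbers, spinning clusters up only when needed costs roughly a ninth of keeping them running continuously, which is why per-experiment costs become directly attributable and ROI easier to measure.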
To help data scientists take full advantage of this ideal stack for data science, Hortonworks provides a solution that brings together big data and on-demand compute, making it simple to launch a cluster in the cloud. Hortonworks Data Cloud for AWS (HDCloud) delivers the most popular capabilities of Hortonworks Data Platform, including Hadoop, Spark, Apache Hive 2 with LLAP and Apache Zeppelin, as an easy-to-use, on-demand service within an organization's existing AWS account.
Designed for ease of use by developers and data scientists, HDCloud eliminates the need to sort through countless configuration options by providing a set of prescriptive cluster types optimized and pre-tuned for ephemeral workloads. The ability to spin up workload clusters for the most common use cases, including data science and exploration, lets users start modeling and analyzing data sets in minutes without IT assistance.
Want to get started quickly? We built a toolkit to help you begin doing data science in the cloud. You will find guides, whitepapers, webinar replays and more, all aimed at helping you become an expert. Click here to get started.