|Use case|Description|
|---|---|
|Churn prediction|Predict whether a customer is likely to “leave”|
|Customer segmentation|Uncover a natural segmentation of customers into groups with similar behavior|
|Product recommendation|Predict how strongly a customer will prefer a product, and recommend to each customer the products they are most likely to prefer|
|Information security|Detect network traffic anomalies and identify potential hackers|
|Fraud detection|Identify fraudulent patterns in insurance claims or credit card transactions|
|Predictive maintenance|Using sensor data feeds, predict equipment failure before it happens and maintain the equipment proactively|
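
To make one of these use cases concrete, here is a minimal sketch of customer segmentation using plain k-means (Lloyd's algorithm) in standard-library Python. The customer features, the `kmeans` helper and its evenly spaced seeding are assumptions made for illustration, not part of any Hortonworks or Spark API; a production workload on HDP would more likely use a library implementation such as the one in Spark MLlib.

```python
from math import dist
from statistics import fmean

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    # Assumption: seed centroids with k evenly spaced points so the run is deterministic.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(fmean(dim) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers: (orders per month, average basket value).
# Real features would normally be scaled before clustering.
customers = [(1, 20.0), (2, 25.0), (1, 22.0), (12, 180.0), (10, 200.0), (11, 190.0)]
centroids, labels = kmeans(customers, k=2)
```

In this toy data set the first three customers (infrequent, low-value orders) end up in one segment and the last three (frequent, high-value orders) in the other.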
Data science is an iterative, multi-step process, and each step requires a diverse set of skills and technologies. There is no silver bullet: no single technology, tool or algorithm enables a data scientist to extract insights from every data set and use case. Let’s take a look at a typical data science workflow from a process and tools perspective.
Data science, like most software development projects, starts with strategic planning that addresses two important areas:
Following the planning stage, data science follows an iterative macro-process of:
Within each macro-process there is further iteration within the Data Cleansing and Data Analysis steps.
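The inner iteration between cleansing and analysis can be illustrated with a small sketch: an analysis pass summarizes the data, the summary exposes values that look wrong, and a further cleansing pass removes them, repeating until nothing changes (a simple form of sigma clipping). The sensor readings, the `z` threshold and the function names are hypothetical, chosen only for illustration.

```python
from statistics import fmean, pstdev

def cleanse(values, center, spread, z=2.0):
    """Cleansing step: keep values within z standard deviations of the center."""
    return [v for v in values if abs(v - center) <= z * spread]

def cleanse_and_analyze(raw, max_rounds=10):
    """Alternate analysis and cleansing until a pass removes nothing."""
    data = [v for v in raw if v is not None]  # first cleansing pass: drop missing values
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        center, spread = fmean(data), pstdev(data)  # analysis step: summarize the data
        kept = cleanse(data, center, spread)        # cleansing step driven by the analysis
        if len(kept) == len(data):
            break  # the analysis found nothing more to clean
        data = kept
    return data, rounds

# Hypothetical sensor readings with one missing value and one obvious outlier.
readings = [10.1, 9.9, 10.0, None, 10.2, 9.8, 55.0, 10.0]
cleaned, rounds = cleanse_and_analyze(readings)
```

Here the first round drops the outlier and the second round confirms that the remaining readings are consistent, so the loop stops after two rounds.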
After several iterations, when the data scientist is satisfied with the results, they might then decide to:
Hortonworks offers deep data science skills to help customers gain industry insight from data science solutions, and provides the following key components to deliver successful solutions:
Hortonworks continues to invest in Spark for Enterprise Hadoop so users can deploy Spark-based applications alongside other Hadoop workloads in a consistent, predictable and robust way. Current investment includes:
There are additional opportunities for Hortonworks to contribute to and maximize the value of technologies that interact with Spark. Specifically, we believe that we can further optimize data access via the new Data Sources API. This should allow Spark SQL users to take full advantage of the following capabilities:
At Hortonworks we believe that Spark & HDP are Perfect Together and our focus is on:
Hortonworks’s data science team comprises technical and thought leaders from across the field. Our data scientists work closely with our customers to explore their data science requirements, define and execute projects, provide expert advice and help them overcome data science challenges. The Hortonworks data science services team works closely with our development teams, committers and the extended community to continuously drive customer requirements, improve the ecosystem and share best practices.
All the major business intelligence vendors offer Hadoop and Spark integration, and specialized analytics vendors offer niche solutions for specific data types and use cases. Since our inception, Hortonworks has been working with leading enterprise technology vendors to enable Open Enterprise Hadoop in next-generation data architectures. Hortonworks has deep relationships with a large set of partners and co-develops differentiated solutions with them. A rich ecosystem of partners provides tools, enabled for Hadoop and Spark on HDP, for the various phases of the data science workflow. You can learn more about these partners on our Hortonworks partner page.
Hortonworks provides immersive and valuable real-world training designed by Hadoop and Spark experts. Scenario-based training courses are available in-classroom or online from anywhere in the world, and offer unmatched depth and expertise. Learn more about our HDP and data science training.