Hortonworks Customer
Arizona State University

We firmly believe that this data-intensive compute environment has the capacity to transform biomedicine. With our Hadoop infrastructure, we can run data-intensive queries of these large-scale resources, and they return results in seconds. This is transformational.

- Dr. Kenneth Buetow, Director of Computational Sciences and Informatics

ASU Case Study


Arizona State University (commonly referred to as ASU or Arizona State) is a public metropolitan research university located on five campuses across the Phoenix, Arizona, metropolitan area, with four regional learning centers throughout Arizona. It is the largest university in the United States, with more than 80,000 enrolled students.

The Research Computing at ASU initiative is a leading academic supercomputing center. It provides a high-performance computing environment, a high-end data-intensive ecosystem, and the in-memory computation required for advanced data analysis and machine learning.

Business Challenge

The Complex Adaptive Systems Initiative (CASI) is one of ASU’s flagship programs. CASI’s research mission is to develop and promote a new type of science that embraces the complexity of natural systems.

ASU’s CASI needed to investigate how to better understand and solve the complex problem of cancer, and more specifically, liver cancer. Solutions to such complex problems require both storage for massive amounts of data and powerful data processing tools. Prior platforms for cancer genomics research limited both storage and processing, which in turn limited the complexity of the questions that investigators could ask and answer.

When ASU turned to Hortonworks for a genomics data lake, CASI team members needed a connected platform to:

• Store and process huge amounts of data,
• Make that data and tools accessible to others within and outside of the university, and
• Do it all at a cost that wouldn’t escalate as genomics data grew to petabytes in their cluster.


The data in a single human genome includes approximately 20,000 genes, which, if stored in a traditional platform, would represent several hundred gigabytes.

Combining those roughly 20,000 genes with a specialized genomic characterization of one million individually variable DNA locations produces the equivalent of about 20 billion rows of gene-variant combinations. CASI’s Hadoop cluster holds data on thousands of individuals. Now, the CASI team uses Hortonworks Data Platform (HDP®) as a distributed infrastructure to calculate those 20 billion rows that reflect the output of CASI’s high-performance computing.
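The 20 billion figure can be checked with simple arithmetic. The sketch below is only an illustration and is not taken from CASI's actual pipeline: it assumes that each row pairs one gene with one variable DNA location, and it uses the approximate counts quoted above. The function name gene_variant_rows is hypothetical.

```python
# Back-of-the-envelope row count for a gene-by-variant table.
# Both inputs are the approximate figures quoted in the text, not measured values.
GENES_PER_GENOME = 20_000        # approximate number of genes in one human genome
VARIABLE_LOCATIONS = 1_000_000   # individually variable DNA locations characterized


def gene_variant_rows(genes: int, locations: int) -> int:
    """Return the row count if every gene is paired with every variable location."""
    return genes * locations


rows = gene_variant_rows(GENES_PER_GENOME, VARIABLE_LOCATIONS)
print(f"{rows:,} gene-variant rows")  # 20,000,000,000 gene-variant rows
```

Under that assumption, 20,000 genes times 1,000,000 locations gives 20,000,000,000 combinations, which matches the roughly 20 billion rows cited.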

Once they’ve generated the calculations, the HDP environment lets the team seamlessly query and assemble the resulting information.


When ASU’s Research Computing department embarked on building a data-intensive environment, they teamed up to design the system according to the well-defined needs of the university’s biomedical researchers. Through HDP, the team avoided complicated ad hoc machine-to-machine interconnections; instead, those interconnections were wired into the distributed framework from the very beginning.

With HDP, ASU has both the data available and the technical capability to analyze it. ASU researchers can rapidly comb through terabytes of cancer data to perform efficient analysis.

The HDP cluster at Arizona State University has accumulated more than a petabyte of genomic data from multiple studies involving over 500 individuals in each study. Researchers in five different teams access this genomic data lake to investigate urgent cancer research questions such as:

• Why do some people develop cancer and other people don’t?
• Why do some people respond to particular therapies while others do not?
• How can we predict who should get particular therapies?
• How do we develop next-generation therapies for those who don’t respond to the existing ones?

Access to such a huge, rich dataset, combined with highly efficient computational power, has transformed the kinds of questions that ASU researchers can ask.