Neustar does Hadoop
Neustar was founded in 1998 for telco local number portability. In 2011 Neustar changed course and launched its telecommunications information and analytics business. Neustar used to capture 1% of its network data and retain it for 45 days. With Hortonworks Data Platform, they now capture 100% and retain it for two years. That means 150 times more storage while saving millions.
Neustar delivers two types of services helping clients promote and protect their businesses:
- Information Services and Analytics. This includes real-time identification; verification and scoring; IT/security monitoring and mitigation; Web performance optimization and marketing; Web security; and network optimization analytics.
- Data Registry Services. This includes the management of North America’s Number Portability Administration Center (NPAC); Domain Registries; Caller Name Delivery; and Domain Name Services (DNS).
Launching Information & Analytics Services
In 2011, Neustar’s CEO Lisa Hook challenged her leadership team to extend their business to capture new opportunities in the burgeoning information services and analytics space. Hook knew that the immense volume of data flowing through the company’s data registries was valuable, and she saw the opportunity to optimize that value for clients in a privacy-friendly manner.
Hook challenged Neustar to continue delivering trusted, neutral network services while simultaneously creating new businesses providing data analytics and real-time information services using authoritative, accurate, permissible-use data.
At the time, Neustar’s data architecture was insufficient to meet the challenge. Because of storage costs and capacity limitations, the company was storing only 20 terabytes of its network data (about 1% of the total data available) and retaining it for only 45 days, on a rolling basis. The Neustar team took on the challenge of capturing 100% of the data and storing it for at least one year.
Mike Peterson, vice president of platforms and data architecture, knew that his team needed to accomplish three things to meet the challenge. They needed to:
- store different types of data for longer time windows,
- speed data processing workloads, and
- provide a common data platform, easily accessed by all of its data analysts.
More Storage, Less Expense
The Neustar data warehouse was well suited for near real-time and low latency workloads. However, for slower workloads that take between ten seconds and one day, Neustar was paying data warehouse vendors for speed that it did not need.
Neustar needed to keep their effective storage cost under $250 per terabyte. This would allow the team to meet the goal of retaining all of the network traffic data for at least one year, while minimizing operating costs.
Neustar’s services for DDoS attack detection and mitigation provide an excellent example of how the high cost of data storage was holding them back. Neustar SiteProtect prevents malicious traffic from affecting its clients’ Web infrastructures and defuses the largest, most complex DDoS attacks. Because Neustar was storing only a fraction of incoming DNS data, the company was unable to leverage all available attack signature data to offer the most compelling service.
Improved Data Agility
The first step in a data analysis project is to extract, transform and load (ETL) the data. Neustar data scientists need to extract relevant data from a near real-time, mission critical database in Postgres or Oracle. Then they transform and pre-parse it so they can load it into another storage environment for analysis.
To manage ETL resource requirements, Neustar typically limited data retention by developing a business need hypothesis and then retaining only the data it needed to confirm or refute that hypothesis.
In other words, the company identified the data to be retained for deeper analysis based on the hypothesis rather than on insight drawn from examining the incoming raw data.
Neustar wanted to replace this approach with a parse on demand solution, which would allow them to pre-parse data only after that data had proven value. They also needed analysis on demand. Neustar data analysts needed fast access to all of the raw data, not just a subset.
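The parse on demand idea can be illustrated with a minimal Python sketch. It is not Neustar's implementation: the record layout, the field names (`ts`, `qname`, `src`), and the helpers `parse_on_demand` and `count_by` are hypothetical, and an in-memory list stands in for raw files in a Hadoop cluster. The point it shows is that raw lines are stored untouched and are parsed only when a question is actually asked of them.

```python
import json
from typing import Dict, Iterable, Iterator

# Hypothetical raw records, kept exactly as they arrived (no up-front ETL).
RAW_LOG = [
    '{"ts": "2012-03-01T00:00:01Z", "qname": "example.com", "src": "198.51.100.7"}',
    '{"ts": "2012-03-01T00:00:02Z", "qname": "example.org", "src": "198.51.100.9"}',
    'not-json garbage line',
    '{"ts": "2012-03-01T00:00:03Z", "qname": "example.com", "src": "203.0.113.4"}',
]


def parse_on_demand(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Lazily parse raw lines at query time, skipping lines that do not parse."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw line stays in storage; this query just ignores it
        if isinstance(record, dict):
            yield record


def count_by(raw_lines: Iterable[str], field: str) -> Dict[str, int]:
    """Answer an ad hoc question by parsing only while the query runs."""
    counts: Dict[str, int] = {}
    for record in parse_on_demand(raw_lines):
        key = record.get(field)
        if key is not None:
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_by(RAW_LOG, "qname"))
```

Because nothing is discarded or reshaped at load time, a later question about a different field, such as `count_by(RAW_LOG, "src")`, can be answered from the same raw data without re-running an ETL job.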
Neustar also needed more of its trusted team members to have privacy-sensitive access to the data, across more departments and functions. They also wanted the data to be available for longer, to uncover subtle, long-term trends that might not surface in one or two months of data.
Hortonworks DNA Aligned with Neustar’s Goals
Neustar began its next generation data platform project with seven core beliefs for the evolution of its technology. The Hortonworks business philosophy aligned well with those core beliefs. Now in its second year with Hortonworks, Neustar has seen the tangible benefits of this alignment with these core beliefs:
- New data platforms can unlock innovation
- Although the process of adopting a new platform requires effort, that effort is rewarded with innovation and competitive advantage.
- Implementation of the platform should be a “contact sport”
- Since Neustar engineers would maintain the eventual solution, they needed to participate in the initial implementation.
- Open source technology motivates the team
- Mike Peterson put it this way, “If you’re a company that’s paying attention to the next generation of engineers, going open source is what you need to do to energize that group.”
- Rethink assumptions
- Hadoop enabled Neustar to move from limited sampling and aggregation for 45 days of data to 100% capture of detailed records for over a year. Data sources that they never dreamed of retaining are now easily ingested and queried.
- Focus data teams
- In the recent past, application exhaust data was only used within the confines of a single application and for the benefit of the clients of that application. With the implementation of Hadoop, insight is enriched through the mashing of data across applications.
- Increase technology skills
- Implementing a new platform would increase the team’s technology skills, but only if the vendor’s professional services and support teams shared the state of the art with the company. Peterson said, “One thing we were trying to get away from was prepackaged vendors with proprietary stuff.”
- Focus on Security and Privacy
- As a trusted, neutral provider of insights and analytics derived from data, Neustar required that any solution and processes implemented facilitated the strictest security and “Privacy by Design” principles for clients and consumers.
Neustar-Hortonworks alignment has produced more than just good will. Neustar and its clients have already realized significant bottom-line benefits from adopting Hortonworks Data Platform and partnering with the Hortonworks team to make the most of the new architecture.
Millions in Cost Savings
By committing to HDP in March 2012, Neustar was able to eliminate a large license refresh fee, and to dramatically decrease annual support fees.
New Product Offerings
By moving to HDP, Neustar has met the challenge of capturing and storing 100% of the raw data flowing through its networks. Hadoop’s storage efficiency increased Neustar’s data capacity 150-fold compared with its pre-Hadoop architecture.
The team surpassed the goal of retaining one year’s data. Now they can save and retain data for up to two years in a secure, privacy compliant manner.
More data has translated to more insight. The Hadoop ecosystem enables Neustar to create products that help clients improve the effectiveness of promoting and protecting their businesses. They are now able to correlate Internet traffic patterns to consumer behavior by geography and demographics.
Data savvy product managers are thinking outside the box as more and more data is readily available in their Hadoop cluster. The newly accessible data assets enable the company to offer its clients value added insights that were not possible before. The trend of enriching Neustar’s application data with trusted, authoritative third party data is continuing to accelerate the development of increasingly innovative services.
In the coming year, Neustar will focus on making the data in HDP available for new business opportunities. Various product groups at Neustar now understand how to securely access and analyze data in Hadoop. They will use this data to identify new business opportunities that add value for clients. Those with potential will turn into new services for clients and partners using insights derived from Neustar’s rich trove of data.
Neustar is an $800MM real-time, cloud-based information services and analytics company based in Sterling, Virginia. The company’s 1,500 employees provide trusted, real-time information and analytics to marketing, Internet, communication services and entertainment companies in the U.S. and around the world. These services help Neustar clients effectively promote and protect their businesses and ensure optimal network performance.
Neustar built its business providing local number portability, first for landlines and then for cellular numbers. With almost twenty years of experience managing complex data registries for communications networks, Neustar is a proven expert in managing the flow of data through those networks. Today, their complete services combine that data registry management expertise with best-in-class data analytics and privacy-compliant, permissible-use information services, delivered to clients in real time, one interaction at a time (powered by their Hadoop cluster built on Hortonworks Data Platform).
- Increased storage capacity by 150x
- Captured 100% of its data, stored for two years
- Saved millions by moving workflows to Hadoop
- Launched new data services