Posts by Big Data Insights:


Using Hadoop and big data for more complete CRM insights

Brands of all types are excited by the idea of delivering more personalized customer experiences through the use of Hadoop and big data analytics, but, in reality, many are falling short by failing to integrate data from multiple sources. A recent Harvard Business Review blog post highlighted the shortcomings present in many companies' marketing efforts due to incomplete data and insufficient analytics. To overcome these barriers, companies can use tools such as Hadoop and MapReduce to make sense of information stored in customer relationship management (CRM) systems.

"Though digital channels continue to proliferate and consumers continue to distribute their time spent with a brand across this fragmented landscape, most brands are still using outdated tactics to reach the masses," wrote HBR contributor Richard Ting, executive vice president and global executive creative director of mobile and social platforms at digital agency R/GA. "…To surgically cut through the noise, advertisers need to develop richer customer profiles."

He suggested that companies can better accomplish their goals by using data pulled from sources such as CRM software, social media conversations and interactions, and brand interactions such as website behavior and purchase histories. These goals can include targeting segments of consumers, improving companies' understanding of their audience for real-time marketing campaigns and increasing the long-term value of each customer. The barrier to using this information, he said, is technology.

How Hadoop helps handle CRM data
In a recent Smart Data Collective post, contributor and big data executive Mark van Rijmenam similarly identified four aspects of CRM big data:

- Customer management using structured database entries
- Interaction through the use of unstructured data in the form of emails, social media posts and comments
- Analysis of structured data from online behavior, such as click-through rates
- Knowledge of the customer based on a synthesis of the other factors that can create predictive recommendations

The combination of these aspects can create massive data streams, van Rijmenam warned. He recommended using technologies such as Hadoop and MapReduce to turn this information into insight on behavior patterns and audience sentiments and to power recommendations that influence future actions.
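
The MapReduce pattern van Rijmenam recommends can be sketched in plain Python. This is a minimal illustration, not Hadoop code: the records, field names and helper functions below are hypothetical, and the in-memory "shuffle" stands in for work a real cluster would distribute across machines. It counts how often each customer interacts through each channel.

```python
from collections import defaultdict

# Hypothetical CRM interaction records: (customer_id, channel).
# The values are illustrative and do not come from any real system.
RECORDS = [
    ("c1", "email"),
    ("c2", "social"),
    ("c1", "web"),
    ("c1", "email"),
    ("c2", "web"),
]


def map_phase(records):
    """Emit one ((customer, channel), 1) pair per interaction."""
    for customer_id, channel in records:
        yield (customer_id, channel), 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each (customer, channel) key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_interactions(records):
    """Run the three phases in sequence on an in-memory list."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    for (customer, channel), n in sorted(count_interactions(RECORDS).items()):
        print(customer, channel, n)
```

Because each map call looks at a single record and each reduce call looks at a single key, both phases can be split across many machines, which is what lets the approach handle very large data streams.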

"Customers [who] contact organizations through whatever channel want to be recognized and [served] appropriately," he wrote. "Using big data technologies to collect, store and analyze the necessary data will truly make your customer relationship management valuable and give your organization a competitive advantage."

Where is Hadoop headed?

The early years of Hadoop development focused on creating a platform that could handle analytics at a large scale, but the coming years will more likely focus on refining the environment to work at faster speeds, according to panelists at the recent Structure: Data 2013 conference. Reporting from several panels at the event, GigaOM noted that running queries on large data sets is now a manageable process supported by a wealth of applications. Developers are now setting their sights on introducing more interactivity to the Hadoop ecosystem.

Hadoop needs to move in the direction of offering fast, predictive capabilities so that it can fulfill the same role as a feature like Google's "I'm Feeling Lucky" search function, Omer Trajman of analysis applications company WibiData suggested. Users should be able to plug in queries and get smart, dynamic responses. For this to happen, companies will need "Hadoop high throughput, low latency," analytics executive Muddu Sudhakar said.

Reaching these kinds of speeds will be challenging, but having more interactivity will unlock new uses for Hadoop and big data, experts suggested. Silvius Rus, director of big data platforms for Quantcast, explained that businesses need to be able to quickly test ideas and get answers back in minutes instead of days. For instance, companies should be able to immediately determine customer service issues and respond to problems, Ashok Srivastava, chief data scientist at Verizon, said. Additionally, big data analysis should be able to help with research in cybersecurity and certain scientific fields by crowdsourcing information.

"Imagine taking your cell phone pictures and combining them with multiple millions of other cell phone pictures," he said, according to GigaOM. "That's something that can be used by scientists."

To get this type of real-time feedback, the Hadoop community will likely have to continue to develop the functionality of tools such as Hive and Pig, but applications that nudge the world toward "big data utopia" – in the words of one panel moderator – could be coming within months, Srivastava said. Coupling speed and scale, the potential for insights using Hadoop continues to grow.

Predicting brain damage with Hadoop big data

For several years now, Hadoop big data tools have provided companies with an open platform to pursue many ambitions. Although big data rose to prominence because of its ability to enhance marketing campaigns, organizations from numerous sectors have been finding new and exciting applications for the technology. Some of the most impressive developments in the data analytics field have come from the healthcare industry. Physicians have begun to leverage big data tools to diagnose patients as well as screen others for chronic illnesses. Recent developments in the field have gone one step further.

Brain injuries by the numbers
Traumatic brain injuries (TBI) are a matter of serious concern in the United States. According to data gathered by the Centers for Disease Control and Prevention, 1.7 million cases of TBI are recorded each year. In nearly half of those cases, the patient is reported to be a child under the age of 14. TBI has also been identified as a contributing factor in approximately 30 percent of the nation's injury-related deaths.

Patients at risk for TBI receive constant surveillance from brain activity monitoring equipment, but hospital staff are notified of fluctuations in intracranial pressure only if the pressure reaches a critical level. At that point, the damage could be irreversible.

Predicting harmful changes in brain pressure
Forbes contributor Tom Groenfeldt reported that neurologists at the UCLA Medical Center were collaborating with IBM research teams to better identify intracranial pressure changes with the help of data analytics. IBM has provided a potential solution to this problem with its InfoSphere Streams software, which is powered by Apache Hadoop technology. This analytics tool can process a range of medical information including EEG, EKG, genomics and treatment history to see health patterns that cannot be seen with traditional methods.

"We can see where early indicators are that something is not right, so the doctors can get it as early as possible," Nagui Halim, IBM's chief architect of big data, told Groenfeldt. "Nothing happens suddenly in medicine, it only seems sudden. Conditions increase risk, certain factors are stressing the system, but it has antecedents."

With that information in hand, critical intracranial pressure changes can be identified before a patient exhibits symptoms, allowing medical staff to intervene and prevent further damage. A Toronto neonatal facility has already utilized IBM's Hadoop data analytics tools. The software was reportedly able to alert medical staff to patient health problems as much as 24 hours before traditional forms of monitoring would have recognized them. The healthcare industry is quickly realizing that big data can save lives.
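
The difference between alerting at a critical level and alerting on early indicators can be shown with a small sketch. This is not how IBM's software works, which the article does not describe. It is a hypothetical rule that flags a sustained rise in a stream of readings before any single reading reaches the critical level. The function name, thresholds and sample values are all placeholders chosen for illustration, not clinical values.

```python
from collections import deque


def early_warnings(readings, critical=20.0, window=5, min_rise=2.0):
    """Yield (index, kind) alerts for a stream of numeric readings.

    kind is "critical" when a reading reaches the critical level, and
    "trend" when the reading is still below that level but has risen by
    at least min_rise across the last `window` readings.
    All thresholds are hypothetical placeholders.
    """
    recent = deque(maxlen=window)  # sliding window of the latest readings
    for i, value in enumerate(readings):
        recent.append(value)
        if value >= critical:
            yield i, "critical"
        elif len(recent) == window and recent[-1] - recent[0] >= min_rise:
            yield i, "trend"


if __name__ == "__main__":
    sample = [10, 10.5, 11, 11.5, 12.5, 14, 16, 21]
    for index, kind in early_warnings(sample):
        print(index, kind)
```

In this sample, the trend rule fires three readings before the critical threshold is crossed, which is the kind of lead time the approach is meant to buy.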

How Facebook uses Hadoop and Hive

Social media giant Facebook is one of Hadoop and big data's biggest champions, and it claims to operate the largest single Hadoop Distributed File System (HDFS) cluster anywhere, with more than 100 petabytes of disk space in a single system as of July 2012. The site stores more than 250 billion photos, with 350 million new ones uploaded every day, Jay Parikh, the company's vice president of infrastructure, told InformationWeek in a recent interview. He explained that the social network must use a number of tools – among them Hadoop, Hive and HBase – to manage its user information and effectively run its business.

According to Parikh, Hadoop is used in every Facebook product and in a variety of ways. User actions such as a "like" or a status update are stored in a highly distributed, customized MySQL database, but applications such as Facebook Messaging run on top of HBase, Hadoop's NoSQL database framework. All messages sent on desktop or mobile are persisted to HBase. Additionally, the company uses Hadoop and Hive to generate reports for third-party developers and advertisers who need to track the success of their applications or campaigns.

"All of those analytics are driven off of Hadoop, HDFS, Hive and interfaces that we've developed for developers, internal data scientists, product managers and external advertisers," Parikh said.

Creating faster queries
Hive, the data warehousing infrastructure Facebook helped develop to run on top of Hadoop, is central to meeting the company's reporting needs. Facebook must balance the need for rapid results in features such as its graph tools with simplicity and ease of reporting, so it is working on another contribution to Hive that will improve the speed of queries. Improving Hive's speed is important, as the scalability that makes the tool central to the social network's needs can come at the expense of low latency.

"Hive is still a workhorse and it will remain the workhorse for a long time because it's easy and it scales," Parikh said. "Easy is the key thing when you want lots of people to be able to engage with a tool. Hive is very simple to use, so we've been focused on performance to make it even more effective."

For companies just embarking on a big data initiative, striking a balance between handling technology challenges with Hadoop and deriving insight from data will be difficult but important, Parikh said. Businesses will need to experiment and maintain a constant focus on long-term goals to ensure they build out technology in the right way. However, with constant innovations in the open source Apache Hadoop community, businesses have more resources than ever to make data central to their operations in the same way as social media giants.

Big data at the heart of modern media success

Powered by tools such as Hadoop, the media companies of the future will analyze data to deliver relevant content to audiences, according to Patrick Vogt, president of the international division for The Weather Company. In an interview with The Guardian, Vogt explained that his company's properties – which include The Weather Channel and weather.com – lean on big data analysis to power both internal and customer-facing insights.

Weather information is one of the most plentiful sources of data available, and TWC processes many terabytes daily to generate thousands of local forecasts, deliver targeted alerts to consumers and companies and even to choose which stories are profiled on its website, Vogt explained. While using resources such as Hadoop and big data is relevant to helping deliver weather predictions specifically, analytics technology is also at the center of any media company's efforts to engage with audiences.

"It's not about media … it's about the data, analyzing the data and delivering relevant, engaging content to consumers," Vogt said. "The business models that focus on the consumer, content (in our case data and science), and how your company can deliver that valuable content to the consumer are the ones that will continue to innovate in the media space and beyond."

In a separate Guardian column, Jon Baron, CEO of marketing optimization firm TagMan, explained that reaching audiences requires marketers to take advantage of the trail of data people leave as they browse the internet. It is time for marketers to move beyond contemplating this data and actually make use of the tools that enable them to use it to adjust their approaches. The arrival of technology such as Hadoop clusters makes it easy to break down data pulled from many channels and turn it into something productive.

"In short, big data is the raw material that makes this invaluable foresight a possibility," Baron wrote. "Big marketing, and the tools that enable it, is the pickaxe that makes it a reality."

Hadoop future looks bright

With big data quickly picking up steam, companies are scrambling to launch their own data analytics programs. Since the very beginning, Hadoop has served as the platform of choice for data scientists looking to design their own analytics programs. Unlike proprietary alternatives, the open source platform provides an inexpensive and comprehensive resource for high-powered data processing. Recent studies have shown that, although more companies than ever before are turning toward Hadoop file systems for their big data needs, many organizations have struggled to get their programs off the ground.

According to a survey conducted by Dimensional Research, most companies' Hadoop big data projects have yet to be launched. The study found that 24 percent of data management professionals reported having a Hadoop project in full production. Half have yet to move past the planning stage. Possible reasons for these figures include the technology's relative infancy and the dearth of data scientists needed to derive actionable insights.

The amount of information being gathered and analyzed continues to grow, as 19 percent said they were managing more than 500 terabytes of data. The study also found that one of the major bonuses of a Hadoop architecture was its cost-effectiveness, as the platform allows users to scale out without a significant budgetary expense.

Chris Preimesberger of eWeek reasoned that companies are flocking to Hadoop because of its wide range of applications and usability. While other platforms have been designed to fit narrow needs, Hadoop offers a variety of resources that can be employed by any user. Furthermore, the platform's increasing ability to process data in real time will provide analysts with even more functionality moving forward. As Hadoop architecture becomes the standard for big data solutions, analysts can expect data centers to increasingly adopt server equipment specifically designed to harness the power of the platform. 

The far-reaching effects of big data in government

Most of the attention given to big data solutions has focused on their application in the commerce sector. Retailers, marketers and even pharmaceutical manufacturers have found success deploying data analytics programs. The potential for big data reaches far beyond those implementations, however. With their wealth of information covering a truly massive range of subjects, federal government agencies would gain significant and meaningful insight if they began applying data analytics software to these huge datasets. Some agencies have already begun to leverage big data tools to their benefit.

Enhancing government operations
According to Wired, several governmental bodies, both within the United States and overseas, have found success using data analytics software in their operations. Within the U.S. government, the Office of the Chief Financial Officer has deployed a big data program that provides the nation's taxpayers with financial reports that can be accessed with mobile devices, allowing them to view important data at any time. The South African government, meanwhile, has applied data analytics software to information gathered through its national census program. Officials are reportedly using these tools to parse meaningful trends from the data to help guide policy decisions.

For a topic that has engendered some confusion over its exact applications, big data has actually helped some government agencies elucidate operations and directly benefit taxpayer-funded programs. For instance, the Juvenile Welfare Board in Pinellas County, Florida, has used big data software to show constituents how property taxes impact the lives of children receiving assistance from the agency.

Cracking the big data gold mine
Analyst Tim Byers explained that, by opening up their vast reserves of data, government agencies would be providing an incredible resource for further research and innovation, Forbes reported. Byers cited several past instances in which the release of information gathered by the government led to widespread application, including global positioning systems and weather data. By using big data and Hadoop-based software tools to analyze information gathered by government agencies, researchers could potentially find solutions to a number of issues in the country.

For instance, insights gleaned from weather, soil conditions and other environmental factors could be used to optimize agricultural production. Governmental data could also be used to aid numerous energy conservation and management initiatives, including programs to build smart cities. Data analytics teams are continuously finding new applications for the technology. With access to government-gathered data, their research could lead to monumental breakthroughs. 

Using Hadoop to move past Too Much Data

Big data analytics are frequently touted for their world-changing properties, but many users feel stymied by the complexity of turning the vast amounts of information they're currently gathering into actionable insights. Hadoop clusters and databases can simplify the processing of the unstructured data stores that hold the most promise, allowing organizations to make sense of their information.

According to the most recent CMO Survey, marketing executives report that just 30 percent of projects incorporate analytics, down from 37 percent a year ago, Forbes reported. At the same time, spending on big data is expected to increase by an average of 66 percent over the next three years. Contributor and Duke University professor Christine Moorman speculated that this "utilization gap" stems in part from the fact that companies are not gathering "deep, non-quantitative insights."

Additionally, organizations find it difficult to understand what customers are actually saying when analyzing textual data, she said. AdAge's Simon Dumenco agreed, noting that marketers are building "bigger haystacks" of data without being able to find the needle.

"Generally, it's pretty much Too Much Data and/or Useless Data and/or Inaccessible Data and/or Nobody Knows Quite What To Do With It Data and/or … you get the idea," he wrote.

In a recent GigaOM article speculating that just 1 percent of available data is currently being analyzed, contributor Gurjeet Singh agreed that making use of the unstructured data being gathered is at the heart of generating new insights. He suggested that too much money is being used to gather new data and not enough time is being devoted to solutions for handling it. However, tools such as Hadoop are lowering the cost of analysis while getting at the core problem of making this analysis effective. Using Hadoop big data tools to dive into NoSQL databases, organizations can turn unused, diverse sets of unstructured data such as text, video and voice into tangible insights.

How open source Hadoop distributions foster innovation

Hadoop and big data capabilities are consistently touted as one of the most significant trends in business innovation, particularly as open source development allows users to implement new features. Some of the most used applications, such as Hadoop Hive, Pig and HBase, are the result of companies and developers contributing to the project.

According to a recent post by NoSQL blogger Alex Popescu, the open source architecture is central to Hadoop's success. Although the environment can be complicated, its open-endedness allows developers and data professionals to exercise some latitude in driving innovations.

"[Hadoop] allows experimenting and trying out new ideas, while continuing to accumulate and storing your data," Popescu wrote. "It removes the pressure from the developers. That's agility. It's highly appreciated."

As a result of the innovation it enables, Hadoop has become the de facto tool for handling big data, analytics executive Matt Asay wrote in a recent ReadWrite column. Although legacy software vendors are increasingly trying to offer proprietary Hadoop products, this software does not hold the same promise for driving growth. Instead, organizations should continue to pursue open source deployments that enable them to work outside the confines of a vendor's walls and potentially unlock new, more creative applications for Hadoop and big data.

Breaking down energy consumption with data analytics

The United States has an energy consumption rate that outpaces the rest of the world. According to the Worldwatch Institute, the average American consumes five times more energy than the average global citizen. Even more troubling is that developing nations are quickly making up ground, with China becoming the No. 1 consumer of coal in the world.

With the world facing down a potential energy crisis, many organizations have pursued virtually every avenue for methods to reduce consumption. Some studies have found that big data tools can provide invaluable insight into understanding how people use energy and what can be done to make processes more efficient.

According to GigaOM, one energy analyst startup has been using big data tools built upon a Hadoop architecture to process data gathered from more than 50 million homes. Officials from the company said it can save as much as 2 terawatt hours of energy with the information it has gleaned, saving the United States $200 million in energy costs. Their software collects data from 96 billion meter reads, processing the information and producing recommendations for reducing customer electricity use.
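
As a quick sanity check on those figures, the arithmetic can be worked through in a few lines of Python. The inputs are the rounded numbers quoted above, and the variable names are just labels for this sketch. Since the article says "more than 50 million homes," the reads-per-home figure is a rough upper estimate.

```python
TWH_TO_KWH = 1_000_000_000  # 1 terawatt hour = 1 billion kilowatt hours

energy_saved_twh = 2            # reported potential savings
cost_saved_usd = 200_000_000    # reported cost savings
meter_reads = 96_000_000_000    # reported number of meter reads
homes = 50_000_000              # reported as "more than 50 million"

# Electricity price implied by the two savings figures, in dollars per kWh.
implied_price_per_kwh = cost_saved_usd / (energy_saved_twh * TWH_TO_KWH)

# Approximate number of meter reads collected per home.
reads_per_home = meter_reads / homes

print(implied_price_per_kwh)  # 0.1
print(reads_per_home)         # 1920.0
```

The two savings figures are consistent with an electricity price of about 10 cents per kilowatt hour, and the data set works out to roughly 1,900 reads per home.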

In Austin, Texas, energy research firm Pecan Street recently released the results of its extensive study on energy consumption within the state, according to Time magazine. The firm used data analytics tools to gather information from nearly 90 million electricity use and voltage reads per day. Researchers identified several wasteful practices contributing to the state's high energy consumption rates. Air conditioning units, for instance, accounted for 50 percent of the energy used during the state's notoriously humid summer months. The analysis found, however, that electric heaters were the least efficient appliances, using more electricity than air conditioners over a given year. By scaling back on these devices, consumers can greatly reduce their wasteful energy practices.

Using big data analytics, researchers can identify the trends sending energy consumption rates soaring and provide viable solutions to the worldwide problem.