Category Archives: Business Analytics


The importance of data accuracy for Hadoop banking analytics

Many financial institutions have expressed doubt over the accuracy of their collected information.

Business analytics solutions, such as those built upon Hadoop architecture, can be a major resource for members of the financial industry. With these tools at its disposal, bank leadership could gain major insights into operations and marketplaces and improve efforts to engage potential clients.

However, these organizations need access to accurate customer data to gain the full benefits of business analytics solutions. According to a recent Experian QAS survey, many financial institutions have struggled to ensure the accuracy of the data they gather, Credit Union Journal reported. Ninety-one percent of the organizations that participated in the survey suspected that the information they collected was inaccurate in some fashion. On average, respondents reported that as much as 18 percent of their data could not be verified as accurate, while 27 percent of participating enterprises could not say how much of their information was compromised.…

3 steps to making Hadoop data aerodynamic

Hadoop big data is the aerodynamic option for analytics.

Comparing Hadoop big data analytics to an aerodynamic vehicle produces a fairly apt parallel – both are modern concepts that harness lots of information and figure out how to concentrate it for optimal results. Big data can produce a lot of figurative weight and drag if insights aren't directed with the right focus, and the friction caused by retrieval lags can torpedo organizational growth. Here are three ways to make big data analytics soar.

1) Conquer the air
This step might sound silly, but it comes in the spirit of believing that the sheer amount of available information can be conquered. Big data arrives constantly and from various sources, continuously regenerating and offering new perspectives. According to DotNetNuke CEO Navin Nagiah, data streams will continue to get more crowded, but that doesn't have to mean analytics efforts must become cloudier. On the contrary, having the data and being able to use it will be paramount to success. 

"In the business world, it is the company that has the data that has the power," wrote Nagiah.…

Improving movie scripts with data analytics

Big data tools can help Hollywood producers craft movies that are more likely to engage an audience.

Over the past few decades, Hollywood producers have become increasingly dependent on blockbusters to generate the revenue they need to justify investing millions of dollars in a single film. Movies like "The Avengers" and "The Dark Knight Rises" have formed the backbone of the filmmaking industry for some time now. According to the Economist, successful releases aimed at a wide audience were in large part responsible for box-office revenues reaching a record $10.8 billion in 2012. Individual movies are raking in more money than ever before. The recently released "Iron Man 3" earned $175.3 million in the first three days of its North American release, CNN reported. Last year's top earner, "The Avengers," made $623.4 million in the United States alone.

The rising costs of film development
However, the cost of funding these projects has skyrocketed as well. For example, "Iron Man 3" cost Disney approximately $200 million, according to CNN.…

Reduce travel stress with big data

Big data solutions aim to take stress out of traveling.

Recent big data applications use information about consumer habits and demographic trends to develop stress-reducing solutions for travelers. Travel management company CWT recently developed the Travel Stress Index, an algorithm-based tool that it hopes will translate information into actionable recommendations, reported InformationWeek’s Ellis Booker.

“We had the transactional data and we had some traveler profile data, but it was scattered and we had to bring them together,” Catalin Ciobanu, CWT big data analyst, told InformationWeek.

Harnessing big data for pointed results involved a multi-faceted approach. CWT surveyed more than 7,000 travelers from a variety of backgrounds, asking them to rank 33 activities on a 1-10 scale according to the stress they generated. Three categories of stress-inducers came to the forefront: lost time (like not being able to work in locations without wireless or on planes without access), surprise (like lost luggage) and interruptions of daily routines (like altered sleeping schedules or a lack of healthy foods).…

4 Reasons to use Hadoop for Data Science

Over the last 10 years or so, large web companies such as Google, Yahoo!, Amazon and Facebook have successfully applied large-scale machine learning algorithms to big data sets, creating innovative data products such as online advertising systems and recommendation engines.

Apache Hadoop is quickly becoming a central store for big data in the enterprise, and thus is a natural platform with which enterprise IT can now apply data science to a variety of business problems such as product recommendation, fraud detection, and sentiment analysis.

Building on the patterns of Refine, Explore, Enrich that we described in our Hadoop Patterns of Use whitepaper, let’s review some of the major reasons to use Hadoop for data science.

Reason 1: Data exploration with full datasets

Data scientists love their working environment. Whether using R, SAS, Matlab or Python, they always need a laptop with lots of memory to analyze data and build models. In the world of big data, laptop memory is never enough, and sometimes not even close.

A common approach is to use a sample of the large dataset, as large a sample as can fit in memory. With Hadoop, you can now run many exploratory data analysis tasks on full datasets, without sampling. Just write a map-reduce job or a PIG or HIVE script, launch it directly on Hadoop over the full dataset, and get the results right back to your laptop.
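As a rough illustration of the kind of exploratory job described above, the following Python sketch computes a per-key count and mean in map-reduce style. It runs locally on a small in-memory list; on a cluster, similar mapper and reducer logic could be run over the full dataset, for example through Hadoop Streaming. The record layout (region,amount), the sample values and the function names are assumptions made for this example.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Parse one raw CSV line and emit a (region, amount) pair."""
    region, amount = line.strip().split(",")
    yield region, float(amount)

def reducer(region, amounts):
    """Summarize all amounts seen for one region."""
    amounts = list(amounts)
    return region, {"count": len(amounts), "mean": sum(amounts) / len(amounts)}

def run_job(lines):
    """Local stand-in for the map, shuffle/sort and reduce phases."""
    pairs = [pair for line in lines for pair in mapper(line)]
    pairs.sort(key=itemgetter(0))  # sorting brings equal keys together, like the shuffle step
    return dict(
        reducer(key, (value for _, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    )

# Hypothetical sample records; a real job would read the full dataset from HDFS.
sample = ["east,10.0", "west,4.0", "east,20.0", "west,6.0", "north,7.5"]
summary = run_job(sample)
```

Because the mapper and reducer only ever see one record or one key at a time, the same logic applies whether the input is five lines or the full dataset.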

Reason 2: Mining larger datasets

In many cases, machine-learning algorithms achieve better results when they have more data to learn from, particularly for techniques such as clustering, outlier detection and product recommenders.

Historically, large datasets were either unavailable or too expensive to acquire and store, so machine-learning practitioners had to find innovative ways to improve models with rather limited datasets. With Hadoop as a platform that provides linearly scalable storage and processing power, you can now store ALL of the data in RAW format, and use the full dataset to build better, more accurate models.

Reason 3: Large scale pre-processing of raw data

As many data scientists will tell you, 80% of data science work typically goes into data acquisition, transformation, cleanup and feature extraction. This “pre-processing” step transforms the raw data into a format consumable by the machine-learning algorithm, typically in the form of a feature matrix.

Hadoop is an ideal platform for implementing this sort of pre-processing efficiently and in a distributed manner over large datasets, using map-reduce or tools like PIG, HIVE, and scripting languages like Python. For example, if your application involves text processing, you often need to represent the data in word-vector format using TF-IDF, which involves counting word frequencies over a large corpus of documents, an ideal task for a batch map-reduce job.
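To make the TF-IDF example concrete, here is a minimal Python sketch of the counting involved, organized as a map step followed by an aggregation. It is a local illustration, not a distributed job. TF-IDF has several variants; this one assumes term frequency is the count divided by document length and inverse document frequency is the natural log of (number of documents / documents containing the term). The document IDs and texts are made up for the example.

```python
import math
from collections import Counter, defaultdict

def map_term_counts(doc_id, text):
    """Map step: emit ((doc_id, term), count) for each distinct term in one document."""
    for term, count in Counter(text.lower().split()).items():
        yield (doc_id, term), count

def tf_idf(docs):
    """Compute TF-IDF weights for a dict of {doc_id: text}."""
    term_counts = {}                # (doc_id, term) -> raw count
    doc_lengths = defaultdict(int)  # doc_id -> total number of terms
    doc_freq = defaultdict(int)     # term -> number of documents containing it

    for doc_id, text in docs.items():
        for (d, term), count in map_term_counts(doc_id, text):
            term_counts[(d, term)] = count
            doc_lengths[d] += count
            doc_freq[term] += 1

    n_docs = len(docs)
    return {
        (d, term): (count / doc_lengths[d]) * math.log(n_docs / doc_freq[term])
        for (d, term), count in term_counts.items()
    }

# Hypothetical three-document corpus.
docs = {
    "d1": "hadoop stores big data",
    "d2": "hive queries big data",
    "d3": "pig scripts transform data",
}
weights = tf_idf(docs)
```

In this corpus the word "data" appears in every document, so its weight is zero, while a word unique to one document, such as "hadoop", gets the highest weight in that document.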

Similarly, if your application requires joining large tables with billions of rows to create feature vectors for each data object, HIVE or PIG are very useful and efficient for this task.
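The join described above would normally be written as a HIVE or PIG script. The Python sketch below only illustrates the underlying logic on two tiny in-memory tables: rows are grouped by the join key and then aggregated into one feature vector per object. The table names, columns and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical tables: customer profiles and their transactions.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 12.5},
]

def build_feature_vectors(customers, transactions):
    """Join the two tables on customer_id and aggregate into one row per customer."""
    # Group transaction amounts by the join key, much as a shuffle phase would.
    by_customer = defaultdict(list)
    for tx in transactions:
        by_customer[tx["customer_id"]].append(tx["amount"])

    features = {}
    for customer in customers:
        amounts = by_customer.get(customer["customer_id"], [])
        features[customer["customer_id"]] = [
            customer["age"],  # profile feature
            len(amounts),     # number of transactions
            sum(amounts),     # total spend
        ]
    return features

vectors = build_feature_vectors(customers, transactions)
```

Each resulting list is one row of the feature matrix mentioned earlier, ready to be handed to a machine-learning algorithm.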

Reason 4: Data agility

It is often mentioned that Hadoop is “schema on read”, as opposed to most traditional RDBMS systems, which require a strict schema definition before any data can be ingested into them.

“Schema on read” creates “data agility”: when a new data field is needed, one is not required to go through a lengthy project of schema redesign and database migration in production, which can last months. The positive impact ripples through an organization and very quickly everyone wants to use Hadoop for their project, to achieve the same level of agility, and gain competitive advantage for their business and product line.
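The idea of “schema on read” can be sketched in a few lines of Python. In this simplified, local illustration, raw records are stored exactly as they arrive, and a schema (here just a dictionary of field names and defaults, an assumption for this example) is applied only when the data is read. Adding a field means changing the reader, not rewriting the stored data.

```python
import json

# Raw records are stored as-is; the newer record carries an extra "channel" field.
raw_lines = [
    '{"user": "a", "amount": 10}',
    '{"user": "b", "amount": 5, "channel": "mobile"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields and fill defaults for missing ones."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field, default) for field, default in schema.items()}

# The original view of the data.
schema_v1 = {"user": None, "amount": 0}
# A later view adds a field without touching any stored record.
schema_v2 = {"user": None, "amount": 0, "channel": "unknown"}

rows_v1 = list(read_with_schema(raw_lines, schema_v1))
rows_v2 = list(read_with_schema(raw_lines, schema_v2))
```

Both views read the same stored lines; older records simply receive the default value for the new field.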

If you want to learn more about data science with Apache Hadoop, you can Get Started over here, and we also invite you to attend Hortonworks’ “Applying data science with Apache Hadoop” classes.

Improving hotel marketing and performance in stages with Hadoop and big data

Managing an international, multi-brand hotel group involves tracking an enormous number of metrics, providing an opportunity to use Hadoop and big data to improve both operational and executive-level business performance. For InterContinental Hotels Group (IHG), which manages 4,602 hotels worldwide and registers 150 million room night sales a year, the use of data analytics has improved targeted marketing and bottom line results.

"Service-oriented data-driven organizations such the InterContinental Hotel Group are a great example that big data is helping any industry become better, more efficient and drive better customer experiences," Smart Data Collective contributor and big data executive Mark van Rijmenam wrote in a recent post.

In a video for AllAnalytics explaining his company's approach to becoming a data-driven organization, David Schmitt, IHG's director of performance strategy and planning, explained the transition the brand has undergone. IHG began by tracking operational performance to help improve in-the-field tactical decision making and marketing before expanding to enterprise analytics to evaluate how to build brands in specific countries, for instance.…

Hadoop analytics helps streamline manufacturing processes

Manufacturing settings are one of the most promising openings for Hadoop big data analytics, as a flood of sensor data and inputs from other automated processes can help provide insights on everything from potential fabrication errors to assembly line inefficiencies. A recent EE Times article profiled some of the ways manufacturers such as AMD and Samsung have implemented Hadoop platforms to create workflow improvements.

"In the past, raw manufacturing data from sensors was merely streamed to a passive data warehouse, which was only consulted by employees when problems arose – to determine what caused the problem – or to produce off-line reports about the efficiency of past manufacturing runs," EE Times contributor R. Colin Johnson explained.

Using a Hadoop implementation, AMD has been able to cut the work of employees checking semiconductor wafer quality data by 90 percent by catching faulty product batches earlier in production.…
