The Hortonworks Blog

Posts categorized by: Hadoop Ecosystem
Big Data Shopping Bag

With big data basking in the limelight, it is no surprise that large retailers have been closely watching its development… and more power to them! By learning to use big data effectively, retailers can mold the market to their advantage, making themselves more competitive and increasing the likelihood that they will come out on top. Now that there are open source analytical platforms like Hadoop, which allow unstructured data to be transformed and organized, large retailers are able to make smart business decisions using the information they collect about customers’ habits, preferences, and needs.…

We’re heading to our very first OSCON, the biggest gathering for the entire open source community, in Portland, Oregon, to talk all things Apache Hadoop, and we would love to meet you there!

Meet our founders, Arun Murthy and Mahadev Konar, along with others from the Hortonworks team at this year’s conference.

There are many ways to meet the Hortonworks team and we would love to chat with you about how you are considering using Hadoop.…

Working code examples for this post (for both Pig 0.10 and ElasticSearch 0.18.6) are available here.

ElasticSearch makes search simple. ElasticSearch is built over Lucene and provides a simple but rich JSON-over-HTTP query interface to search clusters of one or one hundred machines. You can get started with ElasticSearch in five minutes, and it can scale to support heavy loads in the enterprise. ElasticSearch has a Whirr Recipe, and there is even a Platform-as-a-Service provider, Bonsai.io.…

The following is Part 1 of 2 on data in education. The first article introduces the concepts of how data is used in education. The second article looks at recent movements by the Department of Education in data mining, modeling and learning systems.

Learning to Learn

The education industry is transforming into a 21st century data-driven enterprise. Metrics-based assessment has been a powerful force that has swept the national education community in response to widespread policy reform. …

What lessons might the anime (Japanese animation) “Ghost in the Shell” teach us about the future of big data? The show, originally a graphic novel from creator Masamune Shirow, explores the consequences of a “hyper”-connected society so advanced that one is able to temporarily download one’s consciousness into human-like android shells (hence the work’s title). If this sounds familiar, it’s because Ghost in the Shell was a major point of inspiration for the Wachowski brothers, the creators of the Matrix Trilogy.…

I wanted to take this opportunity to say thanks to the more than 2,200 attendees, speakers and sponsors who helped to make Hadoop Summit 2012 a great success. There was tremendous buzz throughout the conference, exceeding the excitement levels of all past Hadoop conferences. It’s a great indicator for the future of Apache Hadoop and the broader big data ecosystem.

The content from this conference was outstanding, from the opening keynotes to the last round of breakout sessions.…

What’s possible with all this data?

Data Integration is a key component of the Hadoop solution architecture. It is the first obstacle encountered once your cluster is up and running. OK, I have a cluster… now what? Do I write a script to move the data? What is the language? Isn’t this just ETL with HDFS as another target? Well, yes…

Sure, you can write custom scripts to perform a load, but that is hardly repeatable and not viable in the long term.…

Weather Hurts

Catastrophic weather events like the historic 2011 floods in Pakistan or prolonged droughts in the Horn of Africa make living conditions unspeakably harsh for the tens of millions of families living in the affected areas. In the US, the winter storms of 2009-2010 and 2010-2011 brought record-setting snowfall, forcing mighty metropolises into an icy standstill. Extreme weather can profoundly impact humankind.

The effects of extreme weather can send terrible ripples throughout an entire community. …

Big data. These are two words the world has been hearing a lot lately, in connection with a wide array of use cases in social media, government regulation, auto insurance, retail targeting, etc. The list goes on. However, a very important area that should receive the same (if not more) recognition is the presence of big data in human genome research.

Three billion base pairs make up the DNA present in humans.…

By any measure, last week’s Hadoop Summit was a tremendous success. It brought together more than 2,200 people from throughout the Apache Hadoop ecosystem to share Hadoop knowledge, ideas, best practices, and interesting use cases. It was also a great chance for big data vendors to make announcements and demonstrate new and exciting solutions.

For those of you who missed the conference, or missed a particularly interesting presentation, we have some good news.…

The fifth annual Hadoop Summit drew to a close last week, with over 2200 Hadoopniks in attendance. While there were many innovations demonstrated, for me the best action was about Pig, HCatalog and Hive from Hortonworks and Twitter.

At the Hadoop Summit Pig Meetup, Twitter announced Ambrose, which now includes an excellent graph layout of Pig EXPLAIN data. This visualization can be used to debug and better understand your Pig scripts.…

I wanted to draw your attention to a Webinar taking place this Thursday at 1pm EDT, 10am PDT. “Back to the Future – MapReduce, Hadoop and the Data Scientist” will highlight the benefits of Apache Hadoop and the role that data scientists are playing in big data. The speakers include:

  • Colin White – Founder of BI Research, a leading research, education and consulting firm helping companies understand and benefit from evolving and leading edge technologies in the areas of business intelligence and data management.

I wanted to take this opportunity to share some important news. Today, Hortonworks announced version 1.0 of the Hortonworks Data Platform, a 100% open source data management platform based on Apache Hadoop. We believe strongly that Apache Hadoop, and therefore Hortonworks Data Platform, will become the foundation for the next generation enterprise data architecture, helping companies to load, store, process, manage and ultimately benefit from the growing volume and variety of data entering into, and flowing throughout, their organizations.…

The following press release was issued by Hortonworks today.

Hortonworks Announces General Availability of Hortonworks Data Platform

Industry’s First Apache Hadoop-based Platform to Include Management, Monitoring and Comprehensive Data Services, Making Hadoop Easy to Consume and Use in Enterprise Environments

Hadoop Summit is just around the corner, and by that I mean next week! There is still time to register for the conference, but please do it soon as the conference is filling up quickly. Today is also the last day on which online registration will remain open. After today, you will need to register on-site at the conference itself.

This year’s Hadoop Summit conference, now in its fifth year, promises to be the biggest and best yet.…
