Hortonworks on Apache Hadoop


Hadoop Summit Europe 2013 Reveals Strong Ecosystem Support

Hadoop Summit Europe 2013, the European extension of the original and world’s largest Apache Hadoop community conference, today announced its official program, featuring a keynote address from 451 Group Analyst and Research Manager for Data Management and Analytics Matt Aslett and 40 use cases and educational sessions from leading industry and community experts. In addition, Hadoop Summit Europe 2013 boasts an impressive list of Platinum, Gold and Silver sponsors, demonstrating ecosystem support for Apache Hadoop from leading producers of software and services for the enterprise.

Hadoop Summit Europe will be the first and largest European conference focused exclusively on accelerating the enterprise adoption of Apache Hadoop, held at the historic Beurs van Berlage in Amsterdam on March 20-21, 2013. The event features sponsors ranging from traditional software companies to open source analytics vendors, confirming strong European interest in Hadoop.

Registration for Hadoop Summit Europe 2013 remains open; however, the conference is filling up fast.…

Imperative and Declarative Hadoop: TPC-H in Pig and Hive

According to the Transaction Processing Performance Council (TPC), TPC-H is:

The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.

TPC-H was implemented for Hive in HIVE-600 and for Pig in PIG-2397 by Hortonworks intern Jie Li. Reviewing this work, I was struck by how clearly it outlines the differences between Pig and SQL.

There seems to be a tendency for simple SQL to be clearer than the equivalent Pig. At some point, however, as the TPC-H queries become more demanding, complex SQL seems to become less clear than the comparable Pig.…

The Hadoop Ecosystem: Big Data Analytics Meets Advertising (Webinar)

Please join Hortonworks, Impetus and Entravision/Luminar for a webinar on how big data analytics is being used in the advertising industry to build predictive models of consumer behavior. The webinar will take place on Tuesday, February 12th at 1pm EST (10am PST).

Big data analytics is becoming increasingly useful to professionals in digital media, gaming, healthcare, security, finance and government, and nearly every industry you can name. Companies are analyzing vast amounts of data from various sources to shed light on customer behaviors, accelerate lead conversion, pinpoint security threats and enrich social media marketing efforts. In fact, new tools and technologies are making it easier to harness the power of Big Data and put it to use, and businesses are quickly uncovering valuable insights that were previously unavailable.

Entravision Communications Corporation is one company looking to reap the benefits of big data through careful analytics.…

The Hadoop Ecosystem: Bigger Data on Your Budget (Webinar)

Please join Hortonworks and Appnovation for a webinar titled “Bigger Data on Your Budget” taking place on Wednesday, February 13th at 2pm EST, 11am PST.

Appnovation is a new Hortonworks Systems Integrator partner focused on cutting-edge open source technologies. They are experts in Drupal, Alfresco, SproutCore and now Apache Hadoop.

In advance of this webinar, I interviewed Dave Porter, Appnovation & SproutCore Lead Developer, about the technologies they support and how Appnovation and Hortonworks are working together to provide big insights without breaking the bank.

Question: In your opinion, what are the best technologies to combine with Apache Hadoop?

Dave: Any stack is going to require a place to store your Hadoop insights, a way to get at that data (say, as a web API), and a way to view the data. My favorite stack is Hadoop for processing and storage, node.js for the web API, and SproutCore for the rich, data-driven sophistication that it brings to web application development.…

Doing More with the Hortonworks Sandbox

The Hortonworks Sandbox was recently introduced and has garnered an incredibly positive response. We are as excited as you are, and gratified that our goal of providing the fastest on-ramp to Apache Hadoop has come to fruition. By providing a free, integrated learning environment along with a personal Hadoop environment, we are helping you gain big data skills faster. Because of your feedback and demand for new tutorials, we are accelerating the release schedule for upcoming tutorials. We will continue to announce new tutorials via the Hortonworks blog, opt-in email and Twitter (@hortonworks).

When new tutorials are ready, updating takes a single click: go to the “About Hortonworks Sandbox” icon and press the Update button. Your initial Sandbox virtual machine installation will remain in place; only the tutorials will be updated.

Another request we received was for access to more interesting datasets to experiment with in the Sandbox.…

Apache HBase Region Splitting and Merging

For this post, we take a technical deep-dive into one of the core areas of HBase. Specifically, we will look at how Apache HBase distributes load through regions and manages region splitting. HBase stores rows of data in tables. Tables are split into chunks of rows called “regions”. Those regions are distributed across the cluster, hosted and made available to client processes by the RegionServer process. A region is a contiguous range within the key space, meaning all rows in the table that sort between the region’s start key and end key are stored in the same region. Regions are non-overlapping, i.e. a single row key belongs to exactly one region at any point in time. A region is only served by a single region server at any point in time, which is how HBase guarantees strong consistency within a single row.…
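To make the key-range model concrete, here is a minimal Python sketch of how a sorted list of region start keys maps every row key to exactly one region, and how splitting a region keeps the ranges non-overlapping. This is not HBase's implementation or client API: the `regions` list and the `find_region` and `split_region` helpers are hypothetical, and the split key is chosen by the caller here, whereas HBase selects its own split point. Row keys are modeled as Python `bytes`, which, like HBase keys, compare lexicographically byte by byte.

```python
from bisect import bisect_right

# Hypothetical region table: sorted, non-overlapping start keys.
# The first region starts at the empty key b"" so every row key is covered;
# each region ends where the next one starts, and the last one is open-ended.
regions = [b"", b"g", b"p"]

def find_region(start_keys, row_key):
    """Return the index of the one region whose [start, end) range holds row_key."""
    # bisect_right finds the first start key strictly greater than row_key;
    # the region immediately before it is the one containing row_key.
    return bisect_right(start_keys, row_key) - 1

def split_region(start_keys, index, split_key):
    """Split region `index` at split_key and return the new list of start keys."""
    start = start_keys[index]
    end = start_keys[index + 1] if index + 1 < len(start_keys) else None
    if not (split_key > start and (end is None or split_key < end)):
        raise ValueError("split key must fall strictly inside the region")
    # Parent [start, end) becomes two daughters: [start, split_key) and [split_key, end).
    return start_keys[: index + 1] + [split_key] + start_keys[index + 1:]

print(find_region(regions, b"apple"))   # 0
print(find_region(regions, b"grape"))   # 1
print(find_region(regions, b"zebra"))   # 2

regions = split_region(regions, 2, b"t")
print(regions)                          # [b'', b'g', b'p', b't']
print(find_region(regions, b"zebra"))   # 3
```

Because the start keys stay sorted and each region ends where the next begins, a lookup can never match two regions, which is the invariant the post describes.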

The Hadoop Ecosystem: Unleashing the Marketing Potential of Big Data

The customer data that companies collect from websites, social media, blogs, digital advertising and mobile is exploding. And as big data gets bigger, the volume of untapped insights available from analyzing that data is also growing exponentially. Marketers covet those insights as a way to better understand and engage with their customers and ultimately drive revenue, but how do they get to them?

According to Gartner, organizations that successfully integrate high-value, diverse new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20 percent.* Fortunately, a new solution that combines Hortonworks Data Platform (HDP) with the expertise of eSage Group allows marketing professionals to extract value from Big Data quickly and with relative ease.

We interviewed eSage’s Dean Bedard, COO, about how the combination helps marketers unleash the power of Big Data and put it to use:

Q.…

Hortonworks Achieves Quality Assurance and Certification for Rackspace Private Cloud

Today we announced Hortonworks Data Platform certification for Rackspace Private Cloud. In fact, we are the only Apache Hadoop distribution certified with Rackspace Private Cloud. The result of combining the power of enterprise-class Apache Hadoop in Hortonworks Data Platform (HDP) with Rackspace Private Cloud is that organizations now have a secure, scalable environment to refine, explore and enrich their data using Hadoop in the cloud. With HDP, data can be processed from applications that are hosted on Rackspace Private Cloud environments, allowing you to quickly and easily obtain additional business insights from this information. The provisioning, monitoring and management components of HDP are important enablers for the integration with the Rackspace Private Cloud, providing an easy path for getting data into and out of the cloud.…

Hortonworks Joins OpenStack Foundation

By contributing to the OpenStack ecosystem, Hortonworks is supporting the open source community and facilitating adoption of 100-percent open source Apache Hadoop-based solutions in the cloud. Customers will now be able to access an enterprise-ready Hortonworks Data Platform built for the cloud, eliminating the time and complexity of manually deploying a big data solution.…

The Road Ahead for Hortonworks and Hadoop

I recently delivered a webinar entitled “Hortonworks State of the Union”. For those new to Apache Hadoop, I covered a brief history of Hadoop and Hortonworks’ role within the open source community. We also covered how the platform services, data services, and operational services required to enable Hadoop as an enterprise-viable platform evolved in 2012.

Finally, we discussed the important progress made on deeply integrating Hadoop within next-generation data architectures in a way that makes sense for the enterprise. Our partnership with Teradata provides a great example of how deep integration of BOTH the data services (via Apache HCatalog) AND the operational services (via Apache Ambari’s REST APIs) can deliver value in a way that addresses mainstream enterprise needs while preserving existing investments.

What’s next?

If 2012 was a big year for Hadoop and big data, then 2013 should be HUGE.…

DataFu: The WD-40 of Big Data

If Pig is the “duct tape for big data”, then DataFu is the WD-40. Or something.

No, seriously, DataFu is a collection of Pig UDFs for data analysis on Hadoop. DataFu includes routines for common statistics tasks (e.g., median, variance), PageRank, set operations, and bag operations.
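As a rough illustration of what median and variance routines compute over a Pig bag, here is a plain-Python sketch that models a bag as a list of one-field tuples. It is not DataFu's API: `bag_median` and `bag_variance` are hypothetical names, and the variance shown is the population variance, whereas a real UDF may compute the sample variance or use a streaming algorithm to avoid holding the whole bag in memory.

```python
from statistics import median, pvariance

# A Pig bag is modeled here as a list of tuples; each tuple holds one numeric field.
bag = [(3.0,), (1.0,), (4.0,), (1.0,), (5.0,)]

def bag_median(bag):
    """Median of the first field of every tuple in the bag (hypothetical helper)."""
    return median([t[0] for t in bag])

def bag_variance(bag):
    """Population variance of the first field of every tuple (hypothetical helper)."""
    return pvariance([t[0] for t in bag])

print(bag_median(bag))    # 3.0
print(bag_variance(bag))  # 2.56
```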

It’s helpful to understand the history of the library. Over the years, we developed several routines that were used across LinkedIn and were thrown together into an internal package we affectionately called “littlepiggy.” The unfortunate part, and this is true of many such efforts, is that the UDFs were ill-documented, ill-organized, and easily broken when someone made a change. Then along came PigUnit, which allowed UDF testing, so we spent the time to clean up these routines by adding documentation and rigorous unit tests. We thought the resulting “datafoo” package would help the community at large, and there you have DataFu.…

Hortonworks Sandbox — the Fastest On Ramp to Apache Hadoop

Go from Zero to Big Data in 15 Minutes!

Today Hortonworks announced the availability of the Hortonworks Sandbox, an easy-to-use, flexible and comprehensive learning environment that will provide you with the fastest on-ramp to learning and exploring enterprise Apache Hadoop.

The Hortonworks Sandbox is:

  • A free download
  • A complete, self-contained virtual machine with Apache Hadoop pre-configured
  • A personal, portable and standalone Hadoop environment
  • A set of hands-on, step-by-step tutorials that allow you to learn and explore Hadoop on your own

The Hortonworks Sandbox is designed to help close the gap between people who want to learn and evaluate Hadoop and the complexity of spinning up a Hadoop evaluation cluster. The Hortonworks Sandbox provides a powerful combination of hands-on, step-by-step tutorials paired with an easy-to-use Web interface designed to lower the learning curve for people who just want to explore and evaluate Hadoop as quickly as possible.…

Don’t be Tardy for This Hadoop BINGO Party!

Happy New Year, everyone!

I’m excited to kick-off our first webinar series for 2013: The True Value of Apache Hadoop.

Get all your friends and co-workers together and be prepared to geek out on Hadoop!

This 4-part series will feature a mixture of amazing guest speakers covering topics such as Hortonworks’ 2013 vision and roadmap for Apache Hadoop and Big Data, what’s new in Hortonworks Data Platform v1.2, how Luminar (an Entravision company) adopted Apache Hadoop, and a use case on Hadoop, R and googleVis. This series will give organizations an opportunity to gain a better understanding of Apache Hadoop and the Big Data landscape, along with practical guidance on how to leverage Hadoop as part of their Big Data strategy.

How is that a party?

We’re going to incorporate a game of BINGO! That’s right folks/potential attendees/registrants, a game of B-I-N-G-O for this webinar series.…

Hadoop in Perspective: Systems for Scientific Computing

When the term scientific computing comes up in a conversation, it’s usually just the occasional science geek who shows signs of recognition. But although most people have little or no knowledge of the field’s existence, it has been around since the second half of the twentieth century and has played an increasingly important role in many technological and scientific developments. Internet search engines, DNA analysis, weather forecasting, seismic analysis, renewable energy, and aircraft modeling are just a few examples where scientific computing is now indispensable.

Apache Hadoop is a newcomer in scientific computing and is welcomed as a great addition to existing systems. In this post I introduce systems for scientific computing and attempt to give Hadoop a place in this picture. I start by discussing arguably the most important concept in scientific computing: parallel computing. What is it, how does it work, and what tools are available?…
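The core idea of data-parallel computing, which Hadoop applies at cluster scale, is to split the input, process each piece independently, and combine the partial results. Here is a minimal single-machine Python sketch of that split/map/reduce structure; `chunk`, `partial_sum_of_squares` and `parallel_sum_of_squares` are hypothetical names. It uses a thread pool only to show the structure: CPython's global interpreter lock means threads will not speed up CPU-bound Python code, so real speedups would need `ProcessPoolExecutor` or a cluster framework such as Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(data, n_chunks):
    """Split data into up to n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def partial_sum_of_squares(part):
    """Work done independently on one slice (the 'map' step)."""
    return sum(x * x for x in part)

def parallel_sum_of_squares(data, workers=4):
    parts = chunk(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, parts))
    # Combine the independent partial results (the 'reduce' step).
    return sum(partials)

print(parallel_sum_of_squares(list(range(10))))  # 285
```

The combine step works here because addition is associative, so the partial sums can be computed in any order; the same property is what lets Hadoop run combiners and reducers over partial results.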

“State of the Union” Webinar Features Hortonworks Executive Delivering 2012 Year-in-Review, Mapping Out Strategic Direction for 2013 and Highlighting Key Product Offerings

What: “Hortonworks State of the Union and Vision for Apache Hadoop in 2013” webinar

Who: Shaun Connolly, Vice President of Corporate Strategy, Hortonworks

When: Tuesday, January 22, 2013 at 1:00 p.m. ET / 10:00 a.m. PT

Where: http://info.hortonworks.com/Winterwebinarseries_TheTrueValueofHadoop.html

Click to Tweet: #Hortonworks hosting “State of the Union” webinar to discuss 2013 vision for #Hadoop, 1/22 at 1 pm ET. Register here: http://bit.ly/VYJxKX

The “State of the Union” webinar is the first in a four-part Hortonworks webinar series titled, “The True Value of Apache Hadoop,” designed to inform attendees of key trends, future roadmaps, best practices and the tools necessary for the successful enterprise adoption of Apache Hadoop.

During the “State of the Union,” Connolly will look at key company highlights from 2012, including the release of the Hortonworks Data Platform (HDP), the industry’s only 100-percent open source platform powered by Apache Hadoop, and the further development of the Hadoop ecosystem through partnerships with leading software vendors such as Microsoft and Teradata.…
