The Hortonworks Blog

Hot on the heels of the release of the new version of Sandbox, I thought it would be worth a look at Ambari as it is now integrated into the Sandbox VM. You can download the Hortonworks Sandbox and try it out for yourself!

Apache Ambari is a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters. It greatly simplifies and reduces the complexity of running Apache Hadoop. Ambari is a fully open-source, Apache project and graphical interface to Hadoop.…

We are excited to tell you about the newest release of the Hortonworks Sandbox.

The Hortonworks Sandbox provides the fastest onramp to Apache Hadoop with an easy-to-use, integrated learning environment and a functional personal Hadoop environment. The Sandbox takes the complexity out of Hadoop installation and set up by providing a fully functional virtual image. If you are evaluating Apache Hadoop or need an easy way to prove out use cases then the Sandbox is for you.…

This post co-authored by Arun Murthy.

It’s been an exciting time for the Apache Hadoop community with new and innovative projects happening around performance (Apache Tez) — part of the Stinger initiative — and security (Apache Knox). In addition Hortonworks recently announced the availability of the beta version of Hortonworks Data Platform for Windows.

One of the things we believe strongly in here at Hortonworks is community driven open source and, obviously, the bigger the community, the better.…

Installing the Hortonworks Data Platform for Windows couldn’t be easier. Lets take a look at how to install a one node cluster on your Windows Server 2012 machine. // to let us know if you’d like more content like this.

/centerTo start, download the HDP for Windows MSI at http://hortonworks.com/products/hdp-windows/#install/. It is about 460MB, and will take a moment to download. Documentation for the download is available here.…

Jaspersoft, a Hortonworks certified technology partner, recently completed a survey on the early use of Apache Hadoop in the enterprise. The company found 38% of respondents require real-time or near real-time analytics for their Big Data with Hadoop. Also, within the enterprise, there is a diverse group of people who use Hadoop for such insights: 63% are application developers, 15% are BI report developers and 10% are BI admins or casual business users.…

We are just under two weeks away from start of the first ever Hadoop Summit Europe and with all of the final preparations being made we thought we would highlight some of the not to be missed activities in and around the event. The event is filling fast but you can still register here.

Here are 10 great reasons to attend!

1)   Great track content – there are 35 informative sessions on Apache Hadoop and related technologies for you to choose from selected by the community and delivered by the experts themselves.…

There have been many Apache Hadoop-related announcements the past few weeks, making it difficult to separate the signal from the marketing noise. One thing is crystal clear however… there is a large and growing appetite for Enterprise Hadoop because it helps unlock new insights and business opportunities in a way that was not previously technologically or economically feasible.

Enterprise and Open Source are NOT Mutually Exclusive

Dan Woods from Forbes, recently penned an article entitled “Why SQL Matters, the Limits of Open Source, and Other Lessons of EMC Greenplum’s Pivotal HD” where he paints a picture of enterprise and open source in opposite corners.…

 

For several years now Apache Hadoop has been fueling the fast growing big data market and has become the defacto platform for Big Data deployments and the technology foundation for an explosion of new analytic applications. Many organizations turn to Hadoop to help tame the vast amounts of new data they are collecting but in order to do so with Hadoop they have had to use servers running the Linux operating system.…

Cat Miller is an engineer at Mortar Data, a Hadoop-as-a-service provider, and creator of mortar, an open source framework for data processing.

Introduction

For anyone who came of programming age before cloud computing burst its way into the technology scene, data analysis has long been synonymous with SQL. A slightly awkward, declarative language whose production can more resemble logic puzzle solving than coding, SQL and the relational databases it builds on have been the pervasive standard for how to deal with data.…

Introduction

Hortonworks hosted the second Apache Hadoop YARN meetup at Hortonworks office in Palo Alto on last Friday (22 February 2013). Following the success with the first one, this meetup continues to enjoy a good attendance from the YARN community. About 40 joined the meetup in person and nearly another 30 attended via phone/webex.

Meetup sessions Update from Yahoo!

The Yahoo! grid team responsible for YARN rollout on their clusters gave an update of the current deployments and their state.…

 

In Derrick Harris’ article on GigaOM entitled “EMC to Hadoop competition: See ya, wouldn’t wanna be ya.”, EMC unveiled their new Pivotal HD offering which effectively re-architects the Greenplum analytic database so it sits on top of the Hadoop Distributed File System (HDFS). Scott Yara, Greenplum cofounder, is excited about the new product. Since a key focus for us at Hortonworks is to deeply integrate Hadoop with other data systems (a la our efforts with Teradata, Microsoft, MarkLogic, and others), I’m always excited to see data system providers like Greenplum decide to store their data natively in HDFS.…

Apache Pig version 0.11 was released last week. An Apache Pig blog post summarized the release. New features include:

  • A DateTime datatype, documentation here.
  • A RANK function, documentation here.
  • A CUBE operator, documentation here.
  • Groovy UDFs, documentation here.

And many improvements. Oink it up for Pig 0.11! Hortonworks’ Daniel Dai gave a talk on Pig 0.11 at Strata NY, check it out:…

Last week, the HBase community released 0.94.5, which is the most stable release of HBase so far. The release includes 76 jira issues resolved, with 61 bug fixes, 8 improvements, and 2 new features.

Most of the bug fixes went against the REST server, replication, region assignment, secure client, flaky unit tests, 0.92 compatibility and various stability improvements. Some of the interesting patches in this release are: [HBASE-3996] – Support multiple tables and scanners as input to the mapper in map/reduce jobs [HBASE-5416] – Improve performance of scans with some kind of filters.…

We are now less than a month away from the kickoff of Hadoop Summit Europe, taking place March 20-21 in Amsterdam. The excitement from the community is really starting to grow and pass sales are far ahead of where we expected. Much of the buzz is tied directly to the content that will be presented during the conference.

In all, there were be 42 breakout sessions with presenters from more than 20 companies, including representatives from Adobe, eBay, Facebook, HSBC, LinkedIn, Twitter and Yahoo!.…

YARN is part of the next generation Hadoop cluster compute environment. It creates a generic and flexible resource management framework to administer the compute resources in a Hadoop cluster. The YARN application framework allows multiple applications to negotiate resources for themselves and perform their application specific computations on a shared cluster. Thus, resource allocation lies at the heart of YARN.

YARN ultimately opens up Hadoop to additional compute frameworks, like Tez, so that an application can optimize compute for their specific requirements.…

Go to page:« First...1020...2829303132...40...Last »