The Hortonworks Blog

Posts categorized by : Other

There have been many Apache Hadoop-related announcements the past few weeks, making it difficult to separate the signal from the marketing noise. One thing is crystal clear however… there is a large and growing appetite for Enterprise Hadoop because it helps unlock new insights and business opportunities in a way that was not previously technologically or economically feasible.

Enterprise and Open Source are NOT Mutually Exclusive

Dan Woods from Forbes, recently penned an article entitled “Why SQL Matters, the Limits of Open Source, and Other Lessons of EMC Greenplum’s Pivotal HD” where he paints a picture of enterprise and open source in opposite corners.…

 

In Derrick Harris’ article on GigaOM entitled “EMC to Hadoop competition: See ya, wouldn’t wanna be ya.”, EMC unveiled their new Pivotal HD offering which effectively re-architects the Greenplum analytic database so it sits on top of the Hadoop Distributed File System (HDFS). Scott Yara, Greenplum cofounder, is excited about the new product. Since a key focus for us at Hortonworks is to deeply integrate Hadoop with other data systems (a la our efforts with Teradata, Microsoft, MarkLogic, and others), I’m always excited to see data system providers like Greenplum decide to store their data natively in HDFS.…

Last week, the HBase community released 0.94.5, which is the most stable release of HBase so far. The release includes 76 jira issues resolved, with 61 bug fixes, 8 improvements, and 2 new features.

Most of the bug fixes went against the REST server, replication, region assignment, secure client, flaky unit tests, 0.92 compatibility and various stability improvements. Some of the interesting patches in this release are: [HBASE-3996] – Support multiple tables and scanners as input to the mapper in map/reduce jobs [HBASE-5416] – Improve performance of scans with some kind of filters.…

YARN is part of the next generation Hadoop cluster compute environment. It creates a generic and flexible resource management framework to administer the compute resources in a Hadoop cluster. The YARN application framework allows multiple applications to negotiate resources for themselves and perform their application specific computations on a shared cluster. Thus, resource allocation lies at the heart of YARN.

YARN ultimately opens up Hadoop to additional compute frameworks, like Tez, so that an application can optimize compute for their specific requirements.…

 

Last week, we outlined our approach for delivering an enterprise viable Apache Hadoop distribution in the open.  Simply put: we believe the fastest way to innovate is to do our work within the open source community, introduce enterprise feature requirements into that public domain, and to work diligently to progress existing open source projects and incubate new projects to meet those needs.

In support of our approach, this week we’ve announced the submission of two new incubation projects to the Apache Software foundation together with the launch of the “Stinger Initiative”, all aimed at enhancing the security and performance of Hadoop applications.  …

 

MapReduce has served us well.  For years it has been THE processing engine for Hadoop and has been the backbone upon which a huge amount of value has been created.  While it is here to stay, new paradigms are also needed in order to enable Hadoop to serve an even greater number of usage patterns.  A key and emerging example is the need for interactive query, which today is challenged by the batch-oriented nature of MapReduce. …

 

UPDATE: Since this article was posted, the Stinger initiative has continued to drive to the goal of 100x Faster Hive. You can read the latest information at http://hortonworks.com/stinger

Introduced by Facebook in 2007, Apache Hive and its HiveQL interface has become the de facto SQL interface for Hadoop.  Today, companies of all types and sizes use Hive to access Hadoop data in a familiar way and to extend value to their organization or customers either directly or though a broad ecosystem of existing BI tools that rely on this key proven interface. …

 

Back in the day, in order to secure a Hadoop cluster all you needed was a firewall that restricted network access to only authorized users. This eventually evolved into a more robust security layer in Hadoop… a layer that could augment firewall access with strong authentication. Enter Kerberos.  Around 2008, Owen O’Malley and a team of committers led this first foray into security and today, Kerberos is still the primary way to secure a Hadoop cluster.…

 

As the Release Manager for hadoop-2.x, I’m very pleased to announce the next major milestone for the Apache Hadoop community, the release of hadoop-2.0.3-alpha!

2.0 Enhancements in this Alpha Release

This release delivers significant major enhancements and stability over previous releases in hadoop-2.x series. Notably, it includes:

  • QJM for HDFS HA for NameNode (HDFS-3077) and related stability fixes to HDFS HA
  • Multi-resource scheduling (CPU and memory) for YARN (YARN-2, YARN-3 & friends)
  • YARN ResourceManager Restart (YARN-230)
  • Significant stability at scale for YARN (over 30,000 nodes and 14 million applications so far, at time of release – see more details from folks at Yahoo! 

 

At Hortonworks, our strategy is founded on the unwavering belief in the power of community driven open source software. In the spirit of openness, we think it’s important to share our perspectives around the broader context of how Apache Hadoop and Hortonworks came to be, what we are doing now, and why we believe our unique focus is good for Apache Hadoop, the ecosystem of Hadoop users, and for Hortonworks as well.…

Hadoop Summit Europe 2013, the European extension of the original and world’s largest Apache Hadoop community conference, today announced its official program, featuring a keynote address from 451 Group Analyst and Research Manager for Data Management and Analytics Matt Aslett and 40 use cases and educational sessions from leading industry and community experts. In addition, Hadoop Summit Europe 2013 boasts an impressive list of Platinum, Gold and Silver sponsors, demonstrating ecosystem support for Apache Hadoop from leading producers of software and services for the enterprise.…

The Hortonworks Sandbox was recently introduced garnering incredibly positive response and feedback. We are as excited as you, and gratified that our goal providing the fastest onramp to Apache Hadoop has come to fruition. By providing a free, integrated learning environment along with a personal Hadoop environment, we are helping you gain those big data skills faster. Because of your feedback and demand for new tutorials, we are accelerating the release schedule for upcoming tutorials.…

Today we announced Hortonworks Data Platform certification for Rackspace Private Cloud. In fact, we are the only Apache Hadoop distribution certified with Rackspace Private Cloud. The result of combining the power of enterprise-class Apache Hadoop in Hortonworks Data Platform (HDP) with Rackspace Private Cloud, is that organizations now have a secure, scalable environment to refine, explore and enrich their data using Hadoop in the cloud. With HDP, data can be processed from applications that are hosted on Rackspace Private Cloud environments, allowing you to quickly and easily obtain additional business insights from this information.…

By contributing to the OpenStack ecosystem, Hortonworks is supporting the open source community and facilitating adoption of 100-percent open source Apache Hadoop-based solutions in the cloud.  Now customers will be able to access an enterprise-ready Hortonworks Data Platform built for the cloud that alleviates the time and complexities of manually deploying a big data solution.…

I recently delivered a webinar entitled “Hortonworks State of the Union”. For those new to Apache Hadoop, I covered a brief history of Hadoop and Hortonworks’ role within the open source community. We also covered how the platform services, data services, and operational services required to enable Hadoop as an enterprise-viable platform evolved in 2012.

Finally, we discussed the important progress made on deeply integrating Hadoop within next-generation data architectures in a way that makes sense for the enterprise.…

Go to page:« First...34567...Last »