The Hortonworks Blog

Hadoop Summit Content Curation

Although Hadoop Summit San Jose 2014 has come and gone, its content (keynotes, sessions, and tracks) remains available online. I’ve selected a few sessions below for Hadoop system administrators and dev-ops engineers, curating them under a general Hadoop operations theme.

Dev-ops engineers and system administrators know better than anyone that ease of operation and deployment can make or break a large Hadoop production cluster. That is why they care about all of the following:

  • how rapidly they can create or replicate a cluster;
  • how efficiently they can manage or monitor at scale;
  • how easily and programmatically they can extend or customize their operational scripts; and
  • how accurately they can foresee, forestall, or forecast resource starvation or capacity needs.

Today we are delighted to announce a formal partnership between Accenture and Hortonworks, the next step in a collaboration between the two companies that began in 2012. Under this agreement, Accenture and Hortonworks will work together to make large structured and unstructured datasets, including operational, video, and sensor data, more accessible to organizations for insight-driven decision-making. The two companies will also continue to build joint horizontal and vertical solutions to speed the adoption of Apache Hadoop.…

Apache Cassandra is an open source, distributed NoSQL database management system designed to handle large amounts of data. It offers a scalable, real-time solution that lets users build online applications that are “always-on, no matter what.” DataStax is the company behind Cassandra and a new Technology Partner of Hortonworks.

Lynn Walitch leads Partner Management for DataStax and is our guest blogger today. Lynn discusses the importance of the partnership and certification with Hortonworks.…

Merv Adrian couldn’t have said it better. In his blog post from the weekend, he continued his quest to define Hadoop. It is no easy quest, because Hadoop’s components, and their evolution, are changing at a pace that is, frankly, astounding.

The continuous evolution of Hadoop has even given rise to sentiments such as ‘Is Hadoop dead?’ The answer to that question is YES. And NO.…

The Apache Storm community recently announced the release of Apache Storm 0.9.2, which includes improvements to Storm’s user interface and an overhaul of its Netty-based transport.

We thank all who have contributed to Storm – whether through direct code contributions, documentation, bug reports, or helping other users on the mailing lists. Together, we resolved 112 JIRA issues.

Here are summaries of this version’s important fixes and improvements.

New Feature Highlights

Netty Transport Overhaul

Storm’s Netty-based transport has been overhauled to significantly improve performance through better utilization of threads, CPU, and network resources, particularly when message sizes are small.…

We certainly live in interesting times. About 20 months ago, in an effort to find proprietary differentiation that could be used to monetize its model and lock customers into it, Cloudera unveiled Impala, and at that time Mike Olson stated, “Our view is that, long-term, this will supplant Hive”. Only 6 months ago, in his Impala v Hive post, Olson defended his “decision to develop Impala from the ground up as a new project, rather than improving the existing Apache Hive project,” stating, “Put bluntly: We chose to build Impala because Hive is the wrong architecture for real-time distributed SQL processing.”

So, 20 months after abandoning Hive, and after repeated marketing attempts to throw Hive and many other SQL alternatives under the bus in favor of its “better” approach, I’m certainly puzzled as Cloudera unveils its plan to enable Apache Hive to run on Apache Spark; please see HIVE-7292 for details.…

Last Thursday we hosted the last of our seven Discover HDP 2.1 webinars, Using Apache Ambari to Manage Hadoop Clusters. Over 140 people attended and joined in the conversation.

The speakers gave an overview of Apache Ambari, discussed new features, and showed an end-to-end demo.

Thanks to our presenters: Justin Sears (Hortonworks’ Product Marketing Manager), Jeff Sposetti (Hortonworks’ Senior Director of Product Management), and Mahadev Konar (Hortonworks’ Co-founder, and Committer and PMC Member for Apache Hadoop, Apache Ambari, and Apache ZooKeeper).…

IBM InfoSphere Guardium is now certified with HDP 2.1. The Hortonworks Certified Technology Program simplifies big data planning by providing pre-built and validated integrations between leading enterprise technologies and HDP.

Kathryn Zeidenstein, InfoSphere Guardium Evangelist, is our guest blogger and describes security, Hadoop, and the Guardium solution.

Those of us in the data security and privacy space tend to worry a lot. With each new breaking story on the latest data breach, and with the subsequent fallout, people higher and higher up the food chain are also worrying a lot.…

This week we hosted a webinar entitled HDP Advanced Security: Comprehensive Security for Enterprise Hadoop. Over 135 people attended, prompting an informative discourse and a series of questions.

The speakers outlined the HDP Advanced Security features and benefits in Hortonworks Data Platform and gave a demo. Thanks to our presenters Justin Sears (Hortonworks’ Product Marketing Manager), Balaji Ganesan (Hortonworks’ Senior Director, Enterprise Security Strategy), and Don Bosco Durai (Hortonworks’ Enterprise Security Architect).…

Today, we announce the certification of Apache Spark as YARN Ready. This certification ensures that memory- and CPU-intensive Spark-based applications can coexist within a single Hadoop cluster alongside all the other workloads you have deployed. Together, Spark and YARN let you use a single cluster with a single set of data for multiple purposes, rather than siloing your Spark workloads in a separate cluster.

Oscar Padilla, Vice President of Strategy at Luminar, is our guest blogger. He shares his thoughts and insights about Apache Hadoop, Hortonworks Data Platform, and Luminar’s journey to the Data Lake.

Luminar is the first big data analytics provider focused specifically on U.S. Latino consumers. Our company offers analysis based on empirical insights rather than on a sample-based approach. Apache Hadoop and Hortonworks Data Platform (HDP) make this empirical approach work at scale.…

Two months ago, we announced the acquisition of XA Secure, and at that time we stated that the software would be generally available by the end of June. We are happy to announce that we have delivered as promised: the solution is available for download for everyone today. Also, if you are an HDP Enterprise Plus Subscription customer, support for these new functions is now included.

HDP Advanced Security expands on the solid security features already found in HDP to provide central administration and coordinated enforcement of enterprise security policy for a Hadoop cluster.…

We recently hosted the sixth of our seven Discover HDP 2.1 webinars, entitled Apache Storm for Stream Data Processing in Hadoop. Over 200 people attended the webinar and joined in the conversation.

Thanks to our presenters: Justin Sears (Hortonworks’ Product Marketing Manager), Himanshu Bari (Hortonworks’ Senior Product Manager for Storm), and Taylor Goetz (Hortonworks’ Software Engineer and Apache Storm Committer). The speakers covered:

  • Why use Apache Storm?

It has been an exciting few weeks for the XA Secure team. We formally joined Hortonworks on 5/15 and have received a warm welcome from our new peers. Even more exciting are the numerous discussions we have had with current customers and prospects about how we can bring comprehensive, holistic security capabilities to HDP. We now begin the journey to incubate our XA Secure functionality as a completely open source project governed by the Apache Software Foundation.…

Customers’ Hadoop Journey

We’ve all had two weeks to reflect on Hadoop Summit 2014. One of the biggest differences that stood out in this year’s Summit (as compared to Summit 2013) was the presence of large enterprise customers that are using Apache Hadoop as an important part of their modern data architectures.

Hadoop has gone beyond its original Yahoo use case (indexing the web via a nightly batch MapReduce process) and into the mainstream of daily data processing and analytics, powering real-time, online, interactive, and batch applications at many notable companies.…
