The Hortonworks Blog

Today we are excited to announce a deepening of our strategic partnership with HP. This news builds on the reseller partnership that we established in 2013, enabling HP to resell the Hortonworks Data Platform. It also allows us to build on the HP AllianceOne ConvergedSystems Partner of the Year Award that we received at the recent HP Discover 2014 conference for our strategic partnership.

Given the rapid adoption of Enterprise Hadoop as a core component of a modern data architecture combined with the fact that HP is the world’s leading server vendor in terms of shipments AND revenues according to IDC – meaning a significant number of those Hadoop nodes are being deployed with HP technologies – it’s hardly surprising that we’ve been collaborating closely.…

Although the Hadoop Summit San Jose 2014 has come and gone, the invaluable content (keynotes, sessions, and tracks) is available here. We’ve selected a few sessions for Hadoop developers, practitioners, and architects, curating them under Apache Hadoop YARN, the architectural center and the data operating system.

In most of the keynotes and tracks three themes resonated:

  • Enterprises are transitioning from traditional Hadoop to modern Hadoop 2.
  • YARN is an enabler, the central orchestrator that facilitates multiple workloads, runs multiple data engines, and supports multiple access patterns—batch, interactive, streaming, and real-time—in Apache Hadoop 2.

    Last week, Apache Tez graduated to become a top-level project within the Apache Software Foundation (ASF). This is a major step forward for the project and reflects the momentum built by a broad community of developers, not only from Hortonworks but also from Cloudera, Facebook, LinkedIn, Microsoft, NASA JPL, Twitter, and Yahoo.

    What is Apache Tez and why is it useful?

    Apache™ Tez is an extensible framework for building YARN-based, high-performance batch and interactive data processing applications in Hadoop that need to handle datasets at terabyte to petabyte scale.…

    The Apache Pig community released Pig 0.13 earlier this month. Pig uses a simple scripting language to perform complex transformations on data stored in Apache Hadoop. The Pig community has been working diligently to prepare Pig to take advantage of the DAG processing capabilities in Apache Tez. We also improved usability and performance.

    This blog post summarizes the progress we’ve made.

    Support for Backends Other Than MapReduce

    We made the Pig 0.13 architecture more general to support multiple backends beyond just MapReduce, while maintaining backward compatibility.…

    As part of our YARN Ready program, we are hosting a series of technical webinars highlighting the technologies and resources available to developers for creating YARN applications. In our first webinar, “Introduction to YARN Ready,” we presented an overview of the YARN Ready program.

    To extend your technical knowledge, please join us for our first in-depth YARN Ready technology webinar, “Integrating Applications Natively to YARN” on Thursday July 24 at 9am Pacific Time.…

    Incremental Updates

    Hadoop and Hive are quickly evolving to outgrow previous limitations for integration and data access. On the near-term development roadmap, we expect to see Hive supporting full CRUD operations (Insert, Select, Update, Delete). As we wait for these advancements, there is still a need to work with the current options, OVERWRITE or APPEND, for Hive table integration.

    The OVERWRITE option requires moving the complete record set from source to Hadoop.…
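To make the trade-off between the two options concrete, here is a minimal sketch in Python. It is not Hive or HiveQL: `Table`, `overwrite`, and `append` are hypothetical names for an in-memory stand-in, and the sample rows are invented for illustration. It only counts how many rows each strategy has to move when a single new record shows up at the source.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical in-memory stand-in for a target table (not a Hive API)."""
    rows: list = field(default_factory=list)

    def overwrite(self, source_rows):
        """Replace the table contents; the complete record set is transferred."""
        self.rows = list(source_rows)
        return len(source_rows)  # number of rows moved

    def append(self, new_rows):
        """Add rows to the table; existing rows are left untouched."""
        self.rows.extend(new_rows)
        return len(new_rows)  # number of rows moved


# Illustrative data: three records already loaded, one new record at the source.
initial = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}, {"id": 3, "value": "c"}]
new_record = {"id": 4, "value": "d"}
source = initial + [new_record]

# Strategy 1: overwrite the target with the full source record set.
full_refresh = Table(rows=list(initial))
moved_overwrite = full_refresh.overwrite(source)

# Strategy 2: append only the record that is new.
incremental = Table(rows=list(initial))
moved_append = incremental.append([new_record])

print("rows moved by overwrite:", moved_overwrite)
print("rows moved by append:", moved_append)
```

Both targets end up with the same rows in this example, but the overwrite moves every source row while the append moves only the new one. Note that an append by itself cannot express a change to a record that is already in the table.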

    Hadoop is a business-critical data platform at many of the world’s largest enterprises. These corporations require a layered security model focusing on four aspects of security: authentication, authorization, auditing, and data protection. Hortonworks continues to innovate in each of these areas, along with other members of the Apache open source community. In this blog, we will look at the authentication layer and how we can enforce strong authentication in HDP via Kerberos.…

    Tresata, a Hortonworks Certified Technology Partner, is a next-generation predictive analytics software company that helps enterprises monetize big data they have moved to Hadoop. In this blog, Tresata’s Director of Marketing, Katie Levans (@katie_levans), describes the value of HDP 2.1 certification and the benefit of their solution.

    Last month Tresata announced the release of the third generation of their hugely successful software application TREE 3.3 and its subsequent certification on HDP 2.1.…

    Hadoop Summit Content Curation

    Although the Hadoop Summit San Jose 2014 has come and gone, the invaluable content—keynotes, sessions, and tracks—is available here. I’ve selected a few sessions below for Hadoop system administrators and dev-ops, curating them under a general Hadoop operations theme.

    Dev-ops engineers and system administrators know best that ease of operations and deployments can make or break a large Hadoop production cluster, which is why they care about all of the following:

    • how rapidly they can create or replicate a cluster;
    • how efficiently they can manage or monitor at scale;
    • how easily and programmatically they can extend or customize their operational scripts; and
    • how accurately they can foresee, forestall, or forecast resource starvation or capacity stipulation.

    Today we are delighted to announce the formal partnership between Accenture and Hortonworks, which continues the collaboration between the two companies that started in 2012. With this formal agreement, Accenture and Hortonworks will collaborate on making large structured and unstructured datasets – including operational, video and sensor data – more accessible to organizations for insight-driven decision-making. Together, the two companies will continue to collaborate on joint horizontal and vertical solutions to speed the adoption of Apache Hadoop.…

    Apache Cassandra is an open source NoSQL distributed database management system designed to handle large amounts of data. It offers a scalable, real-time solution that allows users to create online applications that are “always-on, no matter what.” DataStax is the company behind Cassandra, and a new Technology Partner of Hortonworks.

    Lynn Walitch leads Partner Management for DataStax and is our guest blogger today. Lynn discusses the importance of the partnership and certification with Hortonworks.…

    Merv Adrian couldn’t have said it better. In his blog post from the weekend, he continued in his quest to define Hadoop. And it is no easy quest, as the components of Hadoop are evolving at a pace that is, frankly, astounding.

    The continuous evolution of Hadoop has even given rise to sentiments such as ‘Is Hadoop dead?’ The answer to that question is YES. And NO.…

    The Apache Storm community recently announced the release of Apache Storm 0.9.2, which includes improvements to Storm’s user interface and an overhaul of its Netty-based transport.

    We thank all who have contributed to Storm – whether through direct code contributions, documentation, bug reports, or helping other users on the mailing lists. Together, we resolved 112 JIRA issues.

    Here are summaries of this version’s important fixes and improvements.

    New Feature Highlights

    Netty Transport Overhaul

    Storm’s Netty-based transport has been overhauled to significantly improve performance through better utilization of thread, CPU, and network resources, particularly in cases where message sizes are small.…

    We certainly live in interesting times. About 20 months ago, in an effort to find proprietary differentiation that could be used to monetize and lock in customers to their model, Cloudera unveiled Impala and at that time Mike Olson stated “Our view is that, long-term, this will supplant Hive”. Only 6 months ago in his Impala v Hive post, Olson defended his “decision to develop Impala from the ground up as a new project, rather than improving the existing Apache Hive project” stating “Put bluntly: We chose to build Impala because Hive is the wrong architecture for real-time distributed SQL processing.”

    So, 20 months after abandoning Hive, and after repeated marketing attempts to throw Hive and many other SQL alternatives under the bus in favor of their “better” approach, I’m certainly puzzled as Cloudera unveils their plan to enable Apache Hive to run on Apache Spark; please see HIVE-7292 for details.…

    Last Thursday we hosted the last of our seven Discover HDP 2.1 webinars, Using Apache Ambari to Manage Hadoop Clusters. Over 140 people attended and joined in the conversation.

    The speakers gave an overview of Apache Ambari, discussed new features, and showed an end-to-end demo.

    Thanks to our presenters: Justin Sears (Hortonworks’ Product Marketing Manager), Jeff Sposetti (Hortonworks’ Senior Director of Product Management), and Mahadev Konar (Hortonworks’ Co-founder, Committer, and PMC Member for Apache Hadoop, Apache Ambari, and Apache Zookeeper).…
