The Hortonworks Blog

Posts categorized by: Innovation from Hortonworks

Although the Hadoop Summit San Jose 2014 has come and gone, the invaluable content (keynotes, sessions, and tracks) is available here. We’ve selected a few sessions for Hadoop developers, practitioners, and architects, curating them under Apache Hadoop YARN, the architectural center and the data operating system.

In most of the keynotes and tracks three themes resonated:

  • Enterprises are transitioning from traditional Hadoop to modern Hadoop 2.
  • YARN is an enabler, the central orchestrator that facilitates multiple workloads, runs multiple data engines, and supports multiple access patterns—batch, interactive, streaming, and real-time—in Apache Hadoop 2.…
    Incremental Updates

    Hadoop and Hive are quickly evolving to outgrow previous limitations for integration and data access.
    On the near-term development roadmap, we expect to see Hive supporting full CRUD operations (Insert, Select, Update, Delete). As we wait for these advancements, there is still a need to work with the current options, OVERWRITE or APPEND, for Hive table integration.

    The OVERWRITE option requires moving the complete record set from source to Hadoop.…

    Hadoop is a business-critical data platform at many of the world’s largest enterprises. These corporations require a layered security model focusing on four aspects of security: authentication, authorization, auditing, and data protection. Hortonworks continues to innovate in each of these areas, along with other members of the Apache open source community. In this blog, we will look at the authentication layer and how we can enforce strong authentication in HDP via Kerberos.…

    Tresata, a Hortonworks Certified Technology Partner, is a next-generation predictive analytics software company that helps enterprises monetize big data they have moved to Hadoop. In this blog, Tresata’s Director of Marketing, Katie Levans (@katie_levans), describes the value of HDP 2.1 certification and the benefit of their solution.

    Last month Tresata announced the release of the third generation of their hugely successful software application TREE 3.3 and its subsequent certification on HDP 2.1.…

    Hadoop Summit Content Curation

    Although the Hadoop Summit San Jose 2014 has come and gone, the invaluable content—keynotes, sessions, and tracks—is available here. I’ve selected a few sessions below for Hadoop system administrators and dev-ops, curating them under a general Hadoop operations theme.

    Dev-ops engineers and system administrators know best that ease of operations and deployments can make or break a large Hadoop production cluster, which is why they care about all of the following:

    • how rapidly they can create or replicate a cluster;
    • how efficiently they can manage or monitor at scale;
    • how easily and programmatically they can extend or customize their operational scripts; and
    • how accurately they can foresee, forestall, or forecast resource starvation or capacity stipulation.

    Apache Cassandra is an open source NoSQL distributed database management system designed to handle large amounts of data. It offers a scalable, real-time solution that lets users create online applications that are “always-on, no matter what.” DataStax is the company behind Cassandra, and a new Technology Partner of Hortonworks.

    Lynn Walitch leads Partner Management for DataStax and is our guest blogger today. Lynn discusses the importance of the partnership and certification with Hortonworks.…

    The Apache Storm community recently announced the release of Apache Storm 0.9.2, which includes improvements to Storm’s user interface and an overhaul of its Netty-based transport.

    We thank all who have contributed to Storm – whether through direct code contributions, documentation, bug reports, or helping other users on the mailing lists. Together, we resolved 112 JIRA issues.

    Here are summaries of this version’s important fixes and improvements.

    New Feature Highlights
    Netty Transport Overhaul

    Storm’s Netty-based transport has been overhauled to significantly improve performance through better utilization of thread, CPU, and network resources, particularly in cases where message sizes are small.…

    We certainly live in interesting times. About 20 months ago, in an effort to find proprietary differentiation that could be used to monetize and lock in customers to their model, Cloudera unveiled Impala and at that time Mike Olson stated “Our view is that, long-term, this will supplant Hive”. Only 6 months ago in his Impala v Hive post, Olson defended his “decision to develop Impala from the ground up as a new project, rather than improving the existing Apache Hive project” stating “Put bluntly: We chose to build Impala because Hive is the wrong architecture for real-time distributed SQL processing.”

    So, 20 months after abandoning Hive, and after repeated marketing attempts to throw Hive and many other SQL alternatives under the bus in favor of their “better” approach, I’m certainly puzzled as Cloudera unveils its plan to enable Apache Hive to run on Apache Spark; please see HIVE-7292 for details.…

    Last Thursday we hosted the last of our seven Discover HDP 2.1 webinars, Using Apache Ambari to Manage Hadoop Clusters. Over 140 people attended and joined in the conversation.

    The speakers gave an overview of Apache Ambari, discussed new features, and showed an end-to-end demo.

    Thanks to our presenters: Justin Sears (Hortonworks’ Product Marketing Manager), Jeff Sposetti (Hortonworks’ Senior Director of Product Management), and Mahadev Konar (Hortonworks’ Co-founder, Committer, and PMC Member for Apache Hadoop, Apache Ambari, and Apache ZooKeeper).…

    IBM InfoSphere Guardium has been certified with HDP 2.1. The Hortonworks Certified Technology Program simplifies big data planning by providing pre-built and validated integrations between leading enterprise technologies and HDP.

    Kathryn Zeidenstein, InfoSphere Guardium Evangelist, is our guest blogger and describes security, Hadoop, and the Guardium solution.

    Those of us in the data security and privacy space tend to worry a lot. With each new breaking story on the latest data breach, and with the subsequent fallout, people higher and higher up the food chain are also worrying a lot.…

    This week we hosted a webinar entitled HDP Advanced Security: Comprehensive Security for Enterprise Hadoop. Over 135 people attended, prompting an informative discourse and a series of questions.

    The speakers outlined the HDP Advanced Security features and benefits in Hortonworks Data Platform and gave a demo. Thanks to our presenters Justin Sears (Hortonworks’ Product Marketing Manager), Balaji Ganesan (Hortonworks’ Senior Director, Enterprise Security Strategy), and Don Bosco Durai (Hortonworks’ Enterprise Security Architect).…

    Two months ago, we announced the acquisition of XA Secure, and at that time we stated that the software would be generally available by the end of June. We are happy to announce that we have delivered as promised and the solution is available for download for everyone today. Also, if you are an HDP Enterprise Plus Subscription customer, additional support for these new functions is now provided.

    HDP Advanced Security expands on the solid security features already found in HDP to provide central administration and coordinated enforcement of enterprise security policy for a Hadoop cluster.…

    We recently hosted the sixth of our seven Discover HDP 2.1 webinars, entitled Apache Storm for Stream Data Processing in Hadoop. Over 200 people attended the webinar and joined in the conversation.

    Thanks to our presenters: Justin Sears (Hortonworks’ Product Marketing Manager), Himanshu Bari (Hortonworks’ Senior Product Manager for Storm), and Taylor Goetz (Hortonworks’ Software Engineer and Apache Storm Committer). The speakers covered:

    • Why use Apache Storm?

    It has been an exciting few weeks for the XA Secure team. We formally joined Hortonworks on 5/15 and have received a warm welcome from our new peers. Even more exciting are the numerous discussions we have had with current customers and prospects on how we can bring comprehensive and holistic security capabilities to HDP. We now begin the journey to incubate our XA Secure functionality as a completely open source project governed by the Apache Software Foundation.…

    Customers’ Hadoop Journey

    We’ve all had two weeks to reflect on Hadoop Summit 2014. One of the biggest differences that stood out in this year’s Summit (as compared to Summit 2013) was the presence of large enterprise customers that are using Apache Hadoop as an important part of their modern data architectures.

    Hadoop has gone beyond its original Yahoo use case (indexing the web via a nightly batch MapReduce process) and into the mainstream of daily data processing and analytics with real-time, online, interactive, and batch applications at many notable companies.…
