Hadoop is a business-critical data platform at many of the world’s largest enterprises. These corporations require a layered security model focusing on four aspects of security: authentication, authorization, auditing, and data protection. Hortonworks continues to innovate in each of these areas, along with other members of the Apache open source community. In this blog, we will look at the authentication layer and how we can enforce strong authentication in HDP via Kerberos.…
The Hortonworks Blog
Hadoop Summit Content Curation
Although the Hadoop Summit San Jose 2014 has come and gone, the invaluable content—keynotes, sessions, and tracks—is available here. I’ve selected a few sessions below for Hadoop system administrators and dev-ops, curating them under a general Hadoop operations theme.
Dev-ops engineers and system administrators know best that ease of operations and deployments can make or break a large Hadoop production cluster, which is why they care about all of the following:
- how rapidly they can create or replicate a cluster;
- how efficiently they can manage or monitor at scale;
- how easily and programmatically they can extend or customize their operational scripts; and
- how accurately they can foresee, forestall, or forecast resource starvation or capacity stipulation.
We thank all who have contributed to Storm – whether through direct code contributions, documentation, bug reports, or helping other users on the mailing lists. Together, we resolved 112 JIRA issues.
Here are summaries of this version’s important fixes and improvements.
New Feature Highlights
Netty Transport Overhaul
Storm’s Netty-based transport has been overhauled to significantly improve performance through better utilization of thread, CPU, and network resources, particularly in cases where message sizes are small.…
Last Thursday we hosted the last of our seven Discover HDP 2.1 webinars, Using Apache Ambari to Manage Hadoop Clusters. Over 140 people attended and joined in the conversation.
The speakers gave an overview of Apache Ambari, discussed new features, and showed an end-to-end demo.
Thanks to our presenters Justin Sears (Hortonworks’ Product Marketing Manager), Jeff Sposetti (Hortonworks’ Senior Director of Product Management), and Mahadev Konar (Hortonworks’ Co-founder, Committer, and PMC Member for Apache Hadoop, Apache Ambari, and Apache ZooKeeper).…
IBM InfoSphere Guardium has been certified with HDP 2.1. The Hortonworks Certified Technology Program simplifies big data planning by providing pre-built and validated integrations between leading enterprise technologies and HDP.
Kathryn Zeidenstein, InfoSphere Guardium Evangelist, is our guest blogger and describes security, Hadoop, and the Guardium solution.
Those of us in the data security and privacy space tend to worry a lot. With each new breaking story on the latest data breach, and with the subsequent fallout, people higher and higher up the food chain are also worrying a lot.…
This week we hosted a webinar entitled HDP Advanced Security: Comprehensive Security for Enterprise Hadoop. Over 135 people attended, prompting an informative discourse and a series of questions.
The speakers outlined the HDP Advanced Security features and benefits in Hortonworks Data Platform and gave a demo. Thanks to our presenters Justin Sears (Hortonworks’ Product Marketing Manager), Balaji Ganesan (Hortonworks’ Senior Director, Enterprise Security Strategy), and Don Bosco Durai (Hortonworks’ Enterprise Security Architect).…
We recently hosted the sixth of our seven Discover HDP 2.1 webinars, entitled Apache Storm for Stream Data Processing in Hadoop. Over 200 people attended the webinar and joined in the conversation.
Thanks to our presenters Justin Sears (Hortonworks’ Product Marketing Manager), Himanshu Bari (Hortonworks’ Senior Product Manager for Storm), and Taylor Goetz (Hortonworks’ Software Engineer and Apache Storm Committer). The speakers covered:
- Why use Apache Storm?
Apache YARN Ready Program
With the release of Apache Hadoop YARN in October of last year, organizations are moving from single-application Hadoop clusters to a versatile, integrated Hadoop 2 data platform hosting multiple applications — eliminating silos, maximizing resources and bringing true multi-workload capabilities to Hadoop.
Customers are telling us loud and clear: they want solutions that run on YARN because it enables them to run multiple workloads on the same common data pool.…
Data Analytics Virtual Event
Hortonworks and Teradata have partnered to provide a clear path to Big Data Analytics via stable and reliable Hadoop for the enterprise. We are excited to support their upcoming Big Data Analytics virtual event, “Data Discovery in Action.” We will have experts standing by to answer questions and help ensure you have the right strategy in place for all of your big data.
At this event on July 2nd, you will learn more about how Teradata’s Unified Big Data Architecture™ provides a quick path to data discovery.…
We’re finally catching our breath after a phenomenal Hadoop Summit event last week in San Jose. Thank you to everyone who came to celebrate Hadoop advances and adoption: the many organizations that shared how their Hadoop journeys fundamentally transformed their businesses, those just getting started, and the huge ecosystem of vendors. It is amazing to be part of such a broad and deep community that is contributing to making the market for everyone.…
Enterprises are using Apache Hadoop powered by YARN as a Data Operating System to run multiple workloads and use cases, rather than as a single-purpose cluster.
A multi-purpose, enterprise-wide data platform, often referred to as a data lake, gives rise to the need for a comprehensive approach to security across the Hadoop platform and its workloads. A few weeks back, Hortonworks acquired XA Secure to further execute on our vision of bringing a holistic security framework to the Hadoop community, irrespective of the workload.…
Apache YARN, Apache Slider, and Docker
Join us June 19 at 6 pm at the Hilton Fort Worth, Texas for an educational workshop hosted by Hortonworks and Sendero Business Services. The topic is “The Key To Success is Consistently Making Good Decisions & The Key To Good Decisions is Good Information.” The speaker is Don Hilborn, Solutions Engineer at Hortonworks.
Don will introduce the paradigm of
- Efficiency – double processing in Hadoop on the same hardware while providing predictable performance and quality of service; and
- Resource sharing – providing a stable common set of shared resources across multiple, coordinated workloads in Hadoop.
Apache Ambari has always provided an operator the ability to provision an Apache Hadoop cluster using an intuitive Cluster Install Wizard web interface, guiding the user through a series of steps:
- confirming the list of hosts;
- assigning master, slave, and client components;
- configuring services; and
- installing, starting, and testing the cluster.
With Ambari Blueprints, system administrators and dev-ops engineers can expedite the process of provisioning a cluster. Once defined, Blueprints can be re-used, which facilitates easy configuration and automation for each successive cluster creation.…
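To make the re-use idea concrete, here is a minimal sketch in Python that builds a blueprint document and then two cluster creation documents that reference it. The field layout (`Blueprints`, `host_groups`, `components`, `cardinality`, `blueprint`, `hosts`, `fqdn`) follows the Ambari Blueprints JSON format as we understand it and should be checked against the Ambari documentation for your version. The blueprint name, host group names, and host names are hypothetical, and the component list is illustrative rather than a validated topology. The sketch only builds and prints the JSON; submitting it to Ambari's REST API is left out.

```python
import json

# Hypothetical blueprint name, used only for illustration.
BLUEPRINT_NAME = "small-hdp-cluster"


def build_blueprint(stack_name, stack_version, host_groups):
    """Return a blueprint document: which components run in each host group.

    host_groups maps a group name to a (cardinality, components) pair.
    """
    return {
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
        "host_groups": [
            {
                "name": group_name,
                "cardinality": str(cardinality),
                "components": [{"name": component} for component in components],
            }
            for group_name, (cardinality, components) in host_groups.items()
        ],
    }


def build_cluster_template(blueprint_name, hosts_by_group):
    """Return a cluster creation document mapping real hosts onto host groups."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group_name, "hosts": [{"fqdn": fqdn} for fqdn in fqdns]}
            for group_name, fqdns in hosts_by_group.items()
        ],
    }


# Define the layout once (assumed, simplified topology).
blueprint = build_blueprint(
    "HDP",
    "2.1",
    {
        "masters": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
        "workers": (2, ["DATANODE", "NODEMANAGER"]),
    },
)

# Re-use the same blueprint for two clusters; only the host mapping changes.
dev_cluster = build_cluster_template(
    BLUEPRINT_NAME,
    {
        "masters": ["dev-m1.example.com"],
        "workers": ["dev-w1.example.com", "dev-w2.example.com"],
    },
)
test_cluster = build_cluster_template(
    BLUEPRINT_NAME,
    {
        "masters": ["test-m1.example.com"],
        "workers": ["test-w1.example.com", "test-w2.example.com"],
    },
)

print(json.dumps(blueprint, indent=2))
print(json.dumps(dev_cluster, indent=2))
print(json.dumps(test_cluster, indent=2))
```

Keeping the component layout (the blueprint) separate from the host mapping (the cluster template) is what lets each successive cluster be created from the same definition.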
Since the partnership between Hortonworks and Splunk and the release of Hunk last year, we have created some awesome assets (e.g., the Hunk sandbox tutorial and the 360-degree customer view webinar) that give Hadoop and Big Data enthusiasts hands-on training with Big Data. You can find more details around our partnership and resources here: http://hortonworks.com/partner/splunk/
As part of our HDP 2.1 certification series, I would like to introduce Brett Sheppard, Director of Product Marketing for Big Data at Splunk.…
We recently hosted the fourth of our seven Discover HDP 2.1 webinars, entitled Apache Hadoop 2.4.0, HDFS and YARN. It was very well attended and prompted an informative discussion. The speakers outlined the new features in YARN and HDFS in HDP 2.1 including:
- HDFS Extended ACLs
- HTTPS support for WebHDFS and for the Hadoop web UIs
- HDFS Coordinated DataNode Caching
- YARN Resource Manager High Availability
- Application Monitoring through the YARN Timeline Server
- Capacity Scheduler Preemption
Many thanks to our presenters, Rohit Bakhshi (Hortonworks’ senior product manager), Vinod Kumar Vavilapalli (co-author of the YARN Book, PMC, Hadoop YARN Project Lead at Apache and Hortonworks), and Justin Sears (Hortonworks’ Product Marketing Manager).…