The Hortonworks Blog

Posts categorized by: Hadoop

Sumeet Kumar Agrawal, principal product manager for the Big Data Edition product at Informatica, is our guest blogger. In this blog, he explains how Informatica’s Big Data Edition integrates with Tez and allows for significant performance gains.

Informatica Big Data Edition’s codeless visual development environment accelerates the ability of enterprises to take advantage of amazing innovations in big data to solve new challenges with skill sets that already exist within many organizations. Informatica natively integrates with big data platforms like Hadoop and NoSQL to enable next-generation big data solutions, including data warehouse optimization and 360 customer analytics.…

Our guest blogger today is Sean Anderson, Manager of Data Service at Rackspace, the managed cloud company. Sean will share with us all the work Rackspace is doing with Hortonworks Data Platform (HDP) for an enterprise-ready Hadoop solution.

Rackspace is excited to be joining the open source data platform community for Hadoop Summit 2015 hosted by Hortonworks and Yahoo. We partnered with Hortonworks in 2013 to build two platforms—one that delivers enterprise-ready Hadoop on-demand in the cloud, and another that delivers customizable and secure dedicated servers backed by fanatical support and expertise.…

Last week, the Apache Slider community released Apache Slider 0.80.0. Although there are many new features in Slider 0.80.0, a few innovations are particularly notable:

  • Containerized application onboarding
  • Seamless zero-downtime application upgrade
  • Adding co-processors to app packages without reinstallation
  • Simplified application onboarding without any packaging requirement

Below are some details about these important features. For the complete list of features, improvements, and bug fixes, see the release notes.

Notable Changes: Containerized application onboarding

This release of Apache Slider provides a way to deploy containerized applications on YARN and leverage YARN’s resource management capabilities.…

This is a guest blog post from Jerry Megaro, Merck’s Director of Innovation and Manufacturing Analytics. Jerry established the practice of Data Excellence and Data Sciences within the Merck Manufacturing Division and now leads initiatives to transform Merck Manufacturing into a data-driven organization that enhances the company’s performance across the supply chain.

Hortonworks’ experience working with top pharma manufacturers indicates an exciting opportunity to improve manufacturing performance by proactively managing process variability.…

As we approach Hadoop Summit in San Jose next week, the debate continues over where Hadoop really is on its adoption curve. George Leopold from Datanami was one of the first to beat the hornet’s nest with his article entitled Gartner: Hadoop Adoption ‘Fairly Anemic’. Matt Asay from TechRepublic and Virginia Backaitis from CMSWire volleyed back with Hadoop Numbers Suggest the Best is Yet to Come and Gartner’s Dismal Predictions for Hadoop Could Be Wrong, respectively.…

Today I am excited to announce that we have made a significant expansion of our operations in Australia in response to growing demand for open enterprise Hadoop in Australia and around the APAC region.

Focused on Sydney but with the ability to execute across Australia, this year we have hired several senior sales and technical staff drawn from industry-leading technology vendors. With this additional experience, we are better able to help customers regionally with their big data needs.…

Not a day passes without someone tweeting or re-tweeting a blog on the virtues of Apache Spark.

At a Memorial Day BBQ, an old friend proclaimed: “Spark is the new rub, just as Java was two decades ago. It’s a developers’ delight.”

Spark, as a distributed data processing and computing platform, offers much of what developers desire, and much more. To the ETL application developer, Spark offers expressive APIs for transforming data; to data scientists, it offers machine learning libraries through its MLlib component; and to data analysts, it offers SQL capabilities for inquiry.…

Today from TU-Automotive Detroit, we announced our partnership with HARMAN, the leading global infotainment, audio and software services company.

Hortonworks and HARMAN are partnering to transform the automotive enterprise by enabling the connected car ecosystem with real-time, Internet of Things (IoT) data, insights and prognostics solutions.

The widespread adoption of connected devices is accelerating. Gartner Research expects 25 billion installed devices by 2020. Together, Hortonworks and HARMAN will offer solutions to help automotive manufacturers gain valuable insights by analyzing real-time information based on data streaming from connected cars.…

Apache Spark provides a lot of valuable tools for data science. With our release of the Apache Spark 1.3.1 Technical Preview, the powerful DataFrame API is available on HDP.

Data scientists use data exploration and visualization to help frame the question and fine-tune the learning. Apache Zeppelin helps with this.

Based on the concept of an interpreter that can be bound to any language or data processing backend, Zeppelin is a web-based notebook server.…

Hortonworks proudly announces the launch of a new education program for Academic Institutions. This program was created to introduce students to the Hortonworks Data Platform (HDP) and to provide them with the necessary technical skills to complement their chosen academic curriculum.

Accredited colleges and universities around the world are invited to apply to become a Hortonworks Academic Partner, allowing them to incorporate our course materials into their classrooms at a low cost to students.…

The Apache Accumulo community has announced its 1.7.0 release. As the community’s first major release of 2015, it represents the culmination of a year of effort from many Accumulo committers and contributors. Apart from the many notable changes enumerated below, Accumulo is now well integrated with Apache Ambari.

In this release, 43 different individuals fixed 691 JIRA issues, and we thank everyone who helped in any way to make Apache Accumulo 1.7.0 a reality.…

Hadoop really is everywhere. In his recent post, “Going from Hadoop Adoption to Hadoop Everywhere,” Shaun Connolly made this point and also quoted Forrester’s Mike Gualtieri:

Hadoop is a must-have for large enterprises

Shaun mentioned these key trends in his post:

  • Hadoop is transforming every industry
  • Enterprises are building applications to make use of all kinds of data
  • Hadoop is ready for the enterprise

Earlier this month, we released Hortonworks’ first quarter earnings.…

In this guest blog, IDC Program Director for Retail Insights Greg Girard shares his insights into how retailers employ big data and analytics to drive decision and action across myriad industries.

Big data and analytics (BDA) have become top agenda items for a growing number of retail executives, and rightly so in the broader social and economic context of data-enabled decision and action. While “data-driven,” as a term, has been around for quite some time, the ability to act on insight has taken on new urgency.…

SQL is the most popular use case for the Hadoop user community, and Apache Hive is still the de facto standard. Early this week, the Apache Hive community released Apache Hive 1.2.0.

With this already the third release this year, the Hive developer community continues to improve the release and grow its team, with 11 Hive contributors promoted to committers in the last three months. Dedicated to making Hive enterprise-ready, the community has made improvements in the following areas:

  • Additional SQL functionality
  • Security enhancements
  • Performance gains
  • Stability and usability

For the complete list of features, improvements, and bug fixes, see the release notes.…