Hortonworks is always pleased to see new contributions come into the open-source community. We worked with our customer, Hotels.com, to help them develop libraries and utilities around Apache Hive, the Apache ORC format and Cascading. It’s great to see the results released for the community. In this guest blog, Adrian Woodhead, Big Data Engineering Team Lead at Hotels.com, discusses the CORC project.
The Hortonworks Blog
The Apache Lucene/Solr community is continuing its rapid release cycles to meet community and customer requirements. In this guest blog, we have invited Sarath Jarugula from Lucidworks to share with us the many improvements in the Apache Solr 5.2 release.
The Apache Solr community has announced its Solr 5.2 release. Solr 5.2 is a follow-up release to Solr 5.0, a significant major release in February 2015. The community has delivered 25 new features, 5 optimizations, and 38 bug fixes in this release.…
As YARN drives Hadoop’s emergence as a business-critical data platform, the enterprise requires more stringent data security capabilities. Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides a platform for centralized security policy administration across the core enterprise security requirements of authorization, audit and data protection.
On June 10th, the community announced the release of Apache Ranger 0.5.0. With this release, the community took major steps to extend security coverage for the Hadoop platform and deepen its existing security capabilities.…
Earlier this month, Hortonworks had the pleasure of joining Yahoo! in hosting the 8th Annual Hadoop Summit, the leading conference for the Apache Hadoop community. Summit is always an important and exciting experience, bringing together thought leaders, technologists, and data specialists from throughout the community to explore and advance the art and science of Big Data.
This year’s event came at a pivotal time for Hadoop and Hortonworks, with news about Open Enterprise Hadoop and the launch of the newest version of Hortonworks Data Platform (HDP 2.3™) poised to transform the way large organizations in every industry process data.…
In his blog, Tim Hall wrote, “Enterprises are embracing Apache Hadoop to enable their modern data architectures and power new analytic applications. The freedom to choose the on-premises or cloud environments for Hadoop that best meets the business needs is a critical requirement.”
One option for deploying Hadoop in the cloud is Microsoft Azure using Cloudbreak. Other options include Google Cloud Platform, OpenStack, and AWS.
In this blog, I’ll show how you can deploy Hadoop in Azure with a few clicks by running an HDP multi-node cluster on Azure’s Linux VMs using Cloudbreak.…
Mayank Bansal, of eBay, is a guest contributing author of this collaborative blog.
This is the 4th post in a series that explores the theme of enabling diverse workloads in YARN. See the introductory post to understand the context around all the new features for diverse workloads as part of Apache Hadoop YARN in HDP.
Background
Multihoming is the practice of connecting a host to more than a single network. This is frequently used to provide network-level fault tolerance – if hosts are able to communicate on more than one network, the failure of one network will not render the hosts inaccessible. There are other use cases for multi-homing as well, including traffic segregation to isolate congestion and support for different network media optimized for different use cases.…
The Apache community released Apache Pig 0.15.0 last week. Although there are many new features in Apache Pig 0.15.0, we would like to highlight two major improvements:
- Pig on Tez enhancements
- Using Hive UDFs inside Pig
Below are some details about these important features. For the complete list of features, improvements, and bug fixes, please see the release notes.
Notable Changes
1. Pig on Tez enhancements
Scalability of Pig on Tez
Oracle and Hortonworks continue to work on bringing the latest ELT and real-time transactional data streaming capabilities to the Hortonworks Data Platform (HDP). Recently Oracle completed certification testing for HDP 2.2 for both Oracle Data Integrator and Oracle GoldenGate for Big Data, both integral parts of the Oracle Data Integration product portfolio. These releases certified on HDP 2.2 are the latest in the series of advanced Big Data updates and features that Oracle Data Integration is rolling out for customers to help take their Hadoop projects to the next level of enterprise integration.…
As businesses continue to create data at an ever-increasing pace, data architectures are strained under the loads placed upon them. Data volumes continue to grow considerably, low-value workloads like ETL consume more and more processing resources, and new types of data can’t easily be captured and put to use. Organizations struggle with escalating costs, increasing complexity, and the challenge of expansion.
This coming Wednesday, Big Data experts will look at how Hadoop is enabling a broad range of organizations to address these challenges.…
The components in a modern data architecture vary from one enterprise to the next and the mix changes over time. Many of our Hortonworks subscribers need support ensuring that their Hortonworks Data Platform (HDP) clusters are optimally configured. This means that they need proactive, intelligent cluster analysis.
Onboarding new workloads to the platform taxes the resources of Hadoop operators. Our customers have therefore asked Hortonworks for guidance and best practices to reduce their operational risk and staff their Hadoop operations efficiently.…
Apache Hadoop has emerged as a critical data platform to deliver business insights hidden in big data. Because Hadoop is a relatively new technology, system administrators hold it to higher security standards. There are several reasons for this scrutiny:
- The external ecosystem of data repositories and operational systems that feed Hadoop deployments is highly dynamic and can introduce new security threats on a regular basis.
- A Hadoop deployment contains a large volume of diverse data stored over long periods of time.
Hadoop isn’t optional for today’s enterprises—that much is clear. But as companies race to get control over the significantly growing volumes of unstructured data in their organizations, they’ve been less certain about the right way to put Hadoop to work in their environment.
We’ve already seen a variety of wrong approaches with proprietary extensions that limit innovation, fragment architectures and trade openness for vendor lock-in. Now a new consensus is forming around an emerging category that drives truly transformational outcomes: Open Enterprise Hadoop.…
Over the past two quarters, Hortonworks has been able to attract over 200 new customers. We are attempting to feed the hunger our customers have shown for Hadoop over the past two years. We are seeing truly transformational business outcomes delivered through the use of Hadoop across all industries. The most prominent use cases are focused on:
- Data Architecture Optimization – keeping 100% of the data at up to 1/100th of the cost while enriching traditional data warehouse analytics
- A Single View of customers, products, and supply chains
- Predictive Analytics – delivering behavioral insight, preventative maintenance, and resource optimization
- Data Discovery – exploring datasets, uncovering new findings, and operationalizing insights
What we have consistently heard from our customers and partners, as they adopt Hadoop, is that they would like Hortonworks to focus our engineering activities on three key themes: Ease of Use, Enterprise Readiness, and Simplification.…
Sumeet Kumar Agrawal, principal product manager for the Big Data Edition product at Informatica, is our guest blogger. In this blog, Sumeet explains how Informatica’s Big Data Edition integrates with Tez and allows for significant performance gains.
Informatica Big Data Edition’s codeless visual development environment accelerates the ability of enterprises to take advantage of amazing innovations in big data to solve new challenges with skill sets that already exist within many organizations. Informatica natively integrates with big data platforms like Hadoop and NoSQL to enable next-generation big data solutions, including data warehouse optimization and 360 customer analytics.…