The Hortonworks Blog

Posts categorized by: Apache Hadoop

Concurrent Inc. is a Hortonworks Technology Partner and recently announced that Cascading 3.0 now supports Apache Tez as an application runtime. Cascading is a powerful development framework for building enterprise data applications on Hadoop and is one of the most widely deployed technologies for data applications, with more than 175,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in data application development on Hadoop.…

Internet of Things (IoT) Potential and Process

It may seem obvious (or inevitable), but many companies are embracing the Internet of Things (IoT), and for good reasons, notes Forbes’ Mike Kavis. First, McKinsey Global Institute reports that IoT business will reach $6.2 trillion in revenue by 2025. Second, more and more objects are becoming embedded with sensors that communicate real-time data to data centers’ networks for processing, explain McKinsey’s Chui, Loffler, and Roberts.…

On September 17, the Apache Software Foundation (ASF) voted to graduate Apache Storm to a top-level project (TLP). This is a major step forward for the project and reflects the momentum built by a broad community of developers from not only Hortonworks, but also Yahoo!, Alibaba, Twitter, Microsoft and many other companies.

What is Apache Storm and why is it useful?

Apache Storm is a distributed, fault tolerant, and highly scalable platform for processing streaming data.…

ITC Infotech is a Hortonworks consulting and integration partner and provides IT services and solutions to leading global customers. The company addresses a wide range of customer challenges through innovative IT solutions.

Today, guest blogger Aditya Agrawal, head of Advance technology, ZLabs at ITC Infotech, focuses on ITC’s RADAR framework for the Retail industry.

Storm and Solr are excellent examples of new Hadoop tools that enable use cases that were hard to implement before.…

The Apache Tez community is thrilled to announce the release of version 0.5 of the project. We’re referring to this as “the developer release” because it’s all about developers. The community focused on meeting the key needs of developers using Tez to create their applications and engines. Tez 0.5 includes clean and intuitive developer APIs, easy debugging, extensive documentation and deployment with rolling upgrades.

Apache Hadoop YARN paved the way for Apache Tez.…

Summary

This blog covers how recent developments have made it easy to use ORCFile from Cascading or Apache Crunch, and shows that doing so can accelerate data processing by more than 5x. Code samples are provided so that you can start integrating ORCFile into your Cascading or Crunch projects today.

What are Cascading and Apache Crunch?

Cascading and Apache Crunch are high-level frameworks that make it easy to process large amounts of data in distributed clusters.…

Hortonworks is committed to collaborating with ISVs and partners to onboard their applications to YARN and Hadoop. As part of the YARN Webinar Series, we have introduced different methods to help you integrate your applications with YARN: Native YARN integration, Slider and Tez. As part of this series, we now offer the opportunity to learn Scalding, with a guest speaker from Twitter who will talk about simplifying application development on Apache Hadoop and YARN.…

Novetta is a new Hortonworks Technology Partner and recently achieved HDP 2.1 Certification and YARN Ready status. In this guest blog, Jennifer Reed, director of product management at Novetta, talks about Novetta’s YARN Ready entity resolution and relationship dimension-building application.

The New Era of Analytics

Thomas Davenport, in his keynote at the Hadoop Summit San Jose 2014, said that big data analytics has entered a new phase: from Analytics 2.0 to 3.0.…

StackIQ, a Hortonworks technology partner, offers a comprehensive software suite that automates the deployment, provisioning, and management of Big Infrastructure. In his second guest blog, Anoop Rajendra (@anoop_r), a Senior Software Developer at StackIQ, gives instructions for using the StackIQ Command Line Interface (CLI) to deploy a Hortonworks Data Platform (HDP) cluster.

In a previous blog post, we discussed how StackIQ’s Cluster Manager automates the installation and configuration of an Apache Ambari server.…

Speed, Scale, and SQL Semantics

Since its inception and graduation as a Top-Level Project (TLP) of the Apache Software Foundation (ASF) in September 2010, Apache Hive has been steadily improving in speed, scale, and SQL semantics to meet enterprise requirements for both interactive and batch queries at Hadoop scale.

It has become a de facto standard for SQL queries over petabytes of data stored in Hadoop. It is a compliant SQL engine that offers developers a comprehensive and familiar set of SQL semantics for Apache Hadoop.…

In this partner guest blog, Microsoft Principal Software Development Engineer Eric Hanson weighs in on how Stinger.next will benefit HDInsight customers. Coming from someone who worked on Microsoft SQL Server for years and is a committer to Apache Hive, Eric explains that Stinger.next initiatives and capabilities are essential to take Hive to the next level.

Apache Hive is one of the most-used features of Microsoft’s cloud Hadoop service, Azure HDInsight. So our HDInsight customers of course will enjoy new capabilities that make Hive faster.…

Continuing our ecosystem momentum for the next generation of SQL in Hadoop, Dustin Smith, Product Marketing Manager at Tableau Software, shares his insights on the potential that Stinger.next holds for both the individual data worker and the data-driven company.

The work delivered over the last year as part of Stinger has made a tremendous impact for our customers who are using Tableau to analyze Hadoop data, and we are excited to see this momentum continue under the leadership of Hortonworks within the Apache Hive community.…

In case you missed it: earlier this week, Alan Gates and team provided some insights into the Stinger.next roadmap for delivering Enterprise SQL at Hadoop Scale. We’re excited to continue the conversation and to hear from some of our key partners about their excitement for this important initiative. Today’s guest blogger, Michael Hiskey, Chief Product Evangelist & Product Marketing at MicroStrategy, provides some insight on the Stinger.next initiatives and how they will benefit MicroStrategy customers and the overall Big Data and Hadoop community.…

Big data growth continues to be a major consumer of enterprise IT resources with no end in sight. To gain value out of this data, organizations are creating new analytic applications for their business users. This growth in data and applications impacts hardware, networking and software resource consumption in the data center. CIOs are looking for ways to future-proof their data management infrastructure and to be more efficient about how they monitor and manage their Apache Hadoop clusters.…

Apache Ambari is an open operational framework to provision, manage and monitor Hadoop clusters. As Hadoop has grown from a single-purpose (MapReduce) framework to an extensible multi-purpose compute platform, with Apache Hadoop YARN as its architectural center, Apache Ambari has marched hand-in-hand to meet the evolving operational needs of Enterprise Hadoop.

Enabling ecosystem integration has been a key thrust of recent innovations within the Apache Ambari community. Key developments, including Stack Extensibility and Ambari Views, allow Ambari to deploy and manage YARN-enabled applications.…
