The Hortonworks Blog

Concurrent Inc. is a Hortonworks Technology Partner and recently announced that Cascading 3.0 now supports Apache Tez as an application runtime. Cascading is a powerful development framework for building enterprise data applications on Hadoop and is one of the most widely deployed technologies for data applications, with more than 175,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in data application development on Hadoop.…

Internet of Things (IoT) Potential and Process

It may seem obvious (or inevitable), but many companies are embracing the Internet of Things (IoT)—and for good reasons, notes Forbes’ Mike Kavis. For one, McKinsey Global Institute reports that IoT business will reach $6.2 trillion in revenue by 2025. And second, more and more objects are becoming embedded with sensors that communicate real-time data to data centers’ networks for processing, explain McKinsey’s Chui, Loffler, and Roberts.…

On September 17, the Apache Software Foundation (ASF) voted to graduate Apache Storm to a top-level project (TLP). This is a major step forward for the project and reflects the momentum built by a broad community of developers from not only Hortonworks, but also Yahoo!, Alibaba, Twitter, Microsoft and many other companies.

What is Apache Storm and why is it useful?

Apache Storm is a distributed, fault-tolerant, and highly scalable platform for processing streaming data.…

ITC Infotech is a Hortonworks consulting and integration partner and provides IT services and solutions to leading global customers. The company addresses a wide range of customer challenges through innovative IT solutions.

Today, guest blogger Aditya Agrawal, head of Advance technology, ZLabs at ITC Infotech, focuses on ITC’s RADAR framework for the Retail industry.

Storm and Solr are excellent examples of new Hadoop tools that enable use cases that were previously hard to implement.…

The Apache Tez community is thrilled to announce the release of version 0.5 of the project. We’re referring to this as “the developer release” because it’s all about developers. The community focused on meeting the key needs of developers using Tez to create their applications and engines. Tez 0.5 includes clean and intuitive developer APIs, easy debugging, extensive documentation and deployment with rolling upgrades.

Apache Hadoop YARN paved the way for Apache Tez.…

Summary

This blog covers how recent developments have made it easy to use ORCFile from Cascading or Apache Crunch, and shows that doing so can accelerate data processing by more than 5x. Code samples are provided so that you can start integrating ORCFile into your Cascading or Crunch projects today.

What are Cascading and Apache Crunch?

Cascading and Apache Crunch are high-level frameworks that make it easy to process large amounts of data in distributed clusters.…

Hortonworks is committed to collaborating with ISVs and partners to onboard their applications to YARN and Hadoop. As part of the YARN Webinar Series, we have introduced different methods to help you integrate your applications with YARN: Native YARN integration, Slider and Tez. As part of this series, we now offer the opportunity to learn Scalding with a guest speaker from Twitter, who will talk about simplifying application development on Apache Hadoop and YARN.…

Novetta is a new Hortonworks Technology Partner and recently achieved HDP 2.1 Certification and YARN Ready status. In this guest blog, Jennifer Reed, director of product management at Novetta, talks about Novetta’s YARN Ready entity resolution and relationship dimension-building application.

The New Era of Analytics

Thomas Davenport, in his keynote at the Hadoop Summit San Jose 2014, said that big data analytics has entered a new phase: from Analytics 2.0 to 3.0.…

Modern retailers collect data from a multitude of consumer engagement channels, including point of sale systems, the web, mobile applications, social media, and more. They hope to use this data to derive greater customer insights, promote increased brand engagement and loyalty, optimize pricing and promotions, streamline the supply chain, and enhance their business models.

Data from the retailer’s transactional systems has historically been stored in an enterprise data warehouse (EDW) or other database, but these traditional data repositories are not well suited for the newer, unstructured data types like log files, social media updates and information from in-store sensors.…

StackIQ, a Hortonworks technology partner, offers a comprehensive software suite that automates the deployment, provisioning, and management of Big Infrastructure. In his second guest blog, Anoop Rajendra (@anoop_r), a Senior Software Developer at StackIQ, gives instructions for using the StackIQ Command Line Interface (CLI) to deploy a Hortonworks Data Platform (HDP) cluster.

In a previous blog post, we discussed how StackIQ’s Cluster Manager automates the installation and configuration of an Apache Ambari server.…

Thanks to all who joined us on our Hortonworks/Voltage webinar, “Securing Hadoop: What are Your Options?” For those who couldn’t attend, we’re sorry we missed you. We’ve included a link to the webinar recording below; please listen in!

On the webinar, Hortonworks’ Vinod Nair presented the recently announced Apache Argus incubator: a central policy administration framework across security requirements for authentication, authorization, auditing and data protection. Sudeep Venkatesh, of Voltage Security, defined data-centric protection technologies that easily integrate with Hive, Sqoop, MapReduce and other Hadoop interfaces.…

Hortonworks and Informatica have teamed up to provide the data systems and tools making up the foundation of the modern data architecture. Today, Scott Hedrick, Director of Big Data Partnerships at Informatica, tells us more about the brand new Informatica Big Data Edition Trial Sandbox for Hortonworks.

With the help of our friends at Hortonworks, the Informatica Big Data team has preinstalled a 60-day trial version of the Informatica Big Data Edition into the Hortonworks Sandbox.…

Speed, Scale, and SQL Semantics

Since its inception and graduation as a Top-Level Project (TLP) of the Apache Software Foundation (ASF) in September 2010, Apache Hive has been steadily improving in speed, scale, and SQL semantics to meet enterprise requirements for both interactive and batch queries at Hadoop scale.

It has become a de facto standard for SQL queries over petabytes of data stored in Hadoop. It is a compliant SQL engine that offers developers a comprehensive and familiar set of SQL semantics for Apache Hadoop.…

In this partner guest blog, Microsoft Principal Software Development Engineer Eric Hanson weighs in on how Stinger.next will benefit HDInsight customers. Coming from someone who worked on Microsoft SQL Server for years and is a committer to Apache Hive, Eric explains that Stinger.next initiatives and capabilities are essential to take Hive to the next level.

Apache Hive is one of the most-used features of Microsoft’s cloud Hadoop service, Azure HDInsight. So our HDInsight customers of course will enjoy new capabilities that make Hive faster.…

What drives successful implementations of big data analytics projects? Hortonworks’ Director of Data Science, Ofer Mendelevitch, teams up with Zementis’ Founder and CEO Michael Zeller to discuss their learnings from working with dozens of companies, from small cloud-based start-ups to some of the largest companies in the world.

Register here for the webinar on September 10 at 10am Pacific Time.

Hortonworks will present their approach to using Apache Hadoop for predictive models with big data, and the benefits of Hadoop to data scientists.…
