The Hortonworks Blog

Posts categorized by: Apache Hadoop

SequenceIQ is a new Hortonworks Technology Partner and recently achieved HDP and YARN Ready certification for Cloudbreak, SequenceIQ's Hadoop-as-a-Service API. In this guest blog, SequenceIQ Co-founder and CTO Janos Matyas (@sequenceiq) describes provisioning and autoscaling HDP clusters with Cloudbreak.

In our daily work at SequenceIQ, we provision HDP clusters in many different environments. Whether on a cloud provider or on bare metal, we were looking for a common solution to automate and speed up the process.…

Heading to Strata next week? Interested in learning more about Apache Hadoop and how to integrate your existing infrastructure and applications with your Big Data solution? Join our partner presentations at the Hortonworks booth #117, on Thursday, October 16 and Friday, October 17. Many of the sessions will feature demonstrations.

You will hear directly from partners who are embracing 100% open source Apache Hadoop, including Actian, Cisco, CSC, HP, Informatica, Microsoft, Red Hat, Revolution Analytics, SAP, SAS, and Teradata.…

Syncsort, a certified Hortonworks Technology Partner and YARN Ready Partner, is our guest blogger. Here, Tendu Yogurtcu, vice president of engineering at Syncsort, expands on Syncsort's recent news about the integration of DMX-h with Ambari.

As Apache Hadoop YARN has transformed Hadoop from a data processing solution into a true data processing platform, the requirements for provisioning, managing, and securing the platform have changed dramatically.

Stability, security, easy deployment, performance, management, and monitoring are among the key attributes that make a data management platform enterprise-grade.…

A panel of reviewers made up of InfoWorld Test Center editors and industry experts selected Apache Storm as a winner for 2014’s InfoWorld Bossie award. The “Bossies” identify the Best of Open Source Software every year. These Bossie awards celebrate game-changing open source software projects in different domains, and the panel selected Apache Storm in the Big Data Tools category.

This is the first year that a streaming computation framework has been selected in the Big Data category, which is a tribute to Apache Storm’s broad industry adoption and versatility. …

Apache Tez has been selected as a winner for 2014’s InfoWorld Bossie award. The “Bossies” identify the Best of Open Source Software every year and are awarded by a panel of InfoWorld Test Center editors and industry expert reviewers. The Bossie awards celebrate game-changing open source software projects in different domains, and Apache Tez was selected in the Big Data Tools category.

Last year, Apache Hadoop with YARN as its architectural center was awarded a Bossie.…

Last week’s Hortonworks webinar “What’s Possible with a Modern Data Architecture?” featured Greg Girard, program director for omni-channel analytics strategies at IDC Retail Insights, and Mark Ledbetter, vice president for industry solutions at Hortonworks. Greg provides targeted, fact-based guidance to retailers on applying analytics across the enterprise. Mark has more than twenty-five years of experience in the software industry, with a focus on retail and supply chains.

Many of Greg and Mark’s thoughts from the webinar echo topics also covered in the recent Hortonworks white paper “The Retail Sector Boosts Sales with Hadoop.”


Greg discussed the most significant drivers of big data initiatives in the retail industry, including customer acquisition, pricing strategies, and competitive intelligence.…

At Hortonworks, we are always watching emerging trends in the datacenter to find opportunities for deeper ecosystem integration with Apache Hadoop in simple and intuitive ways. We first partnered with OpenShift by Red Hat earlier this year, when we made it possible to call out to Hadoop services from OpenShift via cartridges. As Enterprise Cloud offerings (e.g. PaaS) have matured to support a broad set of workloads, a number of our customers have asked how Hadoop-centered Big Data and PaaS initiatives could work together, particularly now that Apache Hadoop YARN is the multi-workload resource manager for batch, interactive, and real-time workloads on Hadoop.…

BlueData™, a new Hortonworks Certified Technology Partner, is a pioneer in Big Data private clouds that help enterprises create a self-service cloud experience on premises. BlueData has recently been certified on Hortonworks Data Platform (HDP).

BlueData’s Director of Business Development, Rashmi Gopinath, describes how the BlueData EPIC™ solution works with HDP and the advantages of combining the two.

Last week BlueData announced the launch of EPIC™ Enterprise, a Big Data private cloud solution, and its subsequent certification on Hortonworks Data Platform.…

Hortonworks’ strategy since our inception has been extremely consistent: enable a modern data architecture in which users can store data in a single location and interact with it in multiple ways, using the right data processing engine at the right time. At the core of that strategy is YARN, which, as part of Apache Hadoop, allows multiple data processing engines to interact with data stored in a single platform, unlocking an entirely new approach to analytics.…

Concurrent Inc. is a Hortonworks Technology Partner and recently announced that Cascading 3.0 now supports Apache Tez as an application runtime. Cascading is a powerful development framework for building enterprise data applications on Hadoop and is one of the most widely deployed technologies for data applications, with more than 175,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in data application development on Hadoop.…

Internet of Things (IoT) Potential and Process

It may seem obvious (or inevitable), but many companies are embracing the Internet of Things (IoT), and for good reasons, notes Forbes’ Mike Kavis. First, McKinsey Global Institute reports that IoT business will reach $6.2 trillion in revenue by 2025. Second, more and more objects are being embedded with sensors that communicate real-time data to data center networks for processing, explain McKinsey’s Chui, Loffler, and Roberts.…

On September 17, the Apache Software Foundation (ASF) voted to graduate Apache Storm to a top-level project (TLP). This is a major step forward for the project and reflects the momentum built by a broad community of developers, not only from Hortonworks but also from Yahoo!, Alibaba, Twitter, Microsoft, and many other companies.

What is Apache Storm and why is it useful?

Apache Storm is a distributed, fault tolerant, and highly scalable platform for processing streaming data.…

ITC Infotech is a Hortonworks consulting and integration partner and provides IT services and solutions to leading global customers. The company addresses a wide range of customer challenges through innovative IT solutions.

Today, guest blogger Aditya Agrawal, head of Advance technology, ZLabs, at ITC Infotech, focuses on ITC’s RADAR framework for the retail industry.

Storm and Solr are excellent examples of new Hadoop tools that enable use cases that were previously hard to implement.…

The Apache Tez community is thrilled to announce the release of version 0.5 of the project. We’re referring to this as “the developer release” because it’s all about developers. The community focused on meeting the key needs of developers using Tez to create their applications and engines. Tez 0.5 includes clean and intuitive developer APIs, easy debugging, extensive documentation and deployment with rolling upgrades.

Apache Hadoop YARN paved the way for Apache Tez.…

Summary

This post covers how recent developments have made it easy to use ORCFile from Cascading or Apache Crunch, and shows that doing so can accelerate data processing by more than 5x. Code samples are provided so that you can start integrating ORCFile into your Cascading or Crunch projects today.

What are Cascading and Apache Crunch?

Cascading and Apache Crunch are high-level frameworks that make it easy to process large amounts of data in distributed clusters.…
