The Hortonworks Blog

Posts categorized by: Apache Hadoop

Continuing our ecosystem momentum for the next generation of SQL in Hadoop, Dustin Smith, Product Marketing Manager at Tableau Software, is here to share his insights on the potential that Stinger.next holds for both the individual data worker and the data-driven company.

The work delivered over the last year as part of Stinger has made a tremendous impact for our customers who are using Tableau to analyze Hadoop data, and we are excited to see this momentum continue under the leadership of Hortonworks within the Apache Hive community. …

In case you missed it: earlier this week, Alan Gates and team provided some insights into the Stinger.next roadmap around the delivery of Enterprise SQL and Hadoop Scale. We’re excited to continue the conversation and hear from some of our key partners about their excitement for this important initiative. Today’s guest blogger, Michael Hiskey, Chief Product Evangelist & Product Marketing at MicroStrategy, provides some insight on the Stinger.next initiatives and how they will benefit MicroStrategy customers and the overall Big Data and Hadoop community.…

Big data growth continues to be a major consumer of enterprise IT resources with no end in sight. To gain value out of this data, organizations are creating new analytic applications for their business users. This growth in data and applications impacts hardware, networking and software resource consumption in the data center. CIOs are looking for ways to future-proof their data management infrastructure and to be more efficient about how they monitor and manage their Apache Hadoop clusters.…

Apache Ambari is an open operational framework to provision, manage and monitor Hadoop clusters. As Hadoop has grown from a single-purpose (MapReduce) framework to an extensible multi-purpose compute platform, with Apache Hadoop YARN as its architectural center, Apache Ambari has marched hand-in-hand to meet the evolving operational needs of Enterprise Hadoop.

Enabling ecosystem integration has been a key thrust of recent innovations within the Apache Ambari community. Key developments, including Stack Extensibility and Ambari Views, allow Ambari to deploy and manage YARN-enabled applications.…

In April of this year, Hortonworks, along with the broad Hadoop community, delivered the final phase of the Stinger Initiative on schedule, completing the work to bring interactive SQL query to Apache Hive. The original directive of Stinger was about advancing SQL capabilities at petabyte scale in pure open source. And over 13 months, 145 developers from 44 companies delivered exactly that, contributing over 390,000 lines of code to the Hive project alone.…

Geoff Flood is president of T4G Limited and co-chair of the province of New Brunswick Research & Innovation Council. In this guest blog, Geoff elaborates on why “partnering with Hortonworks was simply a no-brainer for us. It’s a decision that will deliver prized and measurable value to our customers.”

Big data is more than just buzz; it’s a big deal. It’s changing everything in our lives and all around us. As president of a successful technology services firm in Canada, I knew we had to change, too, when it comes to designing, developing and implementing solutions for our customers across North America.…

Haohui Mai is a member of technical staff at Hortonworks in the HDFS group and a core Hadoop committer. In this blog, he explains how to set up HTTPS for HDFS in a Hadoop cluster.

1. Introduction

The HTTP protocol is one of the most widely used protocols on the Internet. Today, Hadoop clusters exchange internal data such as file system images, the quorum journals, and the user data through the HTTP protocol.…

We are excited to announce that Apache Kafka 0.8.1.1 is now available as a technical preview with Hortonworks Data Platform 2.1. Kafka was originally developed at LinkedIn and incubated as an Apache project in 2011. It graduated to a top-level Apache project in October of 2012.

Many organizations already use Kafka for their data pipelines, including Hortonworks customers like Spotify and Tagged.

What is Apache Kafka?

Apache Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system.…

Chaos Before The Storm … and a Brief History

For its name and the metaphoric image it evokes, Apache Storm lives up to its purpose and promise: to ingest, absorb, and digest an avalanche of real-time data as a stream of unbounded discrete events at scale, speed, and success.

Before Storm, developers used a set of queues and workers to process a stream of real-time events. That is, events were placed on worker queues, and worker threads plucked events and processed them: transforming, persisting or forwarding them to another queue for further processing.…
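To make the pre-Storm pattern concrete, here is a minimal sketch in Python of a queue feeding a pool of worker threads that transform each event and forward the result to a second queue. It is not taken from the original post and it does not use any Storm API; the names `run_pipeline`, `inbox`, `outbox` and `STOP` are hypothetical and chosen only for illustration.

```python
import queue
import threading


def run_pipeline(events, transform, num_workers=2):
    """Process events with worker threads, forwarding results to a second queue.

    Hypothetical helper for illustration only; not part of Apache Storm.
    """
    inbox = queue.Queue()    # queue the producer places events on
    outbox = queue.Queue()   # downstream queue for further processing
    STOP = object()          # sentinel telling a worker to shut down

    def worker():
        while True:
            event = inbox.get()
            if event is STOP:
                break
            # Transform the event, then forward it to the next queue.
            outbox.put(transform(event))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    for event in events:
        inbox.put(event)
    for _ in threads:
        inbox.put(STOP)      # one sentinel per worker
    for t in threads:
        t.join()

    # All workers have exited, so the outbox can be drained safely.
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results


doubled = run_pipeline(range(5), lambda x: x * 2)
```

Even in this toy version the application code has to handle shutdown and thread management itself, and the order of results depends on thread scheduling.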

Sheetal Dolas is a Principal Architect at Hortonworks. As part of a blog series on Apache Storm design patterns, he explores three options for micro-batching using Apache Storm’s core APIs. This is the first blog in the series.

What is Micro-batching?

Micro-batching is a technique that allows a process or task to treat a stream as a sequence of small batches or chunks of data. For incoming streams, the events can be packaged into small batches and delivered to a batch system for processing [1].
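As a rough illustration of the idea, the following Python sketch groups an incoming stream into fixed-size batches. It is a generic, count-based example under our own assumptions, not one of the Storm options the series explores; the function name `micro_batches` and the `batch_size` parameter are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size events from stream.

    Hypothetical helper for illustration; batches are cut by count only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(stream)
    while True:
        # Pull at most batch_size events off the stream.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # stream exhausted
        yield batch


# Seven events grouped three at a time; the last batch is smaller.
batches = list(micro_batches(range(7), 3))
```

A real streaming system would usually also flush a partial batch after a time limit, so that events do not wait indefinitely when traffic is slow; this sketch leaves that out.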

Micro-batching in Apache Storm

In Apache Storm, micro-batching in core Storm topologies makes sense for performance or for integration with external systems (like ElasticSearch, Solr, HBase or a database).…

YARN and Apache Storm: A Powerful Combination

YARN changed the game for all data access engines in Apache Hadoop. As part of Hadoop 2, YARN took the resource management capabilities that were in MapReduce and packaged them for use by new engines. Now Apache Storm is one of those data-processing engines that can run alongside many others, coordinated by YARN.

YARN’s architecture makes it much easier for users to build and run multiple applications in Hadoop, all sharing a common resource manager.…

The open source community, including Hortonworks, has invested heavily in building enterprise-grade security for Apache Hadoop. These efforts include Apache Knox for perimeter security, Kerberos for strong authentication and the recently announced Apache Argus incubator that brings a central administration framework for authorization and auditing.

Join Hortonworks and Voltage Security in a webinar on August 27 to learn more.

In multi-platform environments with data coming from many different sources, personally identifiable information, credit card numbers, and intellectual property can land in the Hadoop cluster.…

This summer, Hortonworks presented the Discover HDP 2.1 Webinar series. Our developers and product managers highlighted the latest innovations in Apache Hadoop and related Apache projects.

We’re grateful to the more than 1,000 attendees whose questions added rich interaction to the pre-planned presentations and demos.

For those of you who missed one of the 30-minute webinars (or those who want to review one they joined live), you can find recordings of all sessions on our What’s New in 2.1 page.…

Zettaset is a Hortonworks partner. In this guest blog, John Armstrong, VP of Marketing at Zettaset Inc., shares Zettaset’s security features and explains why data encryption is vital for data in the Hadoop infrastructure.

Comprehensive Security Across the Hadoop Infrastructure

As big data technologies like Hadoop become widely deployed in production environments, the expectation is that they will meet the enterprise requirements in data governance, operations and security while integrating with existing data center infrastructure. …

With the release of Apache Hadoop YARN in October of last year, more and more solution providers are moving from single-application Hadoop clusters to a versatile, integrated Hadoop 2 data platform. This allows them to host multiple applications — eliminating silos, maximizing resources and bringing true multi-workload capabilities to Hadoop. 

That is why we’re extremely excited to have Paul Kent, Vice President of Big Data at SAS, share his insights on the value of Apache Hadoop YARN and the benefits it brings to SAS and its users. …
