The Hortonworks Blog

Continuing the ecosystem momentum behind the next generation of SQL in Hadoop, Dustin Smith, Product Marketing Manager at Tableau Software, shares his insights on the potential Stinger.next holds for individual data workers and data-driven companies alike.

The work delivered over the last year as part of Stinger has made a tremendous impact for our customers who are using Tableau to analyze Hadoop data, and we are excited to see this momentum continue under the leadership of Hortonworks within the Apache Hive community. …

In case you missed it — earlier this week, Alan Gates and team provided some insights into the Stinger.next roadmap for delivering Enterprise SQL at Hadoop scale. We’re excited to continue the conversation by including some of our key partners and their perspectives on this important initiative. Today’s guest blogger, Michael Hiskey, Chief Product Evangelist & Product Marketing at MicroStrategy, provides some insight into the Stinger.next initiatives and how they will benefit MicroStrategy customers and the overall Big Data and Hadoop community.…

In this partner guest blog, John Haddad, senior director of product marketing at Informatica, explains how Stinger.next’s key innovations in Enterprise SQL at Hadoop scale will augment Informatica’s Big Data Edition integration with Hortonworks’ Modern Data Architecture.

Informatica is excited about the new innovations Hortonworks is including in the Stinger.next project such as Hive transactions, Hive-Spark integration, and sub-second queries. The Informatica Big Data Edition helps our customers take advantage of these new innovations without having to rebuild their data pipelines for Big Data analytics.…

Big data continues to be a major consumer of enterprise IT resources, with no end in sight. To gain value from this data, organizations are creating new analytic applications for their business users. This growth in data and applications impacts hardware, networking, and software resource consumption in the data center. CIOs are looking for ways to future-proof their data management infrastructure and to be more efficient about how they monitor and manage their Apache Hadoop clusters.…

Apache Ambari is an open operational framework to provision, manage and monitor Hadoop clusters. As Hadoop has grown from a single purpose (MapReduce) framework to an extensible multi-purpose compute platform, with Apache Hadoop YARN as its architectural center, Apache Ambari has marched hand-in-hand to meet the evolving operational needs of Enterprise Hadoop.

Enabling ecosystem integration has been a key thrust of recent innovations within the Apache Ambari community. Key developments, including Stack Extensibility and Ambari Views, allow Ambari to deploy and manage YARN-enabled applications.…

In April of this year, Hortonworks, along with the broad Hadoop community, delivered the final phase of the Stinger Initiative on schedule, completing the work to bring interactive SQL query to Apache Hive. The original charter of Stinger was to advance SQL capabilities at petabyte scale in pure open source. And over 13 months, 145 developers from 44 companies delivered exactly that, contributing over 390,000 lines of code to the Hive project alone.…

Geoff Flood is president of T4G Limited and co-chair of the province of New Brunswick Research & Innovation Council. In this guest blog, Geoff elaborates on why “partnering with Hortonworks was simply a no-brainer for us. It’s a decision that will deliver prized and measurable value to our customers.”

Big data is more than just buzz; it’s a big deal. It’s changing everything in our lives and all around us. As president of a successful technology services firm in Canada, I knew we had to change, too, when it comes to designing, developing and implementing solutions for our customers across North America.…

Haohui Mai is a member of technical staff at Hortonworks in the HDFS group and a core Hadoop committer. In this blog, he explains how to set up HTTPS for HDFS in a Hadoop cluster.

1. Introduction

The HTTP protocol is one of the most widely used protocols on the Internet. Today, Hadoop clusters exchange internal data such as file system images, quorum journals, and user data over HTTP.…
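As a preview of the kind of configuration the post covers, switching HDFS web endpoints from HTTP to HTTPS is typically controlled by the HTTP policy in `hdfs-site.xml` (a sketch only; keystore and truststore setup, covered in the full post, is also required):

```xml
<!-- hdfs-site.xml: serve NameNode/DataNode web endpoints over HTTPS only.
     Valid values: HTTP_ONLY | HTTP_AND_HTTPS | HTTPS_ONLY -->
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
```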

I can’t believe it’s been 6 months since we first announced our expanded strategic alliance with Red Hat. For those who have been following this partnership, you know our goal is simple — to help organizations adopt enterprise Apache Hadoop as part of their modern data architecture. Our expanded relationship with Red Hat is closely aligned around a strategy of innovating in the open and applying enterprise rigor to open source software, thereby de-risking it for the enterprise and allowing faster adoption of Enterprise Apache Hadoop.…

We are excited to announce that Apache Kafka 0.8.1.1 is now available as a technical preview with Hortonworks Data Platform 2.1. Kafka was originally developed at LinkedIn and incubated as an Apache project in 2011. It graduated to a top-level Apache project in October of 2012.

Many organizations already use Kafka for their data pipelines, including Hortonworks customers like Spotify and Tagged.

What is Apache Kafka?

Apache Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system.…
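The publish-subscribe model at Kafka's core can be illustrated with a minimal in-process sketch. This toy broker is purely conceptual — it is not Kafka's API, and it omits exactly what makes Kafka valuable (durability, partitioning, distribution, consumer offsets):

```python
from collections import defaultdict

class MiniBroker:
    """A toy in-process publish-subscribe broker. Kafka implements this
    model as a durable, partitioned, distributed log at scale."""

    def __init__(self):
        # topic -> list of subscriber callbacks
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a consumer callback for a topic."""
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic."""
        for callback in self.subscribers[topic]:
            callback(message)

broker = MiniBroker()
received = []
broker.subscribe("page-views", received.append)
broker.publish("page-views", {"user": "alice", "url": "/home"})
# received now holds the published message
```

The key property, which Kafka preserves at cluster scale, is that producers and consumers are decoupled: the publisher knows only the topic name, never who is listening.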

Chaos Before The Storm … and a Brief History

True to its name and the metaphoric image it evokes, Apache Storm lives up to its purpose and promise: to ingest, absorb, and digest an avalanche of real-time data as an unbounded stream of discrete events, at scale and at speed.

Before Storm, developers used a set of queues and workers to process a stream of real-time events. That is, events were placed on worker queues, and worker threads plucked events and processed them, transforming, persisting, or forwarding them to another queue for further processing.…
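The pre-Storm queues-and-workers pattern described above can be sketched in a few lines with Python's standard library (names and the trivial "transform" step are illustrative only):

```python
import queue
import threading

events = queue.Queue()   # incoming real-time events
results = queue.Queue()  # downstream queue for further processing

def worker():
    # Each worker thread plucks events off the queue, transforms them,
    # and forwards the result to another queue.
    while True:
        event = events.get()
        if event is None:            # sentinel: shut this worker down
            break
        results.put(event.upper())   # the "transform" step

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()

for e in ["click", "view", "purchase"]:
    events.put(e)
for _ in threads:                    # one sentinel per worker
    events.put(None)
for t in threads:
    t.join()
```

The pain points that motivated Storm live in exactly this glue code: hand-rolled shutdown, no fault tolerance, no guaranteed processing, and rewiring queues by hand whenever the topology changes.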

Sheetal Dolas is a Principal Architect at Hortonworks. As part of the Apache Storm design patterns blog series, he explores three options for micro-batching using Apache Storm’s core APIs. This is the first blog in the series.

What is Micro-batching?

Micro-batching is a technique that allows a process or task to treat a stream as a sequence of small batches or chunks of data. For incoming streams, the events can be packaged into small batches and delivered to a batch system for processing [1].

Micro-batching in Apache Storm

In Apache Storm, micro-batching in core Storm topologies makes sense for performance or for integration with external systems (like ElasticSearch, Solr, HBase or a database).…
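The core idea is simple enough to sketch outside Storm. This count-based batching generator is a conceptual illustration, not Storm code (Sheetal's post covers how to achieve this with Storm's core APIs):

```python
def micro_batches(stream, batch_size):
    """Package an incoming stream of events into batches of at most
    batch_size, yielding each batch for downstream bulk processing."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:           # flush the final partial batch
        yield batch

batches = list(micro_batches(range(7), 3))
# batches -> [[0, 1, 2], [3, 4, 5], [6]]
```

Real implementations typically also flush on a time trigger, so a slow stream can't hold a partial batch open indefinitely — a detail that matters when batching writes to systems like ElasticSearch or HBase.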

Since our founding in mid-2011, our vision for Hadoop has been that “half the world’s data will be processed by Hadoop”. With that long-term vision in mind, we focus on the mission to establish Hadoop as the foundational technology of the modern enterprise data architecture that unlocks a whole new class of data-driven applications that weren’t previously possible.

We use what we call the “Blueprint for Enterprise Hadoop” for guiding how we invest in Hadoop-related open source technologies as well as enabling the key integration points that are important for deploying Enterprise Hadoop within a modern data architecture, on-premises or in the cloud, in a way that enables the business and its users to maximize the value from their data.…

YARN and Apache Storm: A Powerful Combination

YARN changed the game for all data access engines in Apache Hadoop. As part of Hadoop 2, YARN took the resource management capabilities that were in MapReduce and packaged them for use by new engines. Now Apache Storm is one of those data-processing engines that can run alongside many others, coordinated by YARN.

YARN’s architecture makes it much easier for users to build and run multiple applications in Hadoop, all sharing a common resource manager.…

The open source community, including Hortonworks, has invested heavily in building enterprise-grade security for Apache Hadoop. These efforts include Apache Knox for perimeter security, Kerberos for strong authentication, and the recently announced Apache Argus incubator, which brings a central administration framework for authorization and auditing.

Join Hortonworks and Voltage Security in a webinar on August 27 to learn more.

In multi-platform environments with data coming from many different sources, personally identifiable information, credit card numbers, and intellectual property can land in the Hadoop cluster.…
