The Hortonworks Blog

We believe the fastest path to innovation is the open community, and we work hard to help deliver this innovation from the community to the enterprise. However, this is a two-way street. We are also hearing very distinct requirements being voiced by the broad enterprise as they integrate Hadoop into their data architecture.

Take a look at the Falcon Technical Preview and the Data Management Labs.

Open Source, Open Community & An Open Roadmap for Dataset Management

Over the past year, a set of enterprise requirements has emerged for dataset management.  …

A recent survey conducted by the OpenStack Foundation shows incredible adoption in the enterprise. Cost savings and operational efficiency stand out as the top business motivators driving broad adoption of OpenStack across industry verticals. It was of particular interest to see that roughly 30% of the deployments are in production. Above all, I was not surprised to see Hadoop among the top 10 workloads on OpenStack.

Hadoop is the Perfect App for OpenStack

Many of our customers are looking towards Hadoop as a greenfield use case for OpenStack because Hadoop, unlike other enterprise applications, has very few legacy processes attached to it.…

In just a few years, interest in Hadoop has enjoyed a meteoric rise. It is everywhere… and it should be available everywhere.

Here at Hortonworks we have worked to provide the widest range of deployment options for Hadoop… from on-premises to the cloud, Linux and Windows, and from commodity server clusters to high-end appliances. Deployment options are critical to the adoption of Hadoop.

Today, we add Ubuntu to the list of options we support for HDP 2.0.…

With businesses demanding faster and easier access to information in order to make reliable and smart decisions, in-memory processing is an emerging technology that is gaining the attention of businesses of all sizes and across industries. Kognitio, a Hortonworks Technology Partner, uses an in-memory technology solution to provide scalable compute power for rapid execution of complex analytical queries.

Join us for the webinar on December 10 at 10am PT / 1pm ET “The Modern Data Architecture: In-Memory and Hadoop – The New BI”

What is In-Memory Processing?

Recently, SAP and Hortonworks announced the next step in their relationship, in which SAP resells and provides enterprise support for the Hortonworks Data Platform. Since then, we’ve been working together to showcase how SAP HANA + Hortonworks Data Platform provide “Instant Insight and Infinite Scale”. The combination of HANA and the Hortonworks Data Platform is a perfect match. SAP HANA uniformly amplifies the value of Big Data across this data fabric including large data sets that are stored in Hadoop.…

Hortonworks customers can now enhance their Hadoop applications with Elasticsearch real-time data exploration, analytics, logging and search features, all designed to help businesses ask better questions, get clearer answers and better analyze their business metrics in real-time.

Hortonworks Data Platform and Elasticsearch make for a powerful combination of technologies that are extremely useful to anyone handling large volumes of data on a day-to-day basis. With the ability of YARN to support multiple workloads, customers with current investments in flexible batch processing can also add real-time search applications from Elasticsearch.…

A consequence of living in a globalized, connected world is the unfortunate presence of online fraud. Fraud applies to all industries and affects businesses of all sizes. Given that we’re coming up on the holidays, and specifically with North America’s love of Black Friday and Cyber Monday, this week we partnered with Datameer on a very topical discussion about best practices on how to fight fraud using Hortonworks Data Platform to integrate Hadoop and Datameer.…

We have heard plenty in the news lately about healthcare challenges and the difficult choices faced by hospital administrators, technology and pharmaceutical providers, researchers, and clinicians. At the same time, consumers are experiencing increased costs without a corresponding increase in health security or in the reliability of clinical outcomes.

One key obstacle in the healthcare market is data liquidity (for patients, practitioners and payers) and some are using Apache Hadoop to overcome this challenge, as part of a modern data architecture.…

User logs of Hadoop jobs serve multiple purposes. First and foremost, they can be used to debug issues while running a MapReduce application – correctness problems with the application itself, race conditions when running on a cluster, and task/job failures due to hardware or platform bugs. Secondly, one can do historical analyses of the logs to see how individual tasks in a job/workflow perform over time. One can even analyze the Hadoop MapReduce user-logs using Hadoop MapReduce(!) to determine any performance issues.…

This is the second of two posts examining the use of Hive for interaction with HBase tables. This is a hands-on exploration so the first post isn’t required reading for consuming this one. Still, it might be good context.

“Nick!” you exclaim, “that first post had too many words and I don’t care about JIRA tickets. Show me how I use this thing!”

This post is exactly that: a concrete, end-to-end example of consuming HBase over Hive.…

The closing date for submitting Speaking Abstracts for Hadoop Summit Europe is NOVEMBER 22. If you are interested in speaking this year, we encourage you to submit your topic today. The Call for Abstracts will close at midnight on November 22, 2013 (the final midnight on the planet… Kiribati time?).

Some notes about Summit

We will return to Amsterdam for Hadoop Summit Europe this year (April 2-3, 2014). Also, we have restructured the content for the event to make sure we, as a community, continue to deliver high value technical content.…

Join Hortonworks and Pactera for a Webinar on Unlocking Big Data’s Potential in Financial Services Thursday, November 21st at 12:00 EST.

Have you ever had your debit or credit card declined for seemingly no reason? Turns out, the rejections are not so random. Banks are increasingly turning to analytics to predict and prevent fraud in real time. That can sometimes be an inconvenience for customers who are traveling or making large purchases, but it’s a necessary inconvenience today in order for banks to reduce billions in losses due to fraud.…

This is the first of two posts examining the use of Hive for interaction with HBase tables. The second post is here.

One of the things I’m frequently asked about is how to use HBase from Apache Hive. Not just how to do it, but what works, how well it works, and how to make good use of it. I’ve done a bit of research in this area, so hopefully this will be useful to someone besides myself.…

I teach for Hortonworks, and in class just this week I was asked to provide an example of using the R statistics language with Hadoop and Hive. The good news was that it can easily be done. The even better news is that it is actually possible to use a variety of tools (Python, Ruby, shell scripts and R) to perform distributed, fault-tolerant processing of your data on a Hadoop cluster.…

This post is authored by Omkar Vinit Joshi with Vinod Kumar Vavilapalli and is the ninth post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:

Introduction

In the previous post, we explained the basic concepts of LocalResources and resource localization in YARN.…
