Hadoop Ecosystem

We were really excited to welcome a sold-out crowd at the first Hadoop Summit in Tokyo last week. This was a fantastic response, based on the huge interest around a technology that is transforming industries across Asia and the Pacific. We could not put this kind of conference on without the help of our sponsors […]

We recently hosted a webinar on the topic of HDF 2.0 and the integration between Apache NiFi, Apache Ambari and Apache Ranger. We thought we would share the questions & answers from the webinar, and also compile relevant data into a single place to make it easy to find and reference. Should you have any […]

Guest author: Jeff Kelly, Data Strategist, Pivotal. The phrase “digital transformation” gets bandied about a lot these days, but what exactly does it mean? When you strip away the hyperbole, I believe digital transformation is the process by which enterprises evolve from using traditional information technology to merely support existing business models to adopting modern […]

Provenance, Lineage & Chain of Custody: The models of Provenance, Lineage and Chain of Custody are used in fine art to determine when a piece was created, the sequence of locations where it was held, how it was touched along the way, and who has owned it since creation, all with the purpose of authenticating the piece. […]

People often think about cloud architecture in simplistic terms: you’re either public, private, or hybrid. (In fact, there’s even confusion about the meaning of the term “hybrid” itself; this video helps clear it up.) In the real world, of course, virtually every implementation is hybrid: no company puts 100% of its IT environment into one single cloud. […]

Apache Hive™ is the most complete SQL-on-Hadoop system, supporting comprehensive SQL, a sophisticated cost-based optimizer, ACID transactions and fine-grained dynamic security. Though Hive has proven itself on multi-petabyte datasets spanning thousands of nodes, many interesting use cases demand more interactive performance on smaller datasets, requiring a shift to in-memory. Hive 2 marks the […]

Financial regulators are driving a Data Evolution. Traditionally, technology moves fast and regulators react slowly. When technology leaps forward, it enables financial firms to change the nature of their business, often into unregulated territory; regulators then pass regulation to catch up. This model can work in slow-moving markets, but in today’s interconnected […]

As enterprises around the world bring more of their sensitive data into Hadoop data lakes, balancing the need for democratization of access to data without sacrificing strong security principles becomes paramount. According to a recent research report by Securosis, “Hadoop has (mostly) reached security parity with the relational platforms of old, and that’s saying a […]

User Interface and User Experience are some of the most important aspects of developing a product. No matter how many amazing features something has, a user must be able to access them in order to reap the full benefits of the product. For example, in the Apache Ambari Web UI, add-on apps called Views have, […]

Big data is changing the way enterprises interact with and consume data. Modern data platforms, such as Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF), are driving a data revolution by powering new workloads and analytic applications. This week, there are thousands of attendees in San Jose at Hadoop Summit 2016 learning about the […]

“Water, water everywhere, / Nor any drop to drink.” These lines from “The Rime of the Ancient Mariner” by Samuel Taylor Coleridge also accurately describe companies that are trying to transform themselves into data-driven companies. These organizations have astronomical volumes of raw data at their disposal, but how do they find that proverbial […]

“The world is one big data problem.” (Andrew McAfee, associate director of the Center for Digital Business at MIT Sloan.) One whole year of almost daily client meetings & discussions with industry leaders has helped me crystallize my view of an important yet abstract idea into reality. That is, Big Data capabilities or the lack of […]

With the growing volumes of diverse data being stored in the Data Lake, any breach of this enterprise-wide data can be catastrophic, from privacy violations and regulatory infractions to corporate image and long-term shareholder value. Seshu Adunuthula, Head of Analytics Infrastructure at eBay, acting as Track Chair for Governance and Security for Hadoop Summit San Jose, has […]

Debugging distributed systems can be difficult, largely because they are designed to run on many (possibly thousands of) hosts in a cluster. This process typically involves monitoring and analyzing log files spread across the cluster, and if the necessary information is not being logged, service restarts and job redeployment may be required. Not only is […]

There were a lot of great activities and sessions at the recent Apache: Big Data North America in Vancouver, B.C. I enjoyed the technical level of the sessions and meeting others who contribute to projects in the Apache Software Foundation (ASF). The sessions I went to had a high level of interesting technical content, with […]