The Hortonworks Blog

Posts categorized by: Security

Last week was a busy week for shipping code, so here’s a quick recap of the new stuff to keep you busy over the holiday season.

Apache Sqoop is a tool that transfers data between the Hadoop ecosystem and enterprise data stores, providing methods to move data into HDFS or Hive (using HCatalog). Oracle Database is one of the databases Sqoop supports. With Oracle Database, connection credentials can be stored in Oracle Wallet, which acts as a secure container for keys and secrets such as authentication credentials. This post describes how Oracle Wallet adds a secure authentication layer for Sqoop jobs.…
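For a concrete feel for the mechanism, here is a minimal Java sketch of a wallet-backed JDBC connection, the same kind of connection Sqoop’s Oracle connector opens underneath. The wallet directory, TNS alias, and credentials shown are placeholders for illustration, not values from the post, and the Oracle JDBC and PKI jars are assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class WalletConnect {
    public static void main(String[] args) throws Exception {
        // Point the Oracle JDBC driver at the wallet and tnsnames.ora.
        // Both paths and the SQOOP_DB alias are placeholders.
        System.setProperty("oracle.net.tns_admin", "/etc/oracle/wallet");
        System.setProperty("oracle.net.wallet_location",
            "(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/etc/oracle/wallet)))");

        // No username or password in the URL: the wallet supplies the
        // credentials registered for the TNS alias.
        try (Connection conn =
                 DriverManager.getConnection("jdbc:oracle:thin:/@SQOOP_DB")) {
            System.out.println("Connected as: "
                + conn.getMetaData().getUserName());
        }
    }
}
```

The payoff is that a Sqoop job can name only the alias, keeping passwords out of scripts, job definitions, and shell history.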

Just yesterday, we talked about our roadmap for Security in Enterprise Hadoop. Our Security Labs page gathers in one place the security roadmap, the efforts underway across Hadoop, and their timelines.

Security is often described as rings of defense. Continuing this analogy, the Apache community has been working to create a perimeter security solution for Hadoop. That effort is the Apache Knox Gateway (Apache Knox), and we are happy to announce its Technical Preview.…

Security is a top agenda item and a critical requirement for Hadoop projects. Over the years, Hadoop has evolved to address key concerns around authentication, authorization, accounting, and data protection natively within a cluster, and many secure Hadoop clusters are now in production. Hadoop is used securely and successfully today in sensitive financial services applications, private healthcare initiatives, and a range of other security-sensitive environments. As enterprise adoption of Hadoop grows, so do the security requirements, and a roadmap to embrace and incorporate these enterprise security features has emerged.…

The Apache Knox community announced the release of the Apache Knox Gateway (Incubator) 0.3.0. We at Hortonworks are excited about this announcement.

The Apache Knox Gateway is a REST API gateway for Hadoop with a focus on enterprise security integration. It provides a simple and extensible model for securing access to Hadoop core and ecosystem REST APIs.

Apache Knox provides pluggable authentication to LDAP and trusted identity providers, as well as service-level authorization and more.…
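To make that concrete, here is a minimal Java sketch of a client calling WebHDFS through a Knox gateway with HTTP Basic authentication. The hostname, port, topology name, and credentials are placeholders, and the gateway’s SSL certificate is assumed to be trusted by the client JVM.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.net.ssl.HttpsURLConnection;

public class KnoxWebHdfsList {
    public static void main(String[] args) throws Exception {
        // One HTTPS endpoint fronts the cluster's REST APIs; "default"
        // is the topology name and a placeholder here.
        URL url = new URL("https://knox.example.com:8443"
            + "/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS");
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();

        // Knox authenticates the caller (e.g. against LDAP) before
        // anything reaches the cluster; Basic auth is one option.
        String creds = Base64.getEncoder().encodeToString(
            "guest:guest-password".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + creds);

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(),
                    StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON directory listing
            }
        }
    }
}
```

Note that the client never talks to a NameNode or DataNode directly; the gateway is the only host that needs to be reachable from outside the perimeter.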

Security is one of the biggest topics in Hadoop right now. Historically, Hadoop has been a back-end system accessed only by a few specialists, but the clear trend is for companies to put data from Hadoop clusters into the hands of analysts, marketers, product managers, or call center employees, whose numbers can run into the hundreds or thousands. Data security and privacy controls are necessary before this transformation can occur. HDP2, through the next release of Apache Hive, introduces a very important new security feature that lets you encrypt the traffic that flows between Hadoop and popular analytics tools such as MicroStrategy, Tableau, Excel, and others.…
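For a sense of what this looks like from a client, here is a minimal Java sketch of a JDBC connection to HiveServer2 with SSL turned on. The host, port, truststore path, credentials, and query are placeholders for illustration, and the Hive JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SecureHiveQuery {
    public static void main(String[] args) throws Exception {
        // ssl=true encrypts the wire between the client and HiveServer2;
        // the truststore holds the certificate the server presents.
        String url = "jdbc:hive2://hive.example.com:10000/default;"
            + "ssl=true;sslTrustStore=/etc/hive/truststore.jks;"
            + "trustStorePassword=changeit";

        try (Connection conn = DriverManager.getConnection(url, "analyst", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT COUNT(*) FROM web_logs")) {
            while (rs.next()) {
                System.out.println("rows: " + rs.getLong(1));
            }
        }
    }
}
```

BI tools like the ones named above typically expose equivalent switches in their ODBC/JDBC connection settings, so the same encrypted transport protects them without code changes.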

The shift to a data-oriented business is happening. The inherent value in established and emerging big datasets is becoming clear. Enterprises are building big data strategies to take advantage of these new opportunities and Hadoop is the platform to realize those strategies.

Hadoop is enabling a modern data architecture in which it plays a central role: tackling big datasets efficiently while integrating with existing data systems. As champions of Hadoop, our aim is to ensure the success of every Hadoop implementation and to improve our own understanding of how and why enterprises tackle big data initiatives.…

Whether they are just beginning big data initiatives or already well underway, organizations need data protection to mitigate the risk of a breach, assure global regulatory compliance, and deliver the performance and scale to adapt to the fast-changing ecosystem of Apache Hadoop tools and technology.

Business insights from big data analytics promise major benefits to enterprises, but launching these initiatives also presents potential risks. New architectures, including Hadoop, can aggregate different types of data in structured, semi-structured, and unstructured forms, perform parallel computations on large datasets, and continuously feed the data lake that enables data scientists to see patterns and trends.…


Back in the day, all you needed to secure a Hadoop cluster was a firewall that restricted network access to authorized users. This eventually evolved into a more robust security layer in Hadoop… a layer that could augment firewall access with strong authentication. Enter Kerberos. Around 2008, Owen O’Malley and a team of committers led this first foray into security, and today Kerberos is still the primary way to secure a Hadoop cluster.…
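For readers who have not touched a kerberized cluster from code, here is a minimal Java sketch of a keytab-based login with Hadoop’s UserGroupInformation API. The principal and keytab path are placeholders, and the client is assumed to have a valid krb5.conf pointing at the KDC.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        // Tell the Hadoop client libraries to use Kerberos.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in from a keytab instead of an interactive kinit.
        UserGroupInformation.loginUserFromKeytab(
            "etl-user@EXAMPLE.COM",
            "/etc/security/keytabs/etl-user.keytab");

        // Subsequent RPCs carry the authenticated Kerberos identity.
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/user/etl-user")));
    }
}
```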

Introduction

Packetpig is the tool behind Packetloop. In Part One of the Introduction to Packetpig, I discussed the background and motivation behind the Packetpig project and the problems Big Data Security Analytics can solve. In this post I want to focus on the code and teach you how to use our building blocks to start writing your own jobs.

The ‘building blocks’ are the Packetpig custom loaders that allow you to access specific information in packet captures.…

Series Introduction

Packetloop CTO Michael Baker (@cloudjunky) made a big splash when he presented ‘Finding Needles in Haystacks (the Size of Countries)’ at Black Hat Europe earlier this year. The paper outlines Packetpig (@packetpig, available on GitHub), a toolkit based on Apache Pig for doing network security monitoring and intrusion detection analysis on full packet captures using Hadoop.

In this series of posts, we’re going to introduce Big Data Security and explore using Packetpig on real full packet captures to understand and analyze networks.…

Pre-crime? Pretty close…

If you have seen the futuristic movie Minority Report, you most likely have an idea of how many factors and decisions go into crime prevention. Yes, pre-crime is still an aspect of the future, but even today it is clear that many social, economic, psychological, racial, and geographical circumstances must be thoroughly considered to make crime prediction even partially possible and accurate. The predictive analytics made possible by Apache Hadoop can significantly benefit this area of government security.…

Delegation tokens play a critical part in Apache Hadoop security, and understanding their design and use is important for comprehending Hadoop’s security model.
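As a hedged illustration (a sketch, not code from the paper), here is how a client that has already authenticated with Kerberos might fetch HDFS delegation tokens through the public FileSystem API, so that its tasks can later authenticate without contacting the KDC. The renewer name "yarn" is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class FetchDelegationTokens {
    public static void main(String[] args) throws Exception {
        // Assumes a Kerberos login has already happened (e.g. via kinit
        // or UserGroupInformation.loginUserFromKeytab).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Ask the NameNode for delegation tokens; "yarn" names the
        // principal allowed to renew them and is a placeholder here.
        Credentials creds = new Credentials();
        Token<?>[] tokens = fs.addDelegationTokens("yarn", creds);
        for (Token<?> t : tokens) {
            System.out.println("got " + t.getKind()
                + " token for " + t.getService());
        }
    }
}
```

The tokens land in the Credentials object, which is how a job client ships them to its tasks so that thousands of containers do not all hit the KDC at launch.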

Download our technical paper on adding security to Hadoop here.

Authentication in Apache Hadoop
Apache Hadoop provides strong authentication for HDFS data. All HDFS accesses must be authenticated:

1. Access from users logged in on cluster gateways
2. Access from any other service or daemon (e.g. HCatalog server)
3.…

Overview
As the former technical lead for the Yahoo! team that added security to Apache Hadoop, I thought I would provide a brief history.

The motivation for adding security to Apache Hadoop actually had little to do with traditional notions of security in defending against hackers, since all large Hadoop clusters sit behind corporate firewalls that allow access only to employees. Instead, the motivation was simply that security would allow us to use Hadoop more effectively to pool resources between disjoint groups.…
