December 15, 2014

Discover HDP 2.2: Data Storage Innovations in Hadoop Distributed Filesystem (HDFS)

On November 13th, Hortonworks presented the fourth of eight Discover HDP 2.2 webinars, hosted by Rohit Bakhshi, Jitendra Pandey, and Justin Sears.

Rohit Bakhshi and Jitendra Pandey introduced HDP and discussed how to use HDFS as a reliable, scalable, cost-efficient, and fault-tolerant distributed data storage platform for your Modern Data Architecture (MDA). They also covered new HDFS data storage innovations now included in HDP 2.2:

  • Heterogeneous storage
  • Encryption
  • Operational security enhancements

Here is the complete recording of the webinar.

Here are the presentation slides on SlideShare.

You can also register for the remaining webinars in the series.

We’re grateful to the many participants who joined the HDP 2.2 webinar and asked excellent questions. This is the complete list of questions with their corresponding answers:

Q: What is a typical configuration for a commodity server, i.e. CPU, memory, and storage?
A: Our documentation covers recommended node hardware and cluster configurations.

Q: What is an SSD?
A: SSD stands for Solid State Drive.

Q: Is it possible to migrate data from a hot to a warm to a cold storage policy during the data life cycle?
A: Yes. You can apply a different storage policy to a directory and then run the HDFS Mover tool, which migrates the data to the storage tier dictated by the new policy, as in the sketch below.

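A rough sketch of that flow, with hypothetical paths and the COLD policy as an example. Note that the storage-policy command has moved between Hadoop releases (2.6-era builds expose it through dfsadmin, later ones through hdfs storagepolicies), so check your version's documentation:

```bash
# Hypothetical example: retag an aging dataset as COLD, then migrate its blocks.
hdfs dfsadmin -setStoragePolicy /data/2013 COLD   # syntax varies by Hadoop release
hdfs mover -p /data/2013                          # move replicas to match the new policy
```
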
Q: How do you decrypt an HDFS data block?
A: Encryption and decryption are handled transparently by the HDFS client logic, as long as the user has permission to the file and its encryption key.

Q: Also, is it possible to set a default storage policy that gets applied to all files being created? And is this available as a configuration setting or only as an API?
A: In HDFS, the default storage type applied to DataNode disks, if nothing is specified, is DISK. The default storage policy is to store all replicas of a file on drives of type DISK. This default is not configurable.

Q: Is it possible to encrypt a portion of a record in a file?
A: No. HDFS file encryption encrypts an entire file written to HDFS.

Q: What is an example of a use case that would benefit from this?
A: HDFS encryption at rest enables industries that need to store PII data or meet HIPAA compliance requirements to keep this sensitive data in HDFS.

Q: What encryption algorithms are used?
A: AES-CTR is supported. AES-CTR allows seeks within an encrypted file without decrypting it from the beginning; the sketch below shows why.

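For intuition, here is the standard counter-mode construction (generic CTR notation, not anything HDFS-specific):

```latex
% C_i: ciphertext block i, P_i: plaintext block i, E_K: AES under key K
\[
  C_i = P_i \oplus E_K(\mathrm{IV} + i), \qquad
  P_i = C_i \oplus E_K(\mathrm{IV} + i)
\]
% The keystream E_K(IV + i) depends only on the key, the IV, and the block
% index i, so block i can be decrypted on its own -- a seek never requires
% decrypting the preceding blocks.
```
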
Q: When data is encrypted at rest, what happens when someone runs cat on a file in HDFS? How does it work?
A: The cat operation goes through the HDFS CLI client. If the user issuing cat has permission for both the encryption key and the file, the HDFS client automatically decrypts the file and shows the plaintext content, as sketched below.

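A minimal sketch, assuming /secure is an existing encryption zone and report.csv a file inside it (both names hypothetical):

```bash
# User with HDFS permission on the file AND access to the zone key in the KMS:
hdfs dfs -cat /secure/report.csv   # client fetches the key, decrypts, prints plaintext

# A user without access to the key gets an authorization error, never raw ciphertext.
```
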
Q: As a client, how do encryption zones work?
A: A client marks a specified directory as an encryption zone, and this is recorded at the NameNode. Thereafter, any writes to files in that directory are encrypted with the zone's encryption key.

Also important to note: while there is a single encryption key for the entire zone, each file also has its own encryption key; both are maintained by the Key Management Server. Setting up a zone looks like the sketch below.

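A sketch of zone setup, assuming a configured KMS; the key and path names are hypothetical:

```bash
hadoop key create zone1key                                # create the zone key in the KMS
hdfs dfs -mkdir /secure                                   # zone root must be an empty directory
hdfs crypto -createZone -keyName zone1key -path /secure   # mark it as an encryption zone
hdfs crypto -listZones                                    # verify
```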

Q: Will encryption be supported in Kerberized clusters in HDP 2.2 GA? And how does this new functionality in the NameNode affect its performance and utilization?
A: This is a good question.

NameNode RAM utilization, the encryption and decryption of data blocks, and the message traffic between the NameNode and the Key Management Server may all increase mildly; however, these costs are offset by the operational-security assurance of secured data at rest.

Nonetheless, this is a tech preview, and we are working to reduce the compute and RAM utilization with an improved, scalable, and optimized Key Management Server. As we approach GA, we are focused on and diligently working toward ensuring that we provide an efficient and scalable Key Management Server.

Q: What version of Apache Hadoop will have these features as part of HDP 2.2?
A: It will be Apache Hadoop 2.6.

Q: Will you support mixed storage configurations of SSDs, memory, and disks for my DataNodes?
A: All configuration modes are supported. That is, a DataNode may be made up of only SSDs, or of a combination of all three types, including the archival disk types, in any numbers.

We want to support whatever mix of hardware you wish to configure for your cluster. For example, if you want some DataNodes dedicated to archival disks for cold data, it is possible for those DataNodes to have only archival disk types, as the snippet below illustrates.

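As an illustration, heterogeneous storage is expressed by tagging each DataNode data directory with a storage type in hdfs-site.xml; the mount points below are hypothetical:

```xml
<!-- Mixed-media DataNode: one SSD, one spinning disk, one archival volume -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[SSD]/grid/ssd0,[DISK]/grid/disk0,[ARCHIVE]/grid/archive0</value>
</property>
```

An archive-only DataNode for cold data would simply list only [ARCHIVE] directories.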

Q: Do you support a scenario where some DataNodes require root access while other DataNodes don't?
A: With the introduction of SASL, the need for a DataNode to start as root is deprecated, so it is possible to have some DataNodes start as non-root.

Q: What encryption protocols are supported in HDFS over the wire?
A: We have always supported encrypting data in motion between the client and the DataNode with SASL. SASL offers three quality-of-protection modes: authentication, integrity, and privacy. By privacy we mean that the data is fully encrypted on the wire; a configuration sketch follows.

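A minimal hdfs-site.xml sketch for a Kerberized cluster; dfs.data.transfer.protection is the standard Hadoop property, and privacy is the strongest of the three modes:

```xml
<!-- SASL on the data-transfer protocol; this also lets DataNodes bind
     non-privileged ports, removing the need to start as root. -->
<property>
  <name>dfs.data.transfer.protection</name>
  <value>privacy</value>  <!-- authentication | integrity | privacy -->
</property>
```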

Visit these pages to learn more:
