December 17, 2013

Wire Encryption in Hadoop

Encryption is applied to electronic information to ensure its privacy and confidentiality. Typically, we think of protecting data at rest or in motion. Wire encryption protects the latter: data moving through Hadoop over RPC, HTTP, Data Transfer Protocol (DTP), and JDBC.

Let’s cover the configuration required to encrypt each of these protocols. For step-by-step instructions, please see the HDP 2.0 documentation.

RPC Encryption

The most common way for a client to interact with a Hadoop cluster is through RPC. A client connects to a NameNode (NN) over the RPC protocol to read or write a file. RPC connections in Hadoop use Java’s Simple Authentication and Security Layer (SASL), which supports encryption. When the hadoop.rpc.protection property is set to privacy, the data sent over RPC is encrypted with symmetric keys. Please refer to this post for more details on this setting.
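As a sketch, the relevant entry (using the standard Hadoop 2.x property name in core-site.xml) would look like:

```xml
<!-- core-site.xml: protect all Hadoop RPC traffic -->
<property>
  <name>hadoop.rpc.protection</name>
  <!-- authentication | integrity | privacy; "privacy" adds encryption -->
  <value>privacy</value>
</property>
```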

Data Transfer Protocol

The NN gives the client the address of the first DataNode (DN) to read or write the block. The actual data transfer between the client and a DN is over Data Transfer Protocol. To encrypt data transfer you must set dfs.encrypt.data.transfer=true on the NN and all DNs. The algorithm used for encryption can be customized with dfs.encrypt.data.transfer.algorithm, set to either 3des or rc4. If nothing is set, then the default on the system is used (usually 3DES). While 3DES is more cryptographically secure, RC4 is substantially faster.
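A minimal sketch of these settings in hdfs-site.xml, assuming the standard Hadoop 2.x property names:

```xml
<!-- hdfs-site.xml on the NN and every DN: encrypt block data in transit -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.algorithm</name>
  <!-- "3des" (stronger) or "rc4" (faster) -->
  <value>3des</value>
</property>
```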

HTTPS Encryption

Encryption over the HTTP protocol is implemented with SSL support across a Hadoop cluster. For example, to enable the NN UI to listen for HTTP over SSL you must configure SSL on the NN and all the DNs by setting dfs.https.enable=true in hdfs-site.xml. Typically SSL is configured to authenticate only the server; this is called 1-way SSL. SSL can also be configured to authenticate the client as well; this is called mutual authentication, or 2-way SSL. To configure 2-way SSL, set dfs.client.https.need-auth=true in hdfs-site.xml on each NN and DN. For 1-way SSL only the keystore needs to be configured on the NN and DNs. The keystore and truststore configuration go in the ssl-server.xml and ssl-client.xml files on the NN and each DN. The truststore configuration is needed only when using a self-signed certificate or a certificate that is not in the JVM’s truststore.
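As a sketch, the hdfs-site.xml portion of this setup (property names as in HDP 2.0 / Hadoop 2.x) might look like:

```xml
<!-- hdfs-site.xml: serve the NN/DN web UIs over SSL -->
<property>
  <name>dfs.https.enable</name>
  <value>true</value>
</property>
<!-- optional: require client certificates (2-way / mutual SSL) -->
<property>
  <name>dfs.client.https.need-auth</name>
  <value>true</value>
</property>
```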

The following configuration properties need to be specified in ssl-server.xml (property names as in Hadoop 2.x):

Property                        | Default Value | Description
ssl.server.keystore.type        | jks           | The type of the keystore; JKS (Java KeyStore) is the de facto standard in Java
ssl.server.keystore.location    |               | The location of the keystore file
ssl.server.keystore.password    |               | The password to open the keystore file
ssl.server.truststore.type      | jks           | The type of the truststore
ssl.server.truststore.location  |               | The location of the truststore file
ssl.server.truststore.password  |               | The password to open the truststore
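Put together, a minimal ssl-server.xml sketch (paths and passwords are placeholders, not values from this post) could look like:

```xml
<!-- ssl-server.xml in ${HADOOP_CONF_DIR} on the NN and each DN -->
<configuration>
  <property>
    <name>ssl.server.keystore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.server.keystore.location</name>
    <value>/etc/hadoop/conf/keystore.jks</value>  <!-- placeholder path -->
  </property>
  <property>
    <name>ssl.server.keystore.password</name>
    <value>changeit</value>  <!-- placeholder password -->
  </property>
  <!-- truststore entries are needed only for self-signed certs
       or certs not already in the JVM truststore -->
  <property>
    <name>ssl.server.truststore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.location</name>
    <value>/etc/hadoop/conf/truststore.jks</value>  <!-- placeholder path -->
  </property>
  <property>
    <name>ssl.server.truststore.password</name>
    <value>changeit</value>  <!-- placeholder password -->
  </property>
</configuration>
```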

Encryption during Shuffle

Starting with HDP 2.0, encryption during shuffle is supported.

The data moves between the Mappers and the Reducers over the HTTP protocol; this step is called shuffle. The Reducer initiates the connection to the Mapper to ask for data and acts as the SSL client. Enabling HTTPS to encrypt shuffle traffic involves the following steps:

  • Set mapreduce.shuffle.ssl.enabled to true in mapred-site.xml

  • Set the keystore properties, and optionally the truststore properties (for 2-way SSL), mentioned in the table above.

Here is an example configuration from mapred-site.xml
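A minimal sketch of such a configuration, assuming the Hadoop 2.x property name:

```xml
<!-- mapred-site.xml: encrypt shuffle traffic between Mappers and Reducers -->
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>true</value>
</property>
<!-- the keystore/truststore details themselves are read from
     ssl-server.xml and ssl-client.xml in ${HADOOP_CONF_DIR} -->
```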


The above configuration refers to ssl-server.xml and ssl-client.xml. These files contain the properties specified in the table above. Make sure to put ssl-server.xml and ssl-client.xml in the default ${HADOOP_CONF_DIR}.


JDBC Encryption

HiveServer2 implements encryption with the Java SASL protocol’s quality of protection (QOP) setting. With this, the data moving between HiveServer2 and a JDBC client can be encrypted. On HiveServer2, set hive.server2.thrift.sasl.qop in hive-site.xml, and on the JDBC client specify sasl.qop as part of the JDBC Hive connection string, e.g. jdbc:hive2://hostname/dbname;sasl.qop=auth-int. HIVE-4911 provides more details on this enhancement.
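As a sketch of the server side, note that of the SASL QOP levels, auth-conf is the one that adds confidentiality (encryption) on top of authentication and integrity:

```xml
<!-- hive-site.xml on HiveServer2: negotiate SASL quality of protection -->
<property>
  <name>hive.server2.thrift.sasl.qop</name>
  <!-- auth | auth-int | auth-conf; "auth-conf" encrypts the wire -->
  <value>auth-conf</value>
</property>
```

A client would then connect with a matching QOP, e.g. jdbc:hive2://hostname/dbname;sasl.qop=auth-conf.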

Closing Thoughts

Ensuring confidentiality of the data flowing in and out of a Hadoop cluster requires configuring encryption on each channel being used to move the data. This post described the encryption configuration required for each of these channels.

Please send me any comments about this post and any topics you would like me to cover. Stay tuned for the next post, about authorization in Hadoop. And you can stay up to date on security innovation in Hadoop via our Labs page.
