August 02, 2018

Apache Hadoop Meetup July 2018 – Bangalore Chapter

If you’re interested in learning more, go to our recap blog here!

Meetup Link:

https://www.meetup.com/Bangalore-Hadoop-Meetups/events/252534327/

The Bangalore Apache Hadoop Meetup group has over 3400 members who share common interests and ideas in the Hadoop ecosystem, and it brings together a community of practitioners and developers in Bangalore. The talks at this meetup cover a variety of topics related to the Hadoop ecosystem, such as Data Science workloads, Big Data-Driven Applications, SQL on Hadoop, YARN workloads, and more.

The most recent meetup of the group was held on July 28, 2018 at LinkedIn, Bangalore. Over a hundred enthusiastic participants attended, even on a busy weekend. The meetup also generated a lot of activity on Twitter. You can see the tweets here:

https://twitter.com/hashtag/HadoopMeetup2018?src=hash

More Details:

Talk 1 Ozone: Object Store in Apache Hadoop

The meetup kicked off with Ozone. Mukul Kumar Singh and Nandakumar from Hortonworks gave a quick overview of the new Ozone File System (HDDS) and its capabilities.

One of the major challenges with HDFS is managing small files, which can also affect scalability. Ozone, which belongs to the Apache Hadoop ecosystem, is an object store to help address the scalability issues in HDFS. Ozone is also HDFS-compatible, and therefore, downstream projects can use it without any client modifications.
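To get a feel for why small files hurt HDFS scalability, here is a rough back-of-the-envelope sketch. It assumes the commonly cited rule of thumb of about 150 bytes of NameNode heap per namespace object (file or block) and a 128 MB block size. Neither figure comes from the talk, and the function name is made up for illustration.

```python
# Rough NameNode heap estimate for a given amount of data.
# ASSUMPTION: ~150 bytes of heap per namespace object (file or block).
# This is a widely quoted rule of thumb, not a number from the talk.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(total_bytes, file_size_bytes, block_size_bytes=128 * 1024**2):
    """Estimate heap needed to track total_bytes stored as equal-size files."""
    num_files = -(-total_bytes // file_size_bytes)  # ceiling division
    blocks_per_file = max(1, -(-file_size_bytes // block_size_bytes))
    objects = num_files * (1 + blocks_per_file)  # one file entry plus its blocks
    return objects * BYTES_PER_OBJECT


one_tib = 1024**4
small = namenode_heap_bytes(one_tib, 1 * 1024**2)    # 1 TiB stored as 1 MiB files
large = namenode_heap_bytes(one_tib, 128 * 1024**2)  # 1 TiB stored as 128 MiB files
print(f"1 MiB files:   {small / 1024**2:.1f} MiB of heap")
print(f"128 MiB files: {large / 1024**2:.1f} MiB of heap")
```

Under these assumptions, the same terabyte of data costs the NameNode 128 times more memory when it is stored as 1 MiB files instead of 128 MiB files, since the metadata cost grows with the number of objects rather than the number of bytes.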

Talk 2 Sorcerer – Myntra’s Self-Serve Data Ingestion Platform

Deepak Batra from Myntra (one of the largest online shopping platforms for fashion and lifestyle in India) presented an overview of Sorcerer, the company's latest data ingestion platform, which runs on the production clusters at Myntra.

Sorcerer uses Apache Gobblin as the core of its data ingestion framework, which runs on the Apache Hadoop YARN resource management platform. It also uses Debezium (an open source distributed platform for capturing data changes) to capture changes from MySQL, at a volume of over 20 million changes an hour. Sorcerer also uses Hive Metastore for data discovery, together with its compaction and snapshot features.

As of today, all of Myntra’s data ingestion runs through Sorcerer. More features are planned for its querying capabilities in the near future.

Talk 3 Scaling and Managing Capacity for the LinkedIn grid ecosystem

LinkedIn has one of the largest Hadoop clusters in production. Rahul Jain from LinkedIn gave an impressive overview of LinkedIn's clusters and how they are used to power LinkedIn features such as People You May Know and LinkedIn Learning.

Rahul introduced us to the cluster that runs Azkaban, a batch workflow job scheduler created at LinkedIn for Hadoop jobs on the YARN platform.

Rahul shared use cases that are essential for a cluster administrator and showcased a new user interface that extracts complex metrics from a Hadoop cluster. This UI collects various cluster metrics from components such as YARN, History Server, and HDFS, and correlates them on a dashboard. This dashboard, named GridView, provides an intuitive user experience that helps cluster administrators understand how their clusters are running at any given point in time, and it provides relevant answers to pressing questions such as “Why is my job running slow today?”

Talk 4 Implementation and Performance Impact of Join Order, Dynamic Filter, and Cost Estimation of Queries in Presto

Rajat Venkatesh from Qubole presented an interesting talk on the performance impact of various join statements in Presto.

In this talk, Rajat covered the various types of joins and how to optimize them using dynamic filtering. Dynamic filtering leads to about a 30% improvement in performance, because rows in one table that cannot match can be filtered out at run time using values from the other, related table.
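The idea behind dynamic filtering can be shown with a toy hash join. This is only a minimal single-process sketch of the general technique, not Presto's or Qubole's implementation. The function name, table shapes, and column names are hypothetical.

```python
def hash_join(fact_rows, dim_rows, key, dynamic_filter=True):
    """Inner-join two lists of dicts on key.

    Returns (joined_rows, probe_rows_examined).
    """
    # Build side: hash the smaller (dimension) table on the join key.
    build = {}
    for row in dim_rows:
        build.setdefault(row[key], []).append(row)

    if dynamic_filter:
        # The set of join keys is only known at run time, after the build
        # side has been read. Apply it as a filter on the large table
        # before any of its rows reach the join.
        keys = set(build)
        probe_input = [r for r in fact_rows if r[key] in keys]
    else:
        probe_input = list(fact_rows)

    # Probe side: look up each remaining row in the hash table.
    joined = []
    for row in probe_input:
        for match in build.get(row[key], []):
            joined.append({**match, **row})
    return joined, len(probe_input)


orders = [{"item_id": i % 10, "qty": i} for i in range(100)]
items = [{"item_id": 1, "name": "shirt"}, {"item_id": 2, "name": "shoes"}]

with_filter, examined_with = hash_join(orders, items, "item_id")
without_filter, examined_without = hash_join(orders, items, "item_id", dynamic_filter=False)
print(examined_with, examined_without, with_filter == without_filter)
```

Both variants return the same joined rows. The difference is how many rows from the large table flow into the join. In a distributed engine, the filter would typically be applied at the table scan, so the saving also covers rows that would otherwise be read and shuffled across the network.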

Talk 5 Apache Hadoop 3 Insights and Migrating your Clusters from Hadoop 2 to Hadoop 3 by Sunil Govindan (@sunilgovind) and Rohith Sharma K S (@rohithsharmaks) from Hortonworks

We presented a detailed overview of Hadoop 3 features that are available in the Hadoop 3.1 release. We also provided an informative preview of the upcoming features in YARN.

Quick Overview:

  • Erasure Coding support in HDFS for efficient storage.
  • YARN Federation to scale the cluster to 100,000 nodes and beyond.
  • Support for richer placement strategies in Capacity Scheduler, which lets users express complex placement constraints such as affinity, anti-affinity, and cardinality.
  • Docker support in launching containers for better isolation and packaging.
  • YARN Native Services support to run long-running services such as LLAP.
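To illustrate why erasure coding counts as efficient storage, the short sketch below compares raw disk usage under 3x replication with a Reed-Solomon layout of 6 data units and 3 parity units. The RS(6,3) layout and the helper names are assumptions made for this example and were not stated in the session recap.

```python
def erasure_coded_storage(logical_bytes, data_units=6, parity_units=3):
    """Raw bytes on disk for an erasure-coded layout (ASSUMED RS(6,3) by default)."""
    return logical_bytes * (data_units + parity_units) / data_units


def replicated_storage(logical_bytes, replicas=3):
    """Raw bytes on disk when every block is stored `replicas` times."""
    return logical_bytes * replicas


logical_tb = 600  # hypothetical dataset size in TB
print("3x replication:", replicated_storage(logical_tb), "TB raw")
print("RS(6,3) coding:", erasure_coded_storage(logical_tb), "TB raw")
```

Under these assumptions, the erasure-coded layout needs half the raw storage of 3x replication for the same data (1.5x overhead instead of 3x), while still tolerating the loss of up to three units per stripe.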

In this session, we also covered the migration use case from Hadoop 2 clusters to Hadoop 3. This is intended to help users who plan to migrate their clusters to Hadoop 3 and use the latest features. To address the challenges associated with platform migration, we provided a detailed upgrade plan with the configuration, shell script, and command changes that help simplify the upgrade process.

We recommend Express Upgrade to migrate to Hadoop 3.

Slideshare link:

https://www.slideshare.net/SunilG11/apache-hadoop-3-updates-with-migration-story

Summary:

All the sessions that we attended were informative and covered a diverse set of topics. There were very good industry-wide discussions, with speakers and participants sharing their experiences of running Hadoop clusters in production environments for various workloads and raising more use cases for everyone to solve in the future.

After a delicious lunch arranged by the LinkedIn Bangalore team, we said goodbye to each other until the next Hadoop Meetup in Bangalore!
