Big Data Hadoop Training with Project In Bangalore

Learn to store, manage, retrieve and analyze Big Data on clusters of servers using the Hadoop ecosystem.

  • Become one of the most in-demand IT professionals in the world today.
  • Don't just learn Hadoop development; also learn Hadoop testing and how to analyze large amounts of data to bring out insights.
  • Relevant examples and cases make the learning more effective and easier.
  • Gain hands-on knowledge through the problem-solving approach of the course, along with working on a project at the end of the course.

OFFER: Get the Hadoop Administration course absolutely free.

About the Course

The Big Data and Hadoop training course from NPN Training is designed to enhance your knowledge and skills so that you can become a successful Hadoop Developer, Hadoop Tester and Hadoop Analyst.


At NPN Training we believe in the philosophy of "Learn by doing", so we provide complete hands-on training with real-time project development.


Course Objectives

By the end of the course,  you will: 

1.     Understand Hadoop 1.x & 2.x architecture.
2.     Set up a Hadoop cluster and write complex MapReduce programs.
3.     Learn different Hadoop commands.
4.     Apply data loading techniques using Sqoop.
5.     Perform data analytics using Pig, Hive and YARN.
6.     Understand NoSQL & HBase.
7.     Implement best practices for Hadoop development.


Work on a real life project on Big Data Analytics

As part of the course work, you will work on the projects described below, where you will use Pig, Hive, HBase and MapReduce to perform Big Data analytics.
The following industry-specific Big Data case studies are included in our Big Data and Hadoop Certification, e.g. Security Agency, Retail, Banking, Education, Media, Health care.


Project #1 : Analysis of Afghan War Diaries

Industry : Security Agency

The data comprises information gathered by soldiers and intelligence officers of the United States military. It is used to examine events that involve explosive hazards and to find events that involve Improvised Explosive Devices (IEDs).


Project #2 : Customer Complaints Analysis about Products

Industry : Retail

A publicly available dataset containing a few lakh observations with attributes such as CustomerId, Payment Mode, Product Details, Complaint, Location, Status of the complaint, etc.
Problem Statement: Analyze the data in the Hadoop ecosystem to:
1. Get the number of complaints filed under each product.
2. Get the total number of complaints filed from a particular location.
3. Get the list of complaints, grouped by location, that received no timely response.
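
The three questions above are all group-and-count problems. The following is a minimal Python sketch of that logic on a tiny in-memory sample, not the Hive or Pig code used in the course. The records, the field names and the `timely_response` flag are hypothetical; the real dataset's columns may differ.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; the real dataset's columns and values will differ.
COMPLAINTS = [
    {"product": "Laptop", "location": "Bangalore", "complaint": "Late delivery", "timely_response": False},
    {"product": "Phone", "location": "Delhi", "complaint": "Damaged screen", "timely_response": True},
    {"product": "Laptop", "location": "Delhi", "complaint": "Wrong model", "timely_response": False},
    {"product": "Tablet", "location": "Bangalore", "complaint": "Refund pending", "timely_response": True},
]


def complaints_per_product(records):
    """Question 1: number of complaints filed under each product."""
    return Counter(r["product"] for r in records)


def complaints_from_location(records, location):
    """Question 2: total number of complaints filed from one location."""
    return sum(1 for r in records if r["location"] == location)


def untimely_by_location(records):
    """Question 3: complaints without a timely response, grouped by location."""
    grouped = defaultdict(list)
    for r in records:
        if not r["timely_response"]:
            grouped[r["location"]].append(r["complaint"])
    return dict(grouped)


if __name__ == "__main__":
    print(complaints_per_product(COMPLAINTS))
    print(complaints_from_location(COMPLAINTS, "Delhi"))
    print(untimely_by_location(COMPLAINTS))
```

On a cluster the same grouping would typically be expressed as a GROUP BY query or a MapReduce job keyed on product or location.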


Project #3 : Credit Card Analysis

Industry : Banking

XYZ Bank is an Indian multinational banking and financial services company headquartered in Delhi, India. It provides various financial services, such as accepting deposits and issuing credit cards and loans. XYZ Bank also offers a range of investment products, such as savings accounts and certificates of deposit. It offers a wide range of banking products and financial services for corporate and retail customers through a variety of delivery channels and specialised subsidiaries in the areas of investment banking, life and non-life insurance, venture capital and asset management.


Project #4 : Scholastic Assessment Analysis

Industry : Education

This dataset is the SAT (College Board) 2010 School Level Results, which shows how students from different schools performed in the tests. It consists of the fields below.
DBN, School Name, Number of Test Takers, Critical Reading Mean, Mathematics Mean, Writing Mean
DBN is the unique field for this dataset. The students were given a test, and the dataset is based on the results of that test.

Here we analyze this data. Below are a few problem statements that we have chosen:
1. Find the total number of test takers.
2. Find the highest mean/average of the Critical Reading section and the school name.
3. Find the highest mean/average of the Mathematics section and the school name.
4. Find the highest mean/average of the Writing section and the school name.
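
The four problem statements above reduce to one sum and three maximum lookups. The sketch below illustrates them in plain Python on a hypothetical three-row sample that uses the column names listed above. The sample values, and the choice to skip rows whose scores are not numeric, are assumptions for illustration only.

```python
import csv
import io

# Hypothetical rows in the documented column order; real values will differ.
SAMPLE_CSV = """DBN,School Name,Number of Test Takers,Critical Reading Mean,Mathematics Mean,Writing Mean
01M001,School A,50,400,420,390
01M002,School B,s,s,s,s
01M003,School C,80,450,410,460
"""


def load_rows(text):
    """Parse the CSV text, skipping rows whose scores are not numeric."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            rows.append({
                "school": row["School Name"],
                "takers": int(row["Number of Test Takers"]),
                "reading": float(row["Critical Reading Mean"]),
                "math": float(row["Mathematics Mean"]),
                "writing": float(row["Writing Mean"]),
            })
        except ValueError:
            continue  # assumption: non-numeric values mark missing scores
    return rows


def total_test_takers(rows):
    """Problem 1: total number of test takers across all schools."""
    return sum(r["takers"] for r in rows)


def top_school(rows, section):
    """Problems 2-4: school with the highest mean in the given section."""
    best = max(rows, key=lambda r: r[section])
    return best["school"], best[section]


rows = load_rows(SAMPLE_CSV)

if __name__ == "__main__":
    print(total_test_takers(rows))
    for section in ("reading", "math", "writing"):
        print(section, top_school(rows, section))
```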


Project #5 : Processing MovieLens dataset using Pig

Industry : Entertainment

In this project, we will learn about Apache Pig and how to use it to process the MovieLens dataset. We will get familiar with the various Pig operators used for data processing. We will cover how to use existing UDFs and how to write your own custom UDFs. Finally, we will take a look at diagnostics and performance tuning.

Project #7 : Health care Analysis

Industry : Health care

Below are a few of the problem statements that we have chosen to work on with this dataset.
1. How many hospital centres got more than 60% patient satisfaction regarding cleanliness?
2. Which hospital centre got the maximum overall rating between 9-10?





Hadoop 2.x - Distributed Storage + Batch Processing


Module 01 - Understanding Big Data & Hadoop 2.x
Learning Objectives - In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, Hadoop 2.x architecture, HDFS, and the anatomy of file write and read.

Topics -
  • Understanding what is Big Data
  • Combined storage + computation layer
  • Business Use Case - Telecom
  • Challenges of Big Data
  • OLTP VS OLAP Applications
  • Limitations of existing Data Analytics
  • A combined storage compute layer
  • Introduction to Hadoop
  • Exploring Hadoop 2.x Core Components
  • Understanding Hadoop 2.x Daemon Services
    1. NameNode
    2. DataNode
    3. Secondary NameNode
    4. ResourceManager
    5. NodeManager
  • Understanding NameNode metadata
  • File Blocks in HDFS
  • Rack Awareness
  • Anatomy of File Read and File Write
  • Understanding HDFS Federation
  • Understanding High Availability Feature in Hadoop 2.x
  • Exploring Big Data ecosystem
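
As a small illustration of the "File Blocks in HDFS" topic above, the sketch below works out how a file is split into blocks and how much raw storage it consumes once replicated. It assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; the function name `block_layout` is hypothetical, and real clusters may configure both values differently.

```python
BLOCK_SIZE_MB = 128  # assumed default dfs.blocksize in Hadoop 2.x
REPLICATION = 3      # assumed default dfs.replication


def block_layout(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (list of block sizes in MB, total raw storage in MB) for one file."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only occupies as much space as the data it holds.
        sizes.append(remainder)
    return sizes, file_size_mb * replication


if __name__ == "__main__":
    sizes, raw = block_layout(300)
    print(sizes)  # block sizes for a 300 MB file
    print(raw)    # raw storage used across all replicas
```

For a 300 MB file this gives two full 128 MB blocks and one 44 MB block, stored three times over.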

Check E-Learning for more Assignments + Use cases + Project work + Materials + Case studies


Module 02 - Exploring Administration + File System + YARN Commands
Learning Objectives - In this module, you will learn about formatting the NameNode, HDFS file system commands, MapReduce commands, different data loading techniques, cluster maintenance, etc.

Topics -
  • Analyzing ResourceManager and NameNode UI
  • Exploring HDFS File System Commands - [Hands-on]
  • Exploring Hadoop Admin Commands - [Hands-on]
  • Printing Hadoop Distributed File System
  • Running Map Reduce Program - [Hands-on]
  • Killing Job
  • Data Loading in Hadoop - [Hands-on]
    1. Copying Files from DFS to Unix File System
    2. Copying Files from Unix File System to DFS
    3. Understanding Parallel copying of data to HDFS - [Hands-on]
  • Executing MapReduce Jobs
  • Different techniques to move data to HDFS - [Hands-on]
  • Backup and Recovery of Hadoop cluster - [Activity]
  • Commissioning and Decommissioning a node in Hadoop cluster - [Activity]
  • Understanding Hadoop Safe Mode - Maintenance state of NameNode - [Hands-on]
  • Configuring Trash in HDFS - [POC]


Module 03 - MapReduce Programming
Learning Objectives - In this module, you will understand how the MapReduce framework works.

Topics -
  • Introduction to MapReduce
  • Understanding Key/Value in MapReduce
    1. What does it mean?
    2. Why key/value data?
  • Hadoop Topology Cluster
    1. The 0.20 MapReduce Java API
    2. The Reducer class
    3. The Mapper class
    4. The Driver class
  • Flow of Operations in MapReduce
  • Implementing Word Count Program - [Hands-on]
  • Exploring Default InputFormat - TextInputFormat
  • Submission & Initializing of MapReduce job - [Activity]
  • Handling MapReduce Job
  • Exploring Hadoop Datatypes
  • Understanding Data Locality
  • Serialization & DeSerialization
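
To make the key/value flow in the Word Count exercise above concrete, here is a minimal single-process Python simulation of the map, shuffle and reduce phases. It is only a sketch of the idea: the course implements Word Count with the Hadoop Java API, and the function names below are hypothetical.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    # Keys are processed in sorted order, as a reducer would receive them.
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

In Hadoop the mapper and reducer run on different nodes and the framework performs the shuffle, but the data contract between the phases is the same key/value grouping shown here.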


Module 04 - Hive and Hive QL
Learning Objectives - In this module you will learn about Hive and its similarity with SQL, Hive concepts, Hive data types, and loading and querying data in Hive.

Topics -
  • A Walkthrough of Hive Architecture
  • Understanding Hive Query Patterns
  • Internal vs External tables
  • Different ways to describe Hive tables
  • [Use case] - Discussing where to use which type of table.
  • Different ways to load data into Hive tables - [Activity]
    1. Loading data from Local File System to Hive Tables.
    2. Loading data from HDFS to Hive Tables.
  • Exploring Hive Complex Data types - [Hands-on]
    1. Arrays
    2. Maps
    3. Structs
  • Exploring Hive built-in Functions.

For more assignments check E-Learning


Module 05 - Hive Optimization
Learning Objectives - In this module, you will understand Advanced Hive concepts such as Partitioning, Bucketing, Dynamic Partitioning, different Storage formats etc.

Topics -
  • Understanding Hive Complex Data types
    1. Arrays
    2. Map
    3. Struct
  • Partitioning
  • [Use case] - Using a Telecom dataset to learn which fields to use for Partitioning.
  • Dynamic Partitioning
  • [Use case] - Using an IoT dataset to learn Dynamic Partitioning.
  • Hive Bucketing
  • Bucketing VS Partitioning
  • Dynamic Partitioning with Bucketing
  • Exploring different Input Formats in Hive
    1. TextFile Format - [Activity]
    2. SequenceFile Format - [Activity]
    3. RC File Format - [Activity]
    4. ORC Files in Hive - [Activity]
  • Using different file formats and capturing Performance reports - [POC]
  • Map-side join - [Hands-on]
  • Reduce-side join - [Hands-on]
  • [Use case] - Looking at different problems where Map-side and Reduce-side joins can be used.
  • Map-side join VS Reduce-side join - [Hands-on]
  • Writing custom UDF - [Hands-on]
  • Accessing Hive with JDBC - [Hands-on]
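
The difference between partitioning and bucketing listed above can be sketched in a few lines of Python. Partitioning places rows in one directory per distinct value of the partition column (Hive names these directories `column=value`), while bucketing spreads rows over a fixed number of files by hashing a column. The telecom-style records, the column names and the bucket count below are hypothetical, and the modulo on an integer key is a simplification: Hive's actual hash depends on the column type.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count

# Hypothetical telecom-style records.
RECORDS = [
    {"subscriber_id": 101, "circle": "KA"},
    {"subscriber_id": 102, "circle": "KA"},
    {"subscriber_id": 105, "circle": "KA"},
    {"subscriber_id": 202, "circle": "MH"},
]


def partition_dir(record):
    """Partitioning: one directory per distinct value of the partition column."""
    return f"circle={record['circle']}"


def bucket_id(record, num_buckets=NUM_BUCKETS):
    """Bucketing: a fixed number of buckets chosen by hashing a column.

    A plain modulo on an integer key is used here as a simplification.
    """
    return record["subscriber_id"] % num_buckets


def layout(records):
    """Map each (partition directory, bucket) pair to the ids stored there."""
    files = defaultdict(list)
    for r in records:
        files[(partition_dir(r), bucket_id(r))].append(r["subscriber_id"])
    return dict(files)


if __name__ == "__main__":
    for location, ids in sorted(layout(RECORDS).items()):
        print(location, ids)
```

A low-cardinality column such as a region is a natural partition key, while a high-cardinality column such as a subscriber id is better suited to bucketing.
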

More assignments, use cases, project work and materials can be found in E-Learning.


Module 06 - Sqoop
Learning Objectives - In this module you will learn how to import and export data between traditional databases, such as SQL databases and Oracle, and Hadoop using Sqoop.

Topics -
  • Sqoop Overview
  • Sqoop JDBC Driver and Connectors
  • Sqoop Importing Data
  • Various Options to Import Data
  • Understanding Sqoop Jobs
    1. Table Import
    2. Filtering Import
  • Incremental Imports using Sqoop


Module 07 - Apache Pig
Learning Objectives - In this module you will learn Apache Pig by contrasting it with MapReduce.

Topics -
  • Introduction to Apache Pig
  • MapReduce VS Pig
  • Exploring Pig Components and Pig Execution
  • Introduction to Pig Latin
  • Input and Output
    1. Load
    2. Store
    3. Dump
  • Relational Operators
    1. Foreach
    2. Filter
    3. Group
    4. Distinct
    5. Join
    6. Parallel
  • Multi Dataset Operators
    1. Techniques for combining Data sets
    2. Joining Data sets in Pig


Understanding & Building Data Pipeline Architecture using Pig and Hive

Project Description - We will use data from the U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS), which tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on the BTS website.



Module 08 - NoSQL & HBase
Learning Objectives - In this module you will learn about NoSQL databases and the difference between HBase and relational databases. You will explore the features of NoSQL databases, the CAP theorem, and the HBase architecture, and you will understand the data model and perform various operations.

Topics -
  • Understanding NoSQL Databases
  • Categories of NoSQL
    1. Key-Value Database
    2. Document Database
    3. Column Family Database
    4. Graph Database
  • What is HBase
  • Row Oriented VS Column Oriented Database
  • Features of HBase
  • Data Model in HBase
  • HBase Physical Storage
  • Exploring HBase Shell Commands
    1. PUT
    2. GET
    3. DELETE
    4. Filtering Records
  • HBase Client API
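
The HBase data model and the PUT, GET and DELETE commands listed above can be pictured as a nested map: row key, then column family, then column qualifier, then a list of versions. The toy in-memory class below is only a sketch of that shape under simplifying assumptions. `ToyTable` is a hypothetical name, a counter stands in for timestamps, and delete simply drops the cell, whereas HBase itself writes delete markers.

```python
import itertools


class ToyTable:
    """In-memory stand-in for a table: row -> family -> qualifier -> versions."""

    def __init__(self, families):
        self.families = set(families)     # column families are fixed up front
        self.rows = {}
        self._clock = itertools.count(1)  # stand-in for cell timestamps

    @staticmethod
    def _split(column):
        family, _, qualifier = column.partition(":")
        return family, qualifier

    def put(self, row, column, value):
        family, qualifier = self._split(column)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = (
            self.rows.setdefault(row, {})
            .setdefault(family, {})
            .setdefault(qualifier, [])
        )
        versions.append((next(self._clock), value))

    def get(self, row, column):
        """Return the newest version of the cell, or None if it is absent."""
        family, qualifier = self._split(column)
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, [])
        return versions[-1][1] if versions else None

    def delete(self, row, column):
        """Simplification: drop every version of the cell."""
        family, qualifier = self._split(column)
        self.rows.get(row, {}).get(family, {}).pop(qualifier, None)


if __name__ == "__main__":
    table = ToyTable(["info"])
    table.put("row1", "info:name", "first")
    table.put("row1", "info:name", "second")
    print(table.get("row1", "info:name"))
    table.delete("row1", "info:name")
    print(table.get("row1", "info:name"))
```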


Quartz - Enterprise Job Scheduler


Module - Quartz Scheduler
Learning Objectives - In this module you will learn about the Quartz job scheduler.

Topics -
  • What is Job Scheduling Framework
  • Role of Scheduling Framework in Hadoop
  • What is Quartz Job Scheduling Library
  • Using Quartz
  • Exploring Quartz API
    1. Jobs
    2. Triggers
  • Scheduling Hive Jobs using Quartz scheduler

How will I execute the Practicals?

We will help you set up NPN Training's Virtual Machine on your system with local access. Detailed installation guides for setting up the environment are provided in the E-Learning.

Is Java a pre-requisite to learn Big Data and Hadoop?

You can join the course without prior Java knowledge. We will provide you with a video tutorial for Java. You can start immediately, and by the time Java is introduced in the Hadoop course, from the third week (MapReduce), you will have had enough time to clear your concepts in Java.

NPN Training Certification Process:

At the end of your course, you will work on a real-time project. You will receive a problem statement along with a dataset to work on. Once you have successfully completed the project (reviewed by an expert), you will be awarded a certificate with a performance-based grading. If your project is not approved in the first attempt, you can get extra assistance with any of your doubts to understand the concepts better, and reattempt the project free of cost.

Yalaguresh Jorapur
Company: Infosys

NPN Training is definitely one of the best training institutes in Bangalore for Hadoop. The course content is elaborate. The few things that set NPN Training apart from others are the live scenarios, case studies and workshops from experts in the industry. I would recommend anyone looking for a deep dive into Hadoop to start with NPN Training.


I enjoy being in Naveen's class, and having him as my teacher I have learnt a lot. What comes to mind first is MapReduce and Hive; I have really become familiar with them. I really appreciate Naveen's efforts in making us understand things and repeating them as many times as needed. You are in a noble job. I wish you great success in the fulfillment of your responsibilities towards the job. Thank you for being my teacher.

Lakshman Singh
Company: Cibersites India Pvt Ltd

I came to know about NPN Training from one of my friends. When I attended the demo classes, Naveen sir gave a clear top-level vision of Big Data and the Hadoop framework. During the demo classes Naveen sir explained the different components of the Hadoop framework and how they are applied in industry.
As the class progressed in subsequent weeks, Naveen sir provided hands-on experience on each topic, and this gave me enough confidence in the topic. Along with the classroom training, Naveen sir started a project for our batch. Every individual was assigned a specific task from the project, where we applied the concepts learned in the classroom training. After a few weeks of training I had enough confidence in Hadoop and Big Data concepts.
One more beauty of NPN Training is that Naveen sir gives complete attention to every class member and ensures that everyone is on the same page during classroom sessions. The NPN Training portal is very useful for revision, as materials are uploaded there and can be used. Last but not least, I made the correct decision by joining NPN Training; this institute provides very good training. Thanks a lot, Naveen sir.

Contact us

+91 8095918383 | +91 9535584691

Upcoming batches



Big Data & Hadoop

(Weekend Saturday batch)
Fees 15,000 INR




Course Features

Big Data Architect Masters Program Training
4.8 stars - based on 150 reviews