Big Data Hadoop Training with Certification

Learn to store, manage, retrieve and analyze Big Data on clusters of servers using the Hadoop ecosystem. Become one of the most in-demand IT professionals in the world today. Don't just learn Hadoop development; also learn Hadoop testing and how to analyze large amounts of data to bring out insights. Relevant examples and cases make the learning more effective and easier. Gain hands-on knowledge through the course's problem-solving approach and by working on a project at the end of the course.

OFFER: Get the Hadoop Administration course absolutely free.



About the Course

The Big Data and Hadoop training course from NPN Training is designed to enhance your knowledge and skills to become a successful Hadoop Developer, Hadoop Tester and Hadoop Analyst.

 

At NPN Training we believe in the philosophy "Learn by doing", so we provide complete hands-on training with real-time project development.

 

Course Objectives

By the end of the course,  you will: 

1. Understand Hadoop 1.x & 2.x architecture.
2. Set up a Hadoop cluster and write complex MapReduce programs.
3. Learn different Hadoop commands.
4. Load data using Sqoop.
5. Perform data analytics using Pig, Hive and YARN.
6. Understand NoSQL & HBase.
7. Implement best practices for Hadoop development.

 

Work on a real life project on Big Data Analytics

As part of the course work, you will work on the projects below, using Pig, Hive, HBase and MapReduce to perform Big Data analytics.
The following industry-specific Big Data case studies are included in our Big Data and Hadoop Certification, e.g. Security Agency, Retail, Banking, Education, Media, Health care, etc.

 

Project #1 : Analysis of Afghan War Diaries

Industry : Security Agency

The data comprises information gathered by soldiers and intelligence officers of the United States military, used to examine events that involve explosive hazards and to find events that involve Improvised Explosive Devices (IEDs).

 

Project #2 : Customer Complaints Analysis about Products

Industry : Retail

A publicly available dataset containing a few lakh observations with attributes such as CustomerId, Payment Mode, Product Details, Complaint, Location, Status of the complaint, etc.
Problem Statement: Analyze the data in the Hadoop ecosystem to:
1. Get the number of complaints filed under each product
2. Get the total number of complaints filed from a particular location
3. Get the list of complaints, grouped by location, that received no timely response
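
The three queries in this problem statement can be sketched in plain Python on a handful of made-up records. The field names (`product`, `location`, `timely_response`) and the sample rows are assumptions for illustration only; the real dataset's columns may differ, and in the course the same logic would be written with Pig, Hive or MapReduce.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; field names are assumed, not taken from the real dataset.
complaints = [
    {"id": 1, "product": "Laptop", "location": "Bangalore", "timely_response": True},
    {"id": 2, "product": "Phone", "location": "Delhi", "timely_response": False},
    {"id": 3, "product": "Laptop", "location": "Delhi", "timely_response": False},
    {"id": 4, "product": "Phone", "location": "Bangalore", "timely_response": True},
    {"id": 5, "product": "Laptop", "location": "Bangalore", "timely_response": False},
]

def complaints_per_product(records):
    # Query 1: number of complaints filed under each product.
    return Counter(r["product"] for r in records)

def complaints_from(records, location):
    # Query 2: total number of complaints filed from one location.
    return sum(1 for r in records if r["location"] == location)

def untimely_by_location(records):
    # Query 3: ids of complaints without a timely response, grouped by location.
    grouped = defaultdict(list)
    for r in records:
        if not r["timely_response"]:
            grouped[r["location"]].append(r["id"])
    return dict(grouped)

per_product = complaints_per_product(complaints)
bangalore_total = complaints_from(complaints, "Bangalore")
untimely = untimely_by_location(complaints)
```

Each function corresponds to a group-by or filter step, which is the same shape the queries would take as a Hive `GROUP BY` or a Pig `GROUP` followed by a count.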

 

Project #3 : Credit Card Analysis

Industry : Banking

XYZ Bank is an Indian multinational banking and financial services company headquartered in Delhi, India. It provides various financial services, such as accepting deposits and issuing credit cards and loans. XYZ Bank has a range of investment products, such as savings accounts and certificates of deposit. It offers a wide range of banking products and financial services for corporate and retail customers through a variety of delivery channels and specialised subsidiaries in the areas of investment banking, life and non-life insurance, venture capital and asset management.

 

Project #4 : Scholastic Assessment Analysis

Industry : Education

This data set is the SAT (College Board) 2010 School Level Results, which shows how students from different schools performed in the tests. It consists of the fields below.
DBN, School Name, Number of Test Takers, Critical Reading Mean, Mathematics Mean, Writing Mean
DBN is the unique field for this dataset. The analysis is based on the results of the test the students were given.

We analyze this data using the following problem statements:
1. Find the total number of test takers.
2. Find the highest mean/average of the Critical Reading section and the school name.
3. Find the highest mean/average of the Mathematics section and the school name.
4. Find the highest mean/average of the Writing section and the school name.
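
As a rough illustration of these four problem statements, the sketch below runs them over three invented rows laid out in the field order listed above. The DBN values, school names and scores are made up, and the helper names `total_test_takers` and `top_school` are hypothetical.

```python
# Hypothetical rows in the field order given above:
# (DBN, School Name, Number of Test Takers, Critical Reading Mean, Mathematics Mean, Writing Mean)
rows = [
    ("01A001", "School A", 50, 400, 420, 390),
    ("01A002", "School B", 80, 450, 410, 430),
    ("01A003", "School C", 30, 430, 480, 410),
]

# Column index of each section's mean score.
SECTION_INDEX = {"reading": 3, "math": 4, "writing": 5}

def total_test_takers(rows):
    # Problem 1: sum the "Number of Test Takers" column.
    return sum(r[2] for r in rows)

def top_school(rows, section):
    # Problems 2-4: school with the highest mean in the given section.
    idx = SECTION_INDEX[section]
    best = max(rows, key=lambda r: r[idx])
    return best[1], best[idx]

total = total_test_takers(rows)
best_reading = top_school(rows, "reading")
best_math = top_school(rows, "math")
best_writing = top_school(rows, "writing")
```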

 

Project #5 : Processing Movielens dataset using Pig

Industry : Entertainment

In this project, we will learn about Apache Pig and how to use it to process the MovieLens dataset. We will get familiar with the various Pig operators used for data processing. We will cover how to use UDFs and write your own custom UDFs. Finally, we will take a look at diagnostics and performance tuning.


Project #7 : Health care Analysis

Industry : Health care

Below are a few of the problem statements that we have chosen to work on with this dataset.
1. How many hospital centres got more than 60% patient satisfaction regarding cleanliness?
2. Which hospital centre got the maximum overall rating between 9-10?


 

 

 
Module 01 - Understanding Big Data and Hadoop 1.x, 2.x
Learning Objectives - In this module, you will understand what Big Data is, the challenges of Big Data, the Hadoop 1.x & 2.x architecture, and YARN.

Topics -
  • Understanding what is Big Data
  • Factors Constituting Big Data
  • Business Use Case - Telecom
  • Challenges of Big Data
  • OLTP VS OLAP Applications
  • Limitations of existing Data Analytics
  • A combined storage compute layer
  • Scaling Up Vs. Scaling Out
  • Different modes in Hadoop
  • Introduction to Hadoop
  • Moving code to data VS Moving data to code
  • Hadoop Core Components
  • HDFS VS GFS comparison
  • Exploring different daemons in Hadoop 1.x Architecture (NameNode, JobTracker, SecondaryNameNode, DataNode, TaskTracker)
  • Understanding Limitation of Hadoop 1.x Architecture
  • Exploring different daemons in Hadoop 2.x Architecture (NameNode, ResourceManager, SecondaryNameNode, DataNode, NodeManager)
  • HDFS Federation
  • High Availability
  • YARN (Yet Another Resource Negotiator)
  • Understanding NameNode metadata
  • File Blocks in HDFS
  • Rack Awareness
  • Anatomy of File Read From HDFS
  • Hadoop Eco-System
More assignments, use cases, project work and materials can be found in E-Learning.
Module 02 - Exploring Hadoop Installation and Configuration
Learning Objectives - In this module, you will learn how to install Hadoop and its important configurations.

Topics -
  • Different modes in Hadoop
  • Installing Hadoop in Pseudo Distributed Mode - [Activity]
  • Understanding important configuration files, their properties and daemon threads.
  • Installation of VM & creating Hadoop Environment in VM - [Activity]
Module 03 - Exploring Hadoop Commands [Hadoop Administration]
Learning Objectives - In this module, you will learn formatting the NameNode, HDFS file system commands, MapReduce commands, different data loading techniques, cluster maintenance, etc.

Topics -
  • Analyzing ResourceManager and NameNode UI
  • Exploring HDFS File System Commands - [Hands-on]
  • Exploring Hadoop Admin Commands - [Hands-on]
  • Printing Hadoop Distributed File System
  • Running Map Reduce Program - [Hands-on]
  • Killing Job
  • Data Loading in Hadoop - [Hands-on]
  •     i.     Copying Files from DFS to Unix File System
  •     ii.    Copying Files from Unix File System to DFS
  •     iii.   Understanding Parallel copying of data to HDFS - [Hands-on]
  • Executing MapReduce Jobs
  • Different techniques to move data to HDFS - [Hands-on]
  • Backup and Recovery of Hadoop cluster - [Activity]
  • Commissioning and Decommissioning a node in Hadoop cluster. - [Activity]
  • Understanding Hadoop Safe Mode - Maintenance state of NameNode - [Hands-on]
Module 04 - MapReduce Programming - I [Hadoop Development]
Learning Objectives -  In this module, you will understand how MapReduce framework works.

Topics -
  • Key/value pairs
  •     What it means
  •     Why key/value data?
  • Topology Hadoop Cluster
  •     The 0.20 MapReduce Java API
  •     The Reducer class
  •     The Mapper class
  •     The Driver class
  • [Use Case] - Word Count Program
  • Default InputFormat - TextInputFormat
  • Submission & Initializing of MapReduce job
  • Handling MapReduce Job
  • Hadoop Datatypes
  • Data Locality
  • Serialization & DeSerialization
  • Serialization Classes in Hadoop
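
To make the key/value flow behind the Word Count use case concrete, here is a minimal sketch that simulates the map, shuffle and reduce phases in plain Python. It is not the Hadoop Java API (no Mapper, Reducer or Driver classes are involved); the function names are hypothetical and everything runs in a single process.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle phase: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, counts):
    # Reduce phase: sum the counts collected for one word.
    return word, sum(counts)

def word_count(lines):
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs).items())

result = word_count(["big data hadoop", "hadoop big cluster"])
```

In a real cluster the map calls run in parallel on the nodes holding the input blocks, and the shuffle moves data across the network; the sketch only shows how the keys and values change shape at each step.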
Module 05 - Hive & Hive QL [Hadoop Development, Testing]
Learning Objectives - In this module, you will learn Hive and its similarity with SQL, Hive concepts, Hive data types, and loading and querying data in Hive.

Topics -
  • Need for Hive
  • Hive Architecture
  • A Walkthrough of Hive Components
  • Hive Query Flow
  • Schema design for a Data warehouse
  • Hive Metastore
  •      - Embedded Metastore
  •      - Local Metastore
  •      - Remote Metastore
  • Exploring Hive Command Line Interface
  • Hive Query Patterns
  • Understanding Internal VS External tables
  • [Use Case] - Discussing which type of table to use where
  • Different ways to load data into Hive tables - [Hands-on]
Module 06 - Hive Optimization [Hadoop Development, Testing]
Learning Objectives - In this module, you will understand Advanced Hive concepts such as Partitioning, Bucketing, Dynamic Partitioning, different Storage formats etc.

Topics -
  • Hive Complex Data types
  • Partitioning
  • [Use case] - Using a Telecom dataset to learn which fields to use for Partitioning.
  • Dynamic Partitioning
  • [Use case] - Using an IoT dataset to learn Dynamic Partitioning.
  • Dynamic Partitioning with Bucketing
  • Bucketing VS Partitioning
  • Exploring different Input Formats in Hive
  •     TextFile Format - [Activity]
  •     SequenceFile Format - [Activity]
  •     RC File Format - [Activity]
  •     ORC Files in Hive - [Activity]
  • [Use case] - Using different file formats and capturing Performance reports
  • Map-side join - [Hands-on]
  • Reduce-side join - [Hands-on]
  • [Use case] - Looking at different problems for which Map-side and Reduce-side joins can be used.
  • Map-side join VS Reduce-side join - [Hands-on]
  • Writing custom UDF - [Hands-on]
  • Accessing Hive with JDBC - [Hands-on]
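
The difference between Partitioning and Bucketing can be sketched as follows: a partition is chosen by a column's value and maps to a directory of the form `<table>/<column>=<value>`, while a bucket is chosen by hashing a column into a fixed number of files. The sketch is plain Python, not Hive; the table name, columns and records are invented, and a simple modulo on an integer key stands in for Hive's actual hash function.

```python
from collections import defaultdict

def partition_dir(table, column, value):
    # Hive-style partition directory layout: <table>/<column>=<value>
    return f"{table}/{column}={value}"

def bucket_id(key, num_buckets):
    # Assumption: integer key, plain modulo in place of Hive's hash function.
    return key % num_buckets

def layout(records, table, num_buckets):
    # Place each (customer_id, country) record by partition, then by bucket.
    files = defaultdict(list)
    for customer_id, country in records:
        slot = (partition_dir(table, "country", country),
                bucket_id(customer_id, num_buckets))
        files[slot].append(customer_id)
    return dict(files)

# Hypothetical records: (customer_id, country)
records = [(101, "IN"), (102, "US"), (103, "IN"), (104, "IN")]
placed = layout(records, "customers", 2)
```

The number of partitions grows with the number of distinct values in the partition column, whereas the number of buckets is fixed up front, which is one reason high-cardinality columns tend to suit bucketing better.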
Module 07 - MapReduce Programming - II
Learning Objectives - In this module, you will learn advanced concepts in MapReduce programming and explore different optimization techniques available in MapReduce.

Topics -
  • Flavors of MapReduce API
  • Flow of Operations in MapReduce
  • Understanding Chaining in MapReduce Job
  • Use cases using MapReduce Chaining
  • Understanding Zero Reducers use cases
  • Optimizing MapReduce programs using Combiners
  • Writing Custom Combiners
  • Understanding Partitioners In Hadoop
  • Writing Custom Partitioners
  • Exploring built in Counters
  • Writing Custom Counters
  • [Use case] - Industry use case where Counters are used.
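
A minimal sketch of what a Combiner and a Partitioner do, simulated in plain Python rather than written against the Hadoop API. The combiner pre-aggregates one mapper's output locally so that fewer pairs are sent to the reducers; the partitioner decides which reducer receives a key. Hadoop's default HashPartitioner uses the key's `hashCode()` modulo the number of reduce tasks; here a byte sum is a hypothetical stand-in so the result is the same on every run.

```python
from collections import Counter

def combine(pairs):
    # Combiner: sum values per key on the map side, before the shuffle.
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def partition(key, num_reducers):
    # Partitioner: the same key always maps to the same reducer index.
    # Byte sum is an illustrative stand-in for a real hash function.
    return sum(key.encode("utf-8")) % num_reducers

map_output = [("a", 1), ("b", 1), ("a", 1), ("a", 1)]
combined = combine(map_output)
targets = {key: partition(key, 2) for key, _ in combined}
```

Here four map output pairs shrink to two before the shuffle. A combiner is only safe when the reduce operation is associative and commutative, as summation is.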
Module 08 - NoSQL & HBase [Hadoop Testing]
Learning Objectives - In this module, you will learn about NoSQL databases and the difference between HBase and relational databases. Explore features of NoSQL databases, the CAP theorem, and the HBase architecture. Understand the data model and perform various operations.

Topics -
  • Hive Data Model
  • Categories of NoSQL
  •     Key-Value Database
  •     Document Database
  •     Column Family Database
  •     Graph Database
  • Aggregation Oriented fit for NoSQL
  • NoSQL Implementation
  • Key-Value Database Example and Use
  • Document Database Example and Use
  • Column Family Database Example and Use
  • Graph Database Example and Use
Module 09 - Advanced HBase using Java [Hadoop Development]
Learning Objectives - In this module, you will learn what HBase is, when to use it, its data model and physical storage, and basic operations such as Get, Put and Delete.

Topics -
  • What is HBase
  • Row Oriented VS Column Oriented Database
  • Features of HBase
  • When to use HBase
  • When not to use HBase
  • Data Model in HBase
  • HBase Physical Storage
  • Versions and HBase Operations
  •     Get
  •     Put
  •     Delete
Module 10 - MapReduce Programming - III
Learning Objectives - In this module, you will learn map-side joins with the Distributed Cache, reduce-side joins, passing configurations to MapReduce programs, the Tool interface, the small file problem and SequenceFiles.

Topics -
  • Exploring ProgramDriver class in Hadoop API
  • Understanding Distributed Cache / Map side join
  • [Use case] - Distributed Cache / Map side join
  • Understanding Reduce side Join
  • [Use case] - Looking at different industry use cases for Reduce side join
  • Passing Configurations to MapReduce Programs
  • Demo : Static Configuration
  • Exploring Tool interface in Hadoop API
  • Small File Problem in Hadoop
  • Solving Small File Problem - [Hands-on]
  • Exploring SequenceFileFormat
  • Demo: SequenceFilesClass in Scala
Module 11 - MR Unit
Learning Objectives - In this module, you will learn various strategies to test and validate MapReduce jobs for Hadoop. We will also focus on various ways to unit test MapReduce jobs.

Module 12 - Pig and Pig Latin
Learning Objectives - In this module, you will learn Apache Pig by contrasting it with MapReduce, and the different types of use cases where Pig can be used.

Topics -
  • What is Pig
  •     Pig on Hadoop
  •     Pig Latin
  • Installing & Running Pig
  • Grunt
  •     Entering Pig Latin scripts in Grunt
  •     HDFS Command in Grunt
  •     Controlling Pig from Grunt
  • Grunt
  •     Get
  •     Put
  •     Delete
  • Introduction to Pig Latin
  •     Input and Output
  •         Load
  •         Store
  •         Dump
  •     Relational Operators
  •         foreach
  •         Filter
  •         Group
  •         Distinct
  •         Join
  •         Sample
  •         Parallel
  • Multi-Dataset Operations with Pig
  •     Techniques for combining Data sets
  •     Joining Data sets in Pig
  •     Splitting Data sets
Module 13 - Schedulers in YARN
Learning Objectives - In this module, you will learn different schedulers available in Hadoop 2.x Architecture.

Module 14 - Sqoop
Learning Objectives - In this module, you will learn how Sqoop works and the various options for importing data with it.

Topics -
  • Sqoop Overview
  • How does Sqoop work
  • Sqoop JDBC Driver and Connectors
  • Sqoop Importing Data
  • Various Options to Import Data
  •     Table Import
  •     Binary Data Import
  •     Speeding up the Import
  •     Filtering Import
  •     Full Database Import

How will I execute the Practicals?

We will help you set up NPN Training's Virtual Machine on your system with local access. Detailed installation guides for setting up the environment are provided in E-Learning.


Is Java a pre-requisite to learn Big Data and Hadoop?

No, you can start without prior Java knowledge. We will provide a video tutorial for Java. You can start on it immediately; Java is first needed in the third week of the Hadoop course (MapReduce), so you will have enough time to get your Java concepts clear.

NPN Training Certification Process:

At the end of your course, you will work on a real-time project. You will receive a problem statement along with a data set to work on. Once you have successfully completed the project (reviewed by an expert), you will be awarded a certificate with performance-based grading. If your project is not approved on the first attempt, you can get extra assistance with any doubts to understand the concepts better and reattempt the project free of cost.

Yalaguresh Jorapur
Company: Infosys



NPN Training is definitely one of the best training institutes in Bangalore for Hadoop. The course content is elaborate. The few things that set NPN Training apart from others are the live scenarios, case studies and workshops from experts in the industry. I would recommend anyone looking for a deep dive into Hadoop to start with NPN Training.

Vinay
Company:



I enjoy being in Naveen's class, and having him as my teacher I have learnt a lot. One thing that comes to mind is MapReduce and Hive; I have really got familiar with them. I really appreciate Naveen's efforts in making us understand things and repeating them as many times as needed. You are in a noble job. I wish you great success in the fulfilment of your responsibilities towards the job. Thank you for being my teacher.

Lakshman Singh
Company: Cibersites India Pvt Ltd



I came to know about NPN Training from one of my friends. When I attended the demo classes, Naveen sir gave a clear top-level view of Big Data and the Hadoop framework. During the demo classes he explained the different components of the Hadoop framework and how they are applied in industry.
As the class progressed in subsequent weeks, Naveen sir provided hands-on experience on each topic, which gave me enough confidence in the topic. Along with classroom training, Naveen sir started a project for our batch. Every individual was assigned a specific task from the project, where we applied the concepts learned in classroom training. After a few weeks of training I had enough confidence in Hadoop and Big Data concepts.
One more beauty of NPN Training is that Naveen sir gives complete attention to every class member and ensures that everyone is on the same page during classroom sessions. The NPN Training portal is very useful for revision, as materials are uploaded there and can be used. Last but not least, I made the correct decision by joining NPN Training; this institute provides very good training. Thanks a lot, Naveen sir.

Contact us


+91 8095918383 | +91 9535584691

Upcoming batches

Dec 02 | Big Data & Hadoop | Timings: Weekend Saturday batch | Fees: 12,000 INR
Dec 16 | Big Data & Hadoop | Timings: Weekend Saturday batch | Fees: 12,000 INR
Jan 13 | Big Data & Hadoop | Timings: Weekend Saturday batch | Fees: 12,000 INR

Course Features

Big Data Architect Masters Program Training
4.8 stars - based on 150 reviews