Big Data Hadoop Training with Certification

With NPN Training’s Apache Spark and Scala certification training, you will advance your expertise in the Big Data Hadoop ecosystem and master essential skills such as Scala Programming, Spark Streaming, Spark SQL, Machine Learning Programming, GraphX Programming and Spark Shell scripting.


About the Course

Become an expert in Big Data processing by learning the concepts and implementation of Apache Storm and Apache Spark using Scala programming.

This course covers Apache Storm, Spark and Scala, and is designed around the industry requirements for high-speed data processing.

 

At NPN Training we believe in the philosophy "Learn by doing", so we provide complete hands-on training with real-time project development.

 

Course Objectives

After completing the Apache Spark & Scala course, you will be able to:

1.   Understand Apache Spark

2.   Understand Scala and its implementation

3.   Understand Functional Programming in Scala

4.   Understand Control Structures, Loops, Collections and more in Scala

5.   Master the concepts of Traits and OOP in Scala

6.   Understand how Spark and Scala relate to each other

7.   Install Spark and perform Spark operations in the Spark Shell

8.   Understand the role of RDDs

9.   Implement Spark applications on YARN (Hadoop)

10. Process streaming data using the Spark Streaming API

11. Implement Machine Learning algorithms in Spark using the MLlib API

12. Analyze the Hive and Spark SQL architectures

13. Implement Spark SQL queries to perform various computations

14. Understand the GraphX API and implement graph algorithms

15. Implement broadcast variables and accumulators for performance tuning

 

As part of the coursework, you will work on the following projects:

 

Project #1 : Smart Data Generator

Industry : General

Create a project that generates dynamic mock data in real time based on a schema. The data can then be used for real-time stream processing with Apache Storm or Spark Streaming.
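A minimal sketch of the idea behind the Smart Data Generator, written in Scala. The schema representation (column names paired with a hypothetical `FieldType`), the value ranges and the CSV output format are all assumptions made for illustration; the course project may define schemas differently.

```scala
import java.util.Locale
import scala.util.Random

// Hypothetical column types supported by this sketch
sealed trait FieldType
case object IntField extends FieldType
case object DoubleField extends FieldType
case object StringField extends FieldType

object MockDataGenerator {
  // A schema is an ordered list of (column name, column type) pairs
  type Schema = Seq[(String, FieldType)]

  // Produce one random value, rendered as text, for the given column type
  def randomValue(t: FieldType, rnd: Random): String = t match {
    case IntField    => rnd.nextInt(1000).toString                              // 0..999
    case DoubleField => "%.2f".formatLocal(Locale.ROOT, rnd.nextDouble() * 100) // 0.00..100.00
    case StringField => rnd.alphanumeric.take(8).mkString                       // 8 letters/digits
  }

  // An endless, lazily evaluated supply of CSV records that follow the schema.
  // A fixed seed makes the "random" data reproducible between runs.
  def records(schema: Schema, seed: Long): Iterator[String] = {
    val rnd = new Random(seed)
    Iterator.continually(schema.map { case (_, t) => randomValue(t, rnd) }.mkString(","))
  }

  def main(args: Array[String]): Unit = {
    val schema: Schema = Seq("user_id" -> IntField, "name" -> StringField, "amount" -> DoubleField)
    println(schema.map(_._1).mkString(","))   // header row
    records(schema, seed = 42L).take(5).foreach(println)
  }
}
```

Because `records` returns an endless `Iterator`, a consumer pulls only as many records as it needs. Connecting it to a Storm spout or a Spark Streaming source is left out of this sketch.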

 

Project #2 : Analysis of Call Detail Record (CDR)

Industry : Telecom

You will be given a Call Detail Record (CDR): a data record produced by a telephone exchange or other telecommunications equipment that documents the details of a telephone call or other telecommunications transaction (e.g., a text message) passing through that facility or device.

 

Real Time Analytics using Apache Storm

Module 01 - Storm Technology Stack
Learning Objectives - In this module, you will learn and understand Apache Storm and concepts such as Spout, Bolt, Topology and Tuples, and perform real-time stream processing.

Topics -

Complex Event Processing

Introduction to Apache Storm

Batch vs Real time processing

Use cases of Apache Storm

Storm Architecture

Storm Processes

    i.    Nimbus Process

    ii.   Supervisor Process

    iii.  Worker Process

Storm Data Model

Components of Storm Cluster

Understanding Storm Topology

    i.    Spout

    ii.   Bolt

 

For more assignments check E-Learning

 

Module 02 - Spouts and Bolts
Learning Objectives - In this module, you will learn and understand Apache Storm and concepts such as Spout, Bolt, Topology and Tuples, and perform real-time stream processing.

Topics -

Storm Operation Modes

Maven project to create the first Topology

Simple 'Hello World' Topology

    i.    Implementing Spout

    ii.   Implementing Bolt

    iii.  Submitting the Topology

[Use case] Network error Analysis using Apache Storm

Reading data from a File

Representing Data using Tuples

Accessing Data from Tuples

Writing data to a File

LocalCluster VS StormSubmitter

Deploying a Storm Topology in a Cluster

Storm Services

Persist data into HDFS using HDFS Bolt

For more assignments check E-Learning

 

Module 03 - Adding Parallelism to a Storm Topology
Learning Objectives - In this module, you will learn the different stream grouping mechanisms available in Apache Storm.

Topics -

Understanding Stream Grouping

Types of Stream Grouping

    i.     Shuffle Grouping

    ii.    Fields Grouping

    iii.   Partial Key Grouping

    iv.   All Grouping

    v.    Global Grouping

Implementing Shuffle Grouping [Hands-on]

Implementing Fields Grouping [Hands-on]

Implementing All Grouping [Hands-on]

Implementing Direct Grouping [Hands-on]

Implementing Global Grouping [Hands-on]

Writing our own Custom Grouping [Hands-on]
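To make the grouping types concrete, the plain-Scala sketch below models how each one picks the target bolt task(s). It does not use the Storm API: `GroupingSketch` and its methods are hypothetical, the fields-grouping hash is not Storm's exact function, and the round-robin counter only approximates shuffle grouping, which Storm describes as a random but even distribution.

```scala
object GroupingSketch {
  // Fields grouping (conceptual): tuples with equal values in the grouping
  // fields always reach the same task, chosen here by hash modulo task count.
  def fieldsGroupingTask(groupingValues: List[Any], numTasks: Int): Int =
    Math.floorMod(groupingValues.hashCode, numTasks)

  // Shuffle grouping (approximation): spread tuples evenly across tasks.
  // A round-robin counter stands in for Storm's randomized distribution.
  def shuffleGroupingTasks(numTuples: Int, numTasks: Int): Seq[Int] =
    (0 until numTuples).map(_ % numTasks)

  // All grouping: every task receives a copy of each tuple
  def allGroupingTasks(numTasks: Int): Seq[Int] = 0 until numTasks

  // Global grouping: the whole stream goes to one task (the lowest task id)
  def globalGroupingTask(taskIds: Seq[Int]): Int = taskIds.min

  def main(args: Array[String]): Unit = {
    val words = List("storm", "spark", "storm", "scala", "spark")
    // Repeated words map to the same task number
    words.foreach { w =>
      println(s"$w -> task ${fieldsGroupingTask(List(w), numTasks = 3)}")
    }
  }
}
```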

For more assignments check E-Learning

 

Apache Spark & Scala

Module 03 - Introduction to Scala for Apache Spark
Learning Objectives - In this module, you will learn the Scala basics needed for Apache Spark: the REPL, variables, type inference, control structures, functions and collections.

Topics -

Introduction to the Scala REPL

Basic Scala operations

Exploring different Variable Types

    i.   Mutable Variables - [Hands-on]

    ii.  Immutable Variables - [Hands-on]

Type Inference in Scala - [Hands-on]

Block Expressions

Exploring Lazy evaluation in Scala

Control Structures in Scala

Exploring different variants of for loop

    i.    Enhanced for loop - [Hands-on]

    ii.   For loop with yield - [Hands-on]

    iii.   For loop with if conditions: Pattern Guards - [Hands-on]

Match Expressions - [Hands-on]

Exploring Functions in Scala

Exploring Procedures in Scala

Collections in Scala

    i.    Array

    ii.   ArrayBuffer

    iii.   Map

    iv.   Tuples

     v.   Lists
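The Scala topics listed above can be previewed in one small, self-contained object. This is an illustrative sketch, not course material; names such as `ScalaBasics`, `evenSquares` and `describe` are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

object ScalaBasics {
  // Immutable vs mutable: a val cannot be reassigned, a var can.
  // The types (String and Int) are inferred by the compiler.
  val courseName = "Apache Spark & Scala"
  var initCount = 0

  // Lazy evaluation: the initializer runs once, on first access
  lazy val config: String = { initCount += 1; "loaded" }

  // Block expression: the value of the block is its last expression
  def circleArea(r: Double): Double = { val rSquared = r * r; math.Pi * rSquared }

  // for loop with yield and an if guard: squares of the even numbers in 1..n
  def evenSquares(n: Int): Seq[Int] = for (i <- 1 to n if i % 2 == 0) yield i * i

  // Match expression with constant, typed and wildcard patterns
  def describe(x: Any): String = x match {
    case 0         => "zero"
    case n: Int    => s"int $n"
    case s: String => s"string $s"
    case _         => "something else"
  }

  def main(args: Array[String]): Unit = {
    // Collections: Array, ArrayBuffer, Map, Tuple and List
    val arr = Array(3, 1, 2)
    val buf = ArrayBuffer(1, 2)
    buf += 3                                  // ArrayBuffer is mutable
    val ages = Map("alice" -> 30, "bob" -> 25)
    val pair = ("spark", 3)                   // a Tuple2[String, Int]
    val nums = List(1, 2, 3)

    var total = 0                             // a var can be reassigned
    for (n <- nums) total += n                // enhanced for loop over a List

    println(courseName)
    println(arr.sorted.mkString(","))         // 1,2,3
    println(buf.mkString(","))                // 1,2,3
    println(ages("bob"))                      // 25
    println(pair._1)                          // spark
    println(nums.map(_ * 2))                  // List(2, 4, 6)
    println(total)                            // 6
    println(evenSquares(10))
    println(describe(0) + ", " + describe(7) + ", " + describe("hi"))
    println(config)
  }
}
```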

For more assignments check E-Learning
Module 04 - OOP and Functional Programming in Scala
Learning Objectives - In this module, you will learn object-oriented and functional programming in Scala, including classes, constructors, companion objects, traits and higher-order functions.

Topics -

Class in Scala

Getters and Setters

Custom Getters and Setters

Properties with only Getters

Auxiliary Constructor

Primary Constructor

Singletons

Companion Objects

Extending a Class

Overriding Methods

Traits as Interfaces

Layered Traits

Functional Programming

Higher Order Functions

Anonymous Functions
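A compact sketch touching most of the OOP and functional topics above: traits used as interfaces and stacked as layers, primary and auxiliary constructors, a custom getter/setter pair, a companion object, subclassing with an override, and a higher-order function called with an anonymous function. All class and method names are hypothetical.

```scala
// Trait used as an interface: it declares log but does not implement it
trait Logger {
  def log(msg: String): String
}

// Layered (stackable) traits: each modifies the message, then delegates to super
trait TimestampLogger extends Logger {
  abstract override def log(msg: String): String = super.log(s"[ts] $msg")
}

trait UpperCaseLogger extends Logger {
  abstract override def log(msg: String): String = super.log(msg.toUpperCase)
}

// Concrete base implementation that simply returns the message
class ConsoleLogger extends Logger {
  def log(msg: String): String = msg
}

// Primary constructor: (name, _salary). `val name` gets a getter only.
class Employee(val name: String, private var _salary: Double) {
  // Auxiliary constructor: must call the primary constructor first
  def this(name: String) = this(name, 0.0)

  // Custom getter and setter for the salary property
  def salary: Double = _salary
  def salary_=(value: Double): Unit = {
    require(value >= 0, "salary must be non-negative")
    _salary = value
  }
}

// Companion object: a factory method plus state shared by all instances
object Employee {
  private var created = 0
  def apply(name: String, salary: Double): Employee = {
    created += 1
    new Employee(name, salary)
  }
  def count: Int = created
}

// Extending a class and overriding an inherited method
class Manager(managerName: String, pay: Double, val reports: Int)
    extends Employee(managerName, pay) {
  override def toString: String = s"Manager($name, $reports reports)"
}

// Singleton object holding a higher-order function and the demo
object OopAndFp {
  // Higher-order function: takes a function f and applies it twice
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  def main(args: Array[String]): Unit = {
    val logger = new ConsoleLogger with TimestampLogger with UpperCaseLogger
    println(logger.log("job started"))      // [ts] JOB STARTED

    val e = Employee("Asha", 50000.0)       // uses the companion's apply
    e.salary = 55000.0                      // calls the custom setter salary_=
    println(s"${e.name} earns ${e.salary}")

    println(new Manager("Ravi", 90000.0, 4))
    println(applyTwice(x => x * 3, 2))      // anonymous function; prints 18
  }
}
```

The order of the mixins matters: with `new ConsoleLogger with TimestampLogger with UpperCaseLogger`, `UpperCaseLogger.log` runs first and each `super.log` call passes the message leftwards, so the result is `[ts] JOB STARTED`.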

For more assignments check E-Learning
 
Module 05 - Overview of Apache Spark
Learning Objectives - In this module, you will get an overview of Apache Spark, its ecosystem, RDDs, partitions and architecture.

Topics -

Overview of Apache Spark

Features of Apache Spark

Exploring Data sharing in MapReduce

Exploring Data sharing in Apache Spark

Spark Ecosystem

Introduction to RDD

Exploring Properties of RDD

    i.    Immutable

    ii.   Lazy evaluated

    iii.  Cacheable

    iv.  Type Inferred

Understanding Partitions in Spark

Characteristics of Partitions in Spark

Spark Architecture

Spark Modes

Hadoop VS Spark

Ecosystem of Hadoop VS Spark

 
 
Module 06 - Spark Common Operations
Learning Objectives - In this module, you will learn how to create RDDs and perform common transformations and actions on them.

Topics -

RDD Creations

    i.  Loading a file using SparkContext

    ii. Converting a collection

RDD Operations

    i.   RDD Transformations

    ii.   RDD Actions

RDDs in action: Simple word count application

Spark Internals
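The word count application follows the well-known Spark RDD pipeline `sc.textFile(path).flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)`. Running that requires a Spark dependency, so the sketch below reproduces the same transformation chain on ordinary Scala collections. `WordCount` is a hypothetical name, and its `groupBy` step is only a local stand-in for Spark's shuffle by key.

```scala
object WordCount {
  // Same shape as the Spark RDD word count:
  // flatMap (lines -> words) -> map (word -> (word, 1)) -> sum the counts per key
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))   // transformation: line -> words
      .filter(_.nonEmpty)                      // drop empty tokens from leading spaces
      .map(word => (word, 1))                  // transformation: word -> (word, 1)
      .groupBy(_._1)                           // local stand-in for the shuffle by key
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum per key

  def main(args: Array[String]): Unit = {
    val lines = Seq("Spark makes big data simple", "big data with Spark and Scala")
    // Like a Spark action: materialise the result (sorted for stable output)
    wordCount(lines).toSeq.sortBy(_._1).foreach { case (w, c) => println(s"$w\t$c") }
  }
}
```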

For more assignments check E-Learning
Module 07 - Playing with RDDs
Learning Objectives - In this module, you will learn RDD caching and persistence, the Scala RDD extensions and aggregate functions such as groupByKey and reduceByKey.

Topics -

RDD Caching and Persistence

reduce() vs fold()

Scala RDD Extensions

    i.    DoubleRDDFunctions

    ii.   PairRDDFunctions

    iii.   OrderedRDDFunctions

    iv.  SequenceFileRDDFunctions

Exploring Aggregate Functions

groupByKey function

reduceByKey function
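The difference between `reduce()` and `fold()`, and between groupByKey-style and reduceByKey-style aggregation, can be shown with plain Scala collections. `ReduceVsFold` is a hypothetical helper. Its `partitionedFold` only simulates the behaviour documented for Spark's `RDD.fold`: each partition is folded starting from the zero value, and the partial results are then folded starting from the zero value again.

```scala
object ReduceVsFold {
  // reduce: no initial value, so it throws on an empty collection
  def sumWithReduce(xs: Seq[Int]): Int = xs.reduce(_ + _)

  // fold: starts from a zero value, so an empty collection is fine
  def sumWithFold(xs: Seq[Int]): Int = xs.fold(0)(_ + _)

  // Simulated partitioned fold: zero is applied once per partition and once
  // more when combining the partial results, so zero must be neutral.
  def partitionedFold(partitions: Seq[Seq[Int]], zero: Int)(op: (Int, Int) => Int): Int =
    partitions.map(_.fold(zero)(op)).fold(zero)(op)

  // groupByKey style: collect every value per key first, then aggregate
  def sumByGroup(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }

  // reduceByKey style: merge values per key as they arrive. In Spark this
  // merging also happens within each partition before the shuffle.
  def sumByReduce(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, 0) + v)
    }

  def main(args: Array[String]): Unit = {
    println(sumWithFold(Seq.empty))                              // 0
    println(scala.util.Try(sumWithReduce(Seq.empty)).isFailure)  // true
    println(partitionedFold(Seq(Seq(1, 2), Seq(3)), 0)(_ + _))   // 6
    println(partitionedFold(Seq(Seq(1, 2), Seq(3)), 10)(_ + _))  // 36
    val pairs = Seq("a" -> 1, "b" -> 2, "a" -> 3)
    println(sumByGroup(pairs) == sumByReduce(pairs))             // true
  }
}
```

With two partitions and a zero value of 10, the simulated result is 36 rather than 16, which is why the zero value passed to `fold()` should be the identity of the operation (0 for addition).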

For more assignments check E-Learning
Module 08 - DataFrames and Spark SQL
Learning Objectives - In this module, you will learn about DataFrames and Spark SQL.

Topics -

Limitation of MapReduce

Introduction to Apache Spark

Data sharing in MapReduce and Apache Spark

Introduction to RDD

RDD Traits

The List Interface

For more assignments check E-Learning

 

NoSQL - MongoDB

Module 09 - NoSQL using MongoDB
Learning Objectives - In this module, you will learn the basics of the NoSQL database MongoDB: installing it, and inserting, querying, updating and indexing documents.

Topics -

What is MongoDB

Download and Install MongoDB on Windows

MongoDB Create & Insert Database

Add MongoDB Array using insert()

MongoDB ObjectId()

MongoDB Query Document using find()

MongoDB cursor

MongoDB Query Modifications using limit(), sort()

MongoDB count() & remove() functions

MongoDB update() Document

MongoDB Indexing Tutorial - createIndex()

Module Presentation

 

For more assignments, use cases, project work and materials, check E-Learning.

Contact us


+91-9535584691 | +91-8095918383

Upcoming batches

Dec 02 - Apache Spark & Scala
Timings: Weekend Saturday batch
Fees: 18,000 INR

Jan 06 - Apache Spark & Scala
Timings: Weekend Saturday batch
Fees: 18,000 INR

Jan 27 - Apache Spark & Scala
Timings: Weekend Saturday batch
Fees: 18,000 INR
