Big Data Architect Masters Program

Building Data Pipelines using Cloud

The Big Data Architect Masters Program is designed for professionals who want to deepen their knowledge of Big Data. The program is built around current industry standards and gives you end-to-end coverage of Big Data technologies. It is curated by industry experts to provide hands-on training with tools that are widely used across the Big Data domain.

Program Highlight

About the Program

The Big Data Architect Masters Program is designed to empower professionals to develop relevant competencies and accelerate their career progression in Big Data technologies through complete hands-on training.

Being a Big Data Developer requires you to learn multiple technologies, and this program will ensure you become an industry-ready Big Data Architect who can provide solutions to Big Data projects.

At NPN Training we believe in the philosophy “Learn by doing”, so we provide complete hands-on training with real-time project development.

Program Structure

Most Meticulously designed Data Engineering program

Course Description

This course will help you understand how Hadoop, Spark and their ecosystem handle the storage and processing of large data sets in a distributed environment.

Distributed Storage & Batch Processing
using Hadoop and Spark

1 Capstone Project

Course Description

In this course, you will learn one of the most popular enterprise messaging and streaming platforms. You will learn the basics of creating an event-driven system using Apache Kafka, Spark Structured Streaming and the ecosystem around them.

Event Driven Architecture and Streaming using
Kafka and Structured Streaming

1 Capstone Project

Course Description

Amazon Web Services (AWS) offers a unique opportunity to build scalable, robust and highly available systems in the cloud. This course gives you an overview of the different services that you can leverage within AWS to build a Big Data solution.

Data Engineering with AWS Data Analytics Services

1 Capstone Project

Course Description

In this course, you will learn how to harness the power of Apache Spark and powerful clusters running on the Azure Databricks platform to run large data engineering workloads in the cloud.

Data Engineering with Microsoft
Azure Databricks

1 Capstone Project

Course Description

This course introduces the additional tools and frameworks you need to become a successful data engineer.

Data Engineering Essentials Add-on Modules

1 Capstone Project


Comprehensive Curriculum

Tried & tested curriculum to make you a solid Data Engineer

Distributed Storage and Batch Processing
Event Driven Architecture & Streaming
Data Engineering using AWS Data Analytics
Data Engineering with Azure Databricks
Data Engineering Essentials Add-on Modules

Course description: This course will help you learn one of the most powerful in-memory cluster computing frameworks.

Data Engineering Concepts and Hadoop 2.x (YARN)

Learning Objectives – This module introduces you to the core concepts, processes, and tools you need to know in order to get a foundational knowledge of data engineering. You will gain an understanding of the modern data ecosystem and the role Data Engineers, Data Scientists, and Data Analysts play in this ecosystem.

Topics –

Introduction to Big Data (Classification and Characteristics)
Drawbacks of RDBMS
Challenges of Big Data
Sources of Big Data
Distributed Systems
What is Data Engineering, Roles and Responsibilities
Data Pipelines and types
Hadoop 2.x and Core Components
Hadoop Daemons
Hadoop Architecture
HDFS File Blocks and Architecture
Introduction to YARN and Architecture
YARN Application Execution Flow
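
To make the "HDFS File Blocks" topic concrete, here is a minimal Python sketch of how a file's size maps onto fixed-size blocks. It assumes the usual Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3, both of which a cluster can override. The function names `split_into_blocks` and `raw_storage_needed` are hypothetical, and the code is plain arithmetic rather than an HDFS client.

```python
# Illustrative arithmetic only; this does not talk to a real HDFS cluster.
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB   # assumed Hadoop 2.x default (dfs.blocksize)
DEFAULT_REPLICATION = 3         # assumed default (dfs.replication)


def split_into_blocks(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file of file_size bytes occupies."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes, not a full block.
        sizes.append(remainder)
    return sizes


def raw_storage_needed(file_size, replication=DEFAULT_REPLICATION):
    """Total bytes stored across the cluster once every block is replicated."""
    return file_size * replication


blocks = split_into_blocks(300 * MB)
print([size // MB for size in blocks])      # block sizes in MB
print(raw_storage_needed(300 * MB) // MB)   # raw MB consumed with replication
```

Under these assumptions a 300 MB file occupies two full blocks plus one partial block, and the cluster stores three copies of each.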

Hadoop Commands and Configurations

Learning Objectives – In this module, you will learn the commands for working with the HDFS file system and YARN, and how to execute and monitor jobs.
Topics –

Exploring Important Configuration Files
Exploring HDFS File System Commands
Exploring Hadoop Admin Commands
Exploring YARN Commands
- Executing YARN Jobs
- Monitoring YARN Jobs
- Monitoring different -appTypes
- Killing YARN Jobs
Exploring Name Node and Resource Manager UI

Structured Data Analysis using Hive

Learning Objectives – In this module, you will understand Hive concepts and Hive data types, and learn how to load and query data in Hive, run Hive scripts and write Hive UDFs.
Topics –

Introduction to Hive and Architecture
Exploring Hive table types
Data loading techniques in Hive
Hive Complex Data types (Array, Map, StructType)

Getting Started with Apache Spark + RDDs

Learning Objectives – In this module, you will learn about the Spark architecture in comparison with the Hadoop ecosystem, and about one of the fundamental building blocks of Spark, RDDs, along with the related manipulations for implementing business logic (Transformations, Actions and Functions performed on RDDs).
Topics –

Overview of Apache Spark
Data sharing in MapReduce vs Spark
Exploring Spark Ecosystem
Exploring RDDs: Basic Building Block
Partitions
Starting Spark Shell
RDD Creations
1. Loading a file
2. Parallelize Collections
Exploring RDD Operations
1. Transformations
2. Actions
RDD Actions
1. count()
2. first()
3. take(int)
4. saveAsTextFile(path:String)
5. reduce(func)
6. collect()
7. foreach(func)
RDD Transformations
1. map(func)
2. filter(func)
Chaining Transformation and Actions in Spark
Configuring Development environment
Initializing SparkSession, i.e. the Spark 2.x entry point
Pair RDD
Sorting, Grouping and Aggregations

Exploring Spark SQL and DataFrame API

Learning Objectives – In this module, you will learn about Spark SQL which is used to process structured data with SQL queries. You will learn about DataFrames and Datasets in Spark SQL and perform SQL operations on DataFrames.

Topics –

Introduction to Spark SQL
Overview of DataFrames
Creating DataFrames (In-Memory + External Source)
Exploring DataFrameReader API
Attaching Custom Schema
Working with Columns
Filtering
Adding, Renaming, Dropping column
Grouping, Sorting and Aggregations
Registering DataFrame as a Table
Join Operations
Understanding DataFrameWriter API

Deep Dive into DataFrame API

Learning Objectives – In this module, you will learn some of the advanced concepts of the DataFrame API.

Topics –

Complex Data Types
Handling Corrupt and Missing Records
Working with Dates
User Defined Functions (UDF)
Connect to DB via DataFrame
Window Functions

Packaging, Deploying and Debugging

Learning Objectives – In this module, you will learn the different aspects to take care of when deploying and improving Spark applications.

Topics –

Packaging Spark Application
Submitting Spark Application using spark-submit command
Deployment Modes
Configuring Spark on YARN
Monitoring Spark applications on YARN

Best Practices and Performance Tuning

Learning Objectives – In this module, you will learn the different aspects to take care of when deploying and improving Spark applications.

Topics –

Caching and Persistence
Broadcast Variables
Optimizing Spark Joins
Memory Partitioning

Course description: Apache Kafka is a popular tool used in many big data analytics projects to move data from other systems into a big data system. In this course, students develop Apache Kafka applications that send data to and receive data from Kafka clusters. By the end of the course, you will be able to set up a personal Kafka development environment, master the concepts of topics, partitions and consumer groups, develop a Kafka producer to send messages and develop a Kafka consumer to receive messages. You will also practice using the Kafka command line interfaces for producing and consuming messages.

Module 01 – Getting Started with Kafka and Core APIs

Learning Objectives – In this module, you will understand Kafka and Kafka Architecture.
Topics –

Integration between components
What is Kafka
Components of Messaging System
Understanding Kafka components in detail
1. Producer
2. Consumer
3. Broker
4. Cluster
5. Topic
6. Partitions
7. Offset
8. Consumer groups
Message Retention in Kafka
Kafka Commit Log
Kafka
1. Starting Zookeeper
2. Starting Kafka Server
3. Topic operations: create, list, delete, describe
4. Publishing data to a topic using console producer
5. Consuming data from a topic using console consumer
6. Sending and receiving messages
Hands on – Kafka Cluster with Multiple Brokers
1. Creating separate configuration files for brokers
2. Launching multiple brokers
3. Getting cluster information and broker details from Zookeeper
Hands on – Topic with multiple partitions
1. Creating topic with multiple partitions
2. How messages are spread across partitions
3. Reading messages from specific partitions
4. Reading messages from specific offset in specific partition
Understanding Kafka Core APIs
Implementing Kafka Producer & Consumer

Deep Dive Kafka Producer and Consumer API

Learning Objectives – In this module, you will learn the advanced Kafka Core APIs, i.e. the Producer API, the Consumer API and Kafka Connect.
Topics –

Understanding Producer Partitioning Mechanism using Java
Different ways to implement partitioning mechanism
- Providing partition number
- Using Round Robin
- Key Hashing
Message Sending
Producer API
- Synchronous Send
- Asynchronous Send
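
The three partitioning approaches listed above (providing the partition number, round robin and key hashing) can be sketched in plain Python. This is a toy model, not the Kafka client: the class name `SketchPartitioner` is hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to keys.

```python
import itertools
import zlib


class SketchPartitioner:
    """Toy model of three partition-selection strategies for a producer."""

    def __init__(self, num_partitions):
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        # Endless 0, 1, ..., n-1, 0, 1, ... sequence for records without a key.
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose(self, key=None, partition=None):
        # 1. Providing the partition number: the caller's choice wins.
        if partition is not None:
            if not 0 <= partition < self.num_partitions:
                raise ValueError("partition out of range")
            return partition
        # 2. Round robin: records without a key are spread evenly.
        if key is None:
            return next(self._round_robin)
        # 3. Key hashing: the same key always maps to the same partition.
        #    CRC32 is a stand-in; Kafka's default partitioner uses murmur2.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)
print(partitioner.choose(partition=2))                  # explicit partition
print([partitioner.choose() for _ in range(4)])         # keyless records rotate
print(partitioner.choose(key="user-42") == partitioner.choose(key="user-42"))
```

Key hashing is what preserves per-key ordering: every message for one key lands in the same partition, and ordering is only guaranteed within a partition.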

Streaming Data with Spark Structured Streaming

Learning Objectives – In this module, you will learn stream processing with Spark Structured Streaming, including streaming sources, sinks, output modes and triggers.
Topics –

Introduction to Stream Processing
Batch Processing vs Stream Processing
Stream Processing APIs in Spark
Overview of Structured Streaming
Notions of Stream Processing
Streaming Sources, Sinks and Output Modes
Streaming data from socket, file as input source
Output Modes (Append, Complete, Update)
Aggregating on Streaming Data
Running SQL Queries on Streaming Data
Processing JSON data using Stream processing
Joining Batch and Streaming Data
Triggers

Advanced Stream Processing

Learning Objectives – In this module, you will learn advanced stream processing concepts.
Topics –

Streaming from Kafka as Source and Sink
Stateless vs Stateful transformations
Event time and Windowing
Tumbling Window Aggregate
Sliding Window
Watermarks and Late Data
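
As a rough illustration of the event-time topics above, the following plain-Python sketch counts events per tumbling window and discards data that arrives behind the watermark. It is a simplified model and not Spark code: the function name `tumbling_window_counts` and the sample events are made up, and the watermark is advanced per record here, whereas Structured Streaming advances it between micro-batches.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, allowed_lateness):
    """Count events per tumbling window, dropping events older than the watermark.

    events: iterable of (event_time, key) tuples in arrival order,
            with event_time given in whole seconds.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None
    for event_time, key in events:
        # Watermark = latest event time seen so far minus the allowed lateness.
        if max_event_time is not None:
            watermark = max_event_time - allowed_lateness
            if event_time < watermark:
                dropped.append((event_time, key))
                continue
        # Tumbling windows are fixed-size and non-overlapping: [start, start + size).
        window_start = event_time - (event_time % window_size)
        counts[(window_start, window_start + window_size, key)] += 1
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
    return dict(counts), dropped


# Hypothetical arrival order: the event at t=3 is late but tolerated,
# the event at t=11 arrives after the watermark has passed it.
events = [(1, "a"), (4, "a"), (12, "a"), (3, "a"), (25, "a"), (11, "a")]
counts, dropped = tumbling_window_counts(events, window_size=10, allowed_lateness=10)
print(counts)
print(dropped)
```

A sliding window differs only in that each event can belong to several overlapping windows, so the single `window_start` computation would become a loop over every window that covers the event time.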

Course description: This course gives you an overview of the different services that you can leverage within AWS to build a Big Data solution.

Getting Started with AWS, IAM and S3 Storage

Learning Objectives – In this module, you will learn the fundamentals of AWS, IAM and S3 Storage.
Topics –

Introduction to Cloud Computing
AWS Services, Regions and Zones
Identity and Access Management (IAM)
Create AWS IAM User and Download Credentials
Install and Configure AWS CLI
Introduction to S3 Storage
S3 Objects, Buckets, and Key Value pairs
Create Bucket, Upload Object and Exploring UI
Tagging Bucket and Object
Optimizing S3 Costs
S3 Storage classes
Lifecycle Management
Retrieve Objects from Glacier
Protecting Data
S3 Object Lock
S3 Versioning and Encryption
Setup Cross-Region Replication
AWS Budget Setup – Billing Preferences, Budgets and Alarms
AWS S3 CLI Commands
Event Notification, Logging and Analytics

Big Data Processing with EC2 and EMR

Learning Objectives – In this module, you will understand Amazon EMR and how to launch a Spark cluster.
Topics –

EC2 Fundamentals
EC2 Security Groups
EC2 Key Pair for SSH Connection
Introduction to EMR
EMR Architecture
EMR Types
EMR Cluster Lifecycle
Optimizing Instance Types
Hands-on : Creating EMR Cluster and Connecting via SSH
Hands-on : Building Packaging and Deploying Spark App
Stream Processing with EMR
Setup Glue Catalog Integration with Hive
Hands-on Query DynamoDB from EMR using Hive

Streaming and Funneling Data with Amazon Kinesis

Learning Objectives – In this course you will learn to harness the power of real-time streaming using the Kinesis family of services, Kinesis Data Streams (KDS), Kinesis Data Firehose (KDF) and Kinesis Data Analytics (KDA), to construct high-throughput, low-latency data pipelines across a variety of architectural components, leading to scalable and loosely coupled systems.
Topics –

Introduction to Amazon Kinesis
Kinesis Core Services (Streams, Firehose and Analytics)
Kinesis Data Streams Components (Producers, Shards, Consumers)
Creating Kinesis Data Stream using AWS CLI and AWS Console
Publish Records to Kinesis Data Stream
Consume Records from Kinesis Data Stream
Introduction to Kinesis Firehose
Writing Stream Data to S3
Connecting Kinesis Firehose with Kinesis DataStream
Sending Data through AWS CLI
Introduction to Kinesis Data Analytics
Running SQL queries to Process Streaming Data
Connecting a Destination to SQL Stream Processing
Building Data pipeline using Kinesis

Building ETL Data Pipeline using AWS Glue & Athena

Introduction to ETL
AWS Glue Introduction
Components of AWS Glue (Crawler, Data catalog)
Hands-on Developing Data catalog with Glue crawlers
Querying data using Amazon Athena
Glue Jobs
Hands-on Developing Glue Jobs
Running Spark transformation jobs on AWS Glue
Creating a Developer Endpoint and running Spark code
Glue Catalog Management
Partitioned table creation and maintenance
Hive compatible Partitioning

Serverless Architecture using AWS Lambda

Learning Objectives – In this module, you will learn how to use Amazon’s S3 AWS SDK Java API to work with buckets.
Topics –

What is Serverless Architecture
What is AWS Lambda
Creating First Lambda Function
- Passing Arguments to Lambda Functions
- Passing Environment Variables to Lambda Functions
Monitoring Lambda Function using CloudWatch
Scheduling Lambda Function using EventBridge
Lambda Versioning
Managing Aliases
S3 Event Notification
Customizing Resources
Introduction to Cloud9
Setup and Develop with Cloud9
Import and Invoke Lambda Functions
Package and Deploy Lambda
Invoke Lambda functions inside API Gateway
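
As a minimal sketch of the "Creating First Lambda Function" and "S3 Event Notification" topics, the Python handler below reads the bucket name and object key from each record of an S3 event. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event notification format. The bucket name, object key and returned dictionary are made up for illustration, and the handler can be exercised locally without any AWS account.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the bucket and object key from each record of an S3 event notification."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append({"bucket": bucket, "key": key})
    # The return value only reaches synchronous callers; S3 invokes asynchronously.
    return {"count": len(uploaded), "objects": uploaded}


# Local smoke test with a hand-written event; the bucket and key are hypothetical.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "raw/2021/sales+report.csv"},
            }
        }
    ]
}
print(lambda_handler(sample_event, None))
```

Keeping the handler free of side effects like this makes it easy to test with a sample event before packaging and deploying it.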

Automate AWS Infrastructure

Learning Objectives – In this module, you will understand different concepts of Amazon EC2.
Topics –

Creating EC2 Instance
Introduction to Amazon Kinesis
Amazon Kinesis Core Services
1. Kinesis Streams
2. Kinesis Firehose
3. Kinesis Analytics
Kinesis Streams
Kinesis Streams Key Concepts (Shard, Data Blob, Partition Key etc)
Building A Kinesis Data Stream with AWS CLI Data Generator
Kinesis Producer Library
Hands on – Implementing Kinesis Producer Library
Kinesis Client Library
Hands on – Implementing Kinesis Client Library
Sending Data to Kinesis Data Stream using Python boto3 library
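
To illustrate how the shard and partition key concepts listed above fit together, the sketch below maps a partition key to a shard. Kinesis Data Streams hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The helper names `shard_ranges` and `shard_for_key` are hypothetical, and splitting the hash space evenly across shards is an assumption made for the example. The code does not call AWS.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_ranges(num_shards):
    """Split the 128-bit hash key space into num_shards contiguous, even ranges."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose hash key range contains the key's MD5 hash."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key falls outside all shard ranges")


ranges = shard_ranges(4)
for key in ("sensor-1", "sensor-2", "sensor-3"):
    print(key, "-> shard", shard_for_key(key, ranges))
```

Because the mapping depends only on the partition key, records that share a key always land on the same shard, which is why a skewed key distribution can overload a single shard.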

Funneling Data with Kinesis Firehose + Kinesis Analytics

Learning Objectives – In this module, you will learn how to use Amazon’s Kinesis Firehose and Data Analytics.
Topics –

Introduction to Kinesis Firehose
Writing Stream Data to S3
Connecting Kinesis Firehose with Kinesis DataStream
Sending Data through AWS CLI
Adding Lambda Function
Introduction to Kinesis Data Analytics
Streaming SQL

Course description: In this course you will learn one of the most popular web-based notebooks, which enables interactive data analytics.

Exploring Azure Databricks

Learning Objectives –
Topics –

Introduction to Azure Databricks
Sign up for an Azure Account
Launch Azure Workspace and Databricks Cluster
Upload Data
Databricks Clusters
Databricks Notebook Introduction (Create, Import and Export)
Magic Commands
Databricks Mounts
Develop Spark Application using Azure Databricks
Performing ETL Operations using Azure Databricks
Export and Import Databricks Notebooks
Getting Started with Azure CLI
Create Resource Group using Azure CLI

Accessing Data from Azure Data Lake Storage

Learning Objectives –
Topics –

Create and Upload data to ADLS File System or Container
Mount ADLS on to Azure Databricks to access files from Azure Blob Storage
Creating an ADLS Gen2 Account
Storage Explorer
Accessing via Access Keys
Accessing via SAS Token
Mounting ADLS to DBFS Overview

Exploring Databricks Platform

Learning Objectives – In this module, you will understand the Databricks platform.
Topics –

Creating Free Community account
Creating Spark Cluster
Creating workspace
Writing applications
Uploading dataset

Candidate Evaluation

We follow an assessment- and project-based approach to maximize your learning. Each module has multiple assessments and problem statements.

Each of the assessments in the E-Learning helps students grasp the concepts taught in class and apply them to business problem scenarios.

Module Quiz: 20%
Hands-on Exercises: 20%
Hands-on Assignments: 60%
Coding Hackathon
Capstone Projects

You will have a quiz for each of the modules covered in the previous class/week. These tests usually last 15-20 minutes.

Each candidate will be given an exercise to solve for evaluation.

You will be assigned computational and theoretical homework assignments to be completed

A coding hackathon will be conducted in the middle of the course. It tests how well you apply the concepts, tools and techniques covered so far to a given problem statement, and how quickly and accurately you can solve it.

At the end of each course there will be a real-world Capstone Project that enables you to build an end-to-end solution to a real-world problem. You will be required to write a project report and present it to an audience.

Training Features

110 hours of extensive classroom training.
36 sessions of 3 hours each. Course Duration: 4 Months

For each module, multiple hands-on exercises, assignments and quizzes are provided in Google Classroom.

We follow Agile Methodology for the project development. Each project will have Feature Study followed by User stories.

There will be a dedicated 1-to-1 interview call between you and a Big Data Architect, so you experience a real mock interview.

We have a community forum for all our students wherein you can enrich your learning through peer interaction.

On completion of the project, NPN Training certifies you as a “Big Data Architect” based on the project.

Interview Preparation Kit

Industry Standard Realtime Project

This program (Big Data Architect Masters Program) comes with a portfolio of industry-relevant POCs, use cases and project work.
Unlike other institutes, we do not present use cases as projects; we clearly distinguish between a use case and a project.

We follow Agile methodology for the project development.

Each batch will be divided into scrum teams of size 4-5 members.
We will start with a Feature Study before implementing a project.
The Feature will be broken down into User Stories and Tasks.
For each user story a proper Definition Of Done will be defined.
A Test plan will be defined for testing the user story

Mock Data Generator
Building Real time data pipeline
Dynamic Resource Allocation

Upcoming Batches

Online Training
Procedure For Registration
Classroom Training

Mar 20th

Batch: Weekend Sat & Sun
Duration: 4 Months
₹ 30,000

Apr 17th

Batch: Weekend Sat & Sun
Duration: 4 Months
₹ 30,000

1. Transfer Rs.1000 as the registration amount to the account mentioned below.

2. Send a screenshot of the payment to info@www.npntraining.com with the subject “Big Data Masters Program Pre Registration”.

3. Once we receive the payment, we will acknowledge it through our official email id.

Account Details

Name:	Naveen P.N
Bank Name	State Bank Of India
Account No	64214275988
Account Type	Current Account
IFSC Code	SBIN0040938
Bank Branch	Ramanjaneya Nagar

Send screenshot to: info@www.npntraining.com
Email Subject: Big Data Masters Program Pre Registration
Registration Fees: Rs.1000

Note : Check for the batch availability with Naveen sir before doing the pre-registration.

Sorry, due to the Covid situation we are not offering classroom training at present.

Register for Free Demo Class

Experience the Quality of Training

You can sit in an actual class and experience the quality of training.
Interact with our alumni and get feedback about the course.
Don't just learn the fundamentals; go deeper to gain experience.
Do explore our most comprehensive program in Big Data Engineering.

Frequently Asked Questions

What is the NPN Training Big Data Masters Program?

The Big Data Architect Masters Program is a structured learning path recommended by leading industry experts that ensures you transform into an expert Big Data Architect. Being a Big Data Architect requires you to master a multitude of skills, and this program aims to give you in-depth knowledge of the entire Big Data ecosystem.

Why should you enroll for the Big Data Masters Program?

The Big Data Architect learning track has been curated after thorough research and recommendations from industry experts. It will help you differentiate yourself with multi-platform fluency and gain real-world experience with the most important tools and platforms.

What are the prerequisites for the course?

Basic Programming: As part of the Big Data Architect Masters Program you will develop real-time projects that follow industry standards, so coding knowledge is essential. However, we cover Python programming as part of the program.
Linux Basic Commands

Who is the instructor at NPN Training?

All the Big Data classes will be driven by Naveen sir who is a working professional with more than 12 years of experience in IT as well as teaching.

Can I attend a demo session before enrollment?

Yes, you can sit in an actual live class and experience the quality of training.

How will I execute the Practicals?

The practical experience at NPN Training is worthwhile and different from that of other training institutes in Bangalore. You gain practical knowledge of Big Data through our virtual Big Data software, which is installed on your machine.

Detailed installation guides for setting up the environment are provided in the E-Learning.

Do I need to bring my own laptop?

NPN Training will provide students with all the course material in hard copies. However, students should carry their individual laptops for the program. Please find the minimum configuration required:

Windows 7 / Mac OS
8 GB RAM is highly preferred
100 GB HDD
64 bit OS

What If I miss a session?

The course is valid for one year, so you can attend a missed session in another batch.

How do I access the E-Learning content for the course?

Once you have registered and paid for the course, you will have 24/7 access to the E-Learning content.

Can I avail of an EMI option?

You can pay the total fees in 2 installments.

Are there any group discounts for classroom training programs?

Yes, we have group discount options for our training programs. Contact us using the Live Chat link. Our customer service representatives will give you more details.

Certificate of Completion

Earn your certificate

Our Specialization is exhaustive, and the certificate we award is proof that you have taken a big leap in the Big Data domain.

Differentiate yourself

The knowledge you have gained from working on projects,
videos, quizzes, hands-on assessments and case studies
gives you a competitive edge.

Share your achievement

Highlight your new skills on your resume and LinkedIn. Tell
your friends and colleagues about it.