Big Data Architect Masters Program

Building Data Pipelines using Cloud

Naveen - Trainer

The Big Data Architect Masters Program is designed for professionals who are seeking to deepen their knowledge in the field of Big Data. The program is customized based on current industry standards and designed to help you gain end-to-end coverage of Big Data technologies. The program is curated by industry experts to provide hands-on training with tools that are widely used across the Big Data domain.

UPCOMING BATCHES

Connect With Us

Program Highlights

5000+
Alumni Students

Expertly Designed
Curriculum

110+ Hrs Training

Practical experience through real-life projects

About the Program

The Big Data Architect Masters Program is designed to empower professionals to develop relevant competencies and accelerate their career progression in Big Data technologies through complete Hands-on training.

Being a Big Data Developer requires you to learn multiple technologies, and this program will ensure you become an industry-ready Big Data Architect who can provide solutions to Big Data projects.

At NPN Training we believe in the philosophy “Learn by doing”, hence we provide complete hands-on training with real-time project development.

Program Structure

A meticulously designed Data Engineering program

  • Course Description

    This course will help you understand how Hadoop, Spark, and their ecosystem solve the storage and processing of large data sets in a distributed environment.

    Distributed Storage & Batch Processing
    using Hadoop and Spark

    1 Capstone Project

  • Course Description

    In this course, you will learn one of the most popular enterprise messaging and streaming platforms. You will learn the basics of creating an event-driven system using Apache Kafka and Spark Structured Streaming, along with the ecosystem around them.

    Event Driven Architecture and Streaming using
    Kafka and Structured Streaming  

    1 Capstone Project

  • Course Description

    Amazon Web Services (AWS) offers a unique opportunity to build scalable, robust, and highly available systems in the cloud. This course will give you an overview of the different services you can leverage within AWS to build out a Big Data solution.

    Data Engineering with AWS Data Analytics Services

    1 Capstone Project

  • Course Description

    In this course, you will learn how to harness the power of Apache Spark and powerful clusters running on the Azure Databricks platform to run large data engineering workloads in the cloud.

    Data Engineering with Microsoft
    Azure Databricks

    1 Capstone Project

  • Course Description

    This course will introduce you to additional tools and frameworks needed to become a successful data engineer.

    Data Engineering Essentials Add-on Modules

    1 Capstone Project

Comprehensive Curriculum

Tried & tested curriculum to make you a solid Data Engineer

Course description: This course will help you learn distributed storage and batch processing with Hadoop and Apache Spark, one of the most powerful in-memory cluster computing frameworks.

Data Engineering Concepts and Hadoop 2.x (YARN)
 1 Quiz         

Learning Objectives – This module introduces you to the core concepts, processes, and tools you need to know in order to get a foundational knowledge of data engineering. You will gain an understanding of the modern data ecosystem and the role Data Engineers, Data Scientists, and Data Analysts play in this ecosystem.

Topics –

  • Introduction to Big Data (Classification and Characteristics)
  • Drawbacks of RDBMS
  • Challenges of Big Data
  • Sources of Big Data
  • Distributed Systems
  • What is Data Engineering, Roles and Responsibilities
  • Data Pipelines and types
  • Hadoop 2.x and Core Components
  • Hadoop Daemons
  • Hadoop Architecture
  • HDFS File Blocks and Architecture
  • Introduction to YARN and Architecture
  • YARN Application Execution Flow

Hadoop Commands and Configurations
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will learn different commands to work with the HDFS file system, YARN commands, and how to execute and monitor jobs.
Topics –

  • Exploring Important Configuration Files
  • Exploring HDFS File System Commands
  • Exploring Hadoop Admin Commands
  • Exploring YARN Commands
    • Executing YARN Jobs
    • Monitoring YARN Jobs
    • Monitoring different -appTypes
    • Killing YARN Jobs
  • Exploring Name Node and Resource Manager UI

Structured Data Analysis using Hive
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will understand Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts, and Hive UDFs.
Topics –

  • Introduction to Hive and Architecture
  • Exploring Hive table types
  • Data loading techniques in Hive
  • Hive Complex Data Types (Array, Map, StructType)

Getting Started with Apache Spark + RDD’s
 1 Quiz           2 Assignments

Learning Objectives – In this module, you will learn about Spark architecture in comparison with the Hadoop ecosystem, and you will learn one of the fundamental building blocks of Spark – RDDs – and the related operations for implementing business logic (transformations, actions, and functions performed on RDDs). A short PySpark sketch follows the topic list.
Topics –

  • Overview of Apache Spark
  • Data sharing in MapReduce vs Spark
  • Exploring Spark Ecosystem
  • Exploring RDD’s : Basic Building Block
  • Partitions
  • Starting Spark Shell
  • RDD Creations
    1. Loading a file
    2. Parallelize Collections
  • Exploring RDD Operations
    1. Transformations
    2. Actions
  • RDD Actions
    1. count()
    2. first()
    3. take(int)
    4. saveAsTextFile(path:String)
    5. reduce(func)
    6. collect()
    7. foreach(func)
  • RDD Transformations
    1. map(func)
    2. filter(func)
  • Chaining Transformation and Actions in Spark
  • Configuring Development environment
  • Initializing SparkSession i.e Spark 2.x entry point
  • Pair RDD
  • Sorting, Grouping and Aggregations
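
A minimal PySpark sketch of the RDD workflow above, assuming a hypothetical input file and word-count logic (not part of the course material):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-basics").getOrCreate()
    sc = spark.sparkContext

    # RDD creation: loading a file and parallelizing a collection
    lines = sc.textFile("hdfs:///data/sample.txt")    # hypothetical path
    nums = sc.parallelize([1, 2, 3, 4, 5])

    # Transformations (lazy) chained with actions (eager)
    words = lines.flatMap(lambda line: line.split())
    long_words = words.filter(lambda w: len(w) > 3)
    print(long_words.count())                          # action: count()
    print(nums.map(lambda x: x * 2).take(3))           # action: take(n)

    # Pair RDD: grouping/aggregation and sorting
    counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
    print(counts.sortBy(lambda kv: kv[1], ascending=False).take(5))

    spark.stop()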

Exploring Spark SQL and DataFrame API
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will learn about Spark SQL, which is used to process structured data with SQL queries. You will learn about DataFrames and Datasets in Spark SQL and perform SQL operations on DataFrames; an illustrative example follows the topic list.

Topics –

  • Introduction to Spark SQL
  • Overview of DataFrames
  • Creating DataFrames (In-Memory + External Source)
  • Exploring DataFrameReader API
  • Attaching Custom Schema
  • Working with Columns
  • Filtering
  • Adding, Renaming, Dropping column
  • Grouping, Sorting and Aggregations
  • Registering DataFrame as a Table
  • Join Operations
  • Understanding DataFrameWriter API
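
The DataFrame operations above can be sketched in a few lines of PySpark. The file path, schema, and column names below are illustrative assumptions, not course material:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("dataframe-basics").getOrCreate()

    # DataFrameReader API with a custom schema attached
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("city", StringType()),
        StructField("amount", DoubleType()),
    ])
    orders = spark.read.schema(schema).option("header", "true").csv("/data/orders.csv")

    # Filtering; adding, renaming and dropping columns
    big = (orders.filter(F.col("amount") > 100)
                 .withColumn("amount_with_tax", F.col("amount") * 1.18)
                 .withColumnRenamed("city", "order_city")
                 .drop("order_id"))

    # Grouping, sorting and aggregations
    summary = (big.groupBy("order_city")
                  .agg(F.sum("amount").alias("total"))
                  .orderBy(F.desc("total")))

    # Registering the DataFrame as a table and querying it with SQL
    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT city, COUNT(*) AS cnt FROM orders GROUP BY city").show()

    # DataFrameWriter API
    summary.write.mode("overwrite").parquet("/tmp/order_summary")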

Deep Dive into the DataFrame API
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will learn some of the advanced concepts of the DataFrame API; see the short sketch after the topic list.

Topics –

  • Complex Data Types
  • Handling Corrupt and Missing Records
  • Working with Dates
  • User Defined Functions (UDF)
  • Connect to DB via DataFrame
  • Window Functions
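
A short, illustrative sketch of a few of the advanced features above; the column names and sample data are assumptions made for the example:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("dataframe-advanced").getOrCreate()

    df = spark.createDataFrame(
        [("s1", "2024-01-05", 120.0), ("s1", "2024-01-06", None), ("s2", "2024-01-05", 80.0)],
        ["store", "sale_date", "amount"],
    )

    # Handling missing records and working with dates
    clean = df.na.fill({"amount": 0.0}).withColumn("sale_date", F.to_date("sale_date"))

    # User Defined Function (UDF)
    label = F.udf(lambda amt: "high" if amt > 100 else "low", StringType())
    clean = clean.withColumn("bucket", label(F.col("amount")))

    # Window function: running total per store, ordered by date
    w = Window.partitionBy("store").orderBy("sale_date")
    clean.withColumn("running_total", F.sum("amount").over(w)).show()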

Packaging, Deploying and Debugging
 1 Quiz         

Learning Objectives – In this module, you will learn the different aspects to take care of when packaging, deploying, and monitoring Spark applications.

Topics –

  • Packaging Spark Application
  • Submitting Spark Application using spark-submit command
  • Deployment Modes
  • Configuring Spark on YARN
  • Monitoring Spark applications on YARN

Best Practices and Performance Tuning
 1 Quiz         

Learning Objectives – In this module, you will learn best practices and tuning techniques to improve the performance of Spark applications; a brief sketch follows the topic list.

Topics –

  • Caching and Persistence
  • Broadcast Variables
  • Optimizing Spark Joins
  • Memory Partitioning
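
Two of the techniques above, caching and broadcast joins, sketched in PySpark (table paths and column names are assumptions for the example):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("tuning-basics").getOrCreate()

    facts = spark.read.parquet("/data/transactions")   # large fact table (hypothetical path)
    dims = spark.read.parquet("/data/stores")           # small lookup table (hypothetical path)

    # Caching and persistence: keep a reused DataFrame in memory across actions
    facts.cache()
    print(facts.count())

    # Broadcast join: ship the small table to every executor and avoid a shuffle
    joined = facts.join(F.broadcast(dims), on="store_id", how="left")

    # Memory partitioning: control the number of partitions before writing
    joined.repartition(8).write.mode("overwrite").parquet("/tmp/joined_out")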

Course description: Apache Kafka is a popular tool used in many big data analytics projects to get data from other systems into a big data system. Through this course, students can develop Apache Kafka applications that send and receive data from Kafka clusters. By the end of this course, you will be able to set up a personal Kafka development environment, master the concepts of topics, partitions, and consumer groups, develop a Kafka producer to send messages, and develop a Kafka consumer to receive messages. You will also practice the Kafka command line interfaces for producing and consuming messages.

Module 01 – Getting Started with Kafka and Core APIs
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will understand Kafka and its architecture; a producer/consumer sketch follows the topic list.
Topics –

  • Integration between components
  • What is Kafka
  • Components of Messaging System
  • Understanding Kafka components in detail
    1. Producer
    2. Consumer
    3. Broker
    4. Cluster
    5. Topic
    6. Partitions
    7. Offset
    8. Consumer groups
  • Message Retention in Kafka
  • Kafka Commit Log
  • Hands on – Kafka CLI
    1. Starting Zookeeper
    2. Starting Kafka Server
    3. Topic operations: create, list, delete, describe
    4. Publishing data to a topic using console producer
    5. Publishing data to a topic using console consumer
    6. Sending and receiving messages
  • Hands on – Kafka Cluster with Multiple Brokers
    1. Creating separate configuration files for brokers
    2. Launching multiple brokers
    3. Getting cluster information and broker details from Zookeeper
  • Hands on – Topic with multiple partitions
    1. Creating topic with multiple partitions
    2. How messages are spread across partitions
    3. Reading messages from specific partitions
    4. Reading messages from specific offset in specific partition
  • Understanding Kafka Core APIs
  • Implementing Kafka Producer & Consumer
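
A minimal producer/consumer pair, sketched with the third-party kafka-python client (an assumption – the course may use the Java client instead); the broker address and topic name are placeholders:

    from kafka import KafkaProducer, KafkaConsumer

    # Producer: publish a keyed message to a topic
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("demo-topic", key=b"user-1", value=b"hello kafka")
    producer.flush()

    # Consumer: join a consumer group and read from the beginning of each partition
    consumer = KafkaConsumer(
        "demo-topic",
        bootstrap_servers="localhost:9092",
        group_id="demo-group",
        auto_offset_reset="earliest",
    )
    for message in consumer:
        print(message.partition, message.offset, message.key, message.value)
        break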

Deep Dive into the Kafka Producer and Consumer APIs
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will learn the advanced Kafka core APIs, i.e. the Producer API, the Consumer API, and Kafka Connect. A sketch of synchronous vs. asynchronous sends follows the topic list.
Topics –

  • Understanding Producer Partitioning Mechanism using Java
  • Different ways to implement partitioning mechanism
    • Providing partition number
    • Using Round Robin
    • Key Hashing
  • Message Sending
  • Producer API
    • Synchronous Send
    • Asynchronous Send
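
Synchronous vs. asynchronous sends, sketched with the kafka-python client (an assumption; the Java Producer API follows the same future/callback pattern). Broker and topic are placeholders:

    from kafka import KafkaProducer
    from kafka.errors import KafkaError

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    # Synchronous send: block on the returned future until the broker acknowledges
    try:
        metadata = producer.send("demo-topic", b"sync message").get(timeout=10)
        print("written to partition", metadata.partition, "at offset", metadata.offset)
    except KafkaError as err:
        print("send failed:", err)

    # Asynchronous send: attach callbacks and keep going without blocking
    def on_success(metadata):
        print("async ok:", metadata.topic, metadata.partition, metadata.offset)

    def on_error(err):
        print("async failed:", err)

    producer.send("demo-topic", b"async message").add_callback(on_success).add_errback(on_error)
    producer.flush()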

Streaming Data with Spark Structured Streaming
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will learn the fundamentals of stream processing with Spark Structured Streaming; a minimal example follows the topic list.
Topics –

  • Introduction to Stream Processing
  • Batch Processing vs Stream Processing
  • Streaming Processing API in Spark
  • Overview of Structured Streaming
  • Notions of Stream Processing
  • Streaming Sources, Sinks and Output Modes
  • Streaming data from socket, file as input source
  • Output Modes (Append, Complete, Update)
  • Aggregating on Streaming Data
  • Running SQL Queries on Streaming Data
  • Processing JSON data using Stream processing
  • Joining Batch and Streaming Data
  • Triggers
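
A minimal Structured Streaming example – word counts over a socket source with the Complete output mode, as in the standard Spark quick-start (host/port are placeholders; feed it with something like "nc -lk 9999"):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("streaming-basics").getOrCreate()

    # Streaming source: lines of text arriving on a socket
    lines = (spark.readStream
                  .format("socket")
                  .option("host", "localhost")
                  .option("port", 9999)
                  .load())

    # Aggregating on streaming data
    words = lines.select(F.explode(F.split(lines.value, " ")).alias("word"))
    counts = words.groupBy("word").count()

    # Sink and output mode: print the full result table to the console on every trigger
    query = (counts.writeStream
                   .outputMode("complete")
                   .format("console")
                   .start())
    query.awaitTermination()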

Advanced Stream Processing
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will learn advanced stream processing concepts; see the sketch after the topic list.
Topics –

  • Streaming from Kafka as Source and Sink
  • Stateless vs Stateful transformations
  • Event time and Windowing
  • Tumbling Window Aggregate
  • Sliding Window
  • Watermarks and Late Data
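
An event-time tumbling-window aggregate with a watermark, reading from Kafka as the source – a sketch only, assuming the Kafka connector package is on the classpath and that events arrive as JSON with a user and an event_time field:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("windowed-stream").getOrCreate()

    schema = StructType([
        StructField("user", StringType()),
        StructField("event_time", TimestampType()),
    ])

    # Kafka as a streaming source (broker and topic are placeholders)
    events = (spark.readStream
                   .format("kafka")
                   .option("kafka.bootstrap.servers", "localhost:9092")
                   .option("subscribe", "events")
                   .load()
                   .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
                   .select("e.*"))

    # Tolerate data up to 10 minutes late, then count per user per 5-minute tumbling window
    windowed = (events.withWatermark("event_time", "10 minutes")
                      .groupBy(F.window("event_time", "5 minutes"), "user")
                      .count())

    windowed.writeStream.outputMode("update").format("console").start().awaitTermination()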

Course description: This course will give you an overview of the AWS services you can leverage to build scalable Big Data solutions in the cloud.

Getting Started with AWS, IAM and S3 Storage
 1 Quiz         

Learning Objectives – In this module, you will learn the fundamentals of AWS, IAM and S3 Storage.
Topics –

  • Introduction to Cloud Computing
  • AWS Services, Regions and Zones
  • Identity and Access Management (IAM)
  • Create AWS IAM User and Download Credentials
  • Install and Configure AWS CLI
  • Introduction to S3 Storage
  • S3 Objects, Buckets, and Key Value pairs
  • Create Bucket, Upload Object and Exploring UI
  • Tagging Bucket and Object
  • Optimizing S3 Costs
  • S3 Storage classes
  • Lifecycle Management
  • Retrieve Objects from Glacier
  • Protecting Data
  • S3 Object Lock
  • S3 Versioning and Encryption
  • Setup Cross-Region Replication
  • AWS Budget Setup – Billing Preferences, Budgets and Alarms
  • AWS S3 CLI Commands
  • Event Notification, Logging and Analytics

Big Data Processing with EC2 and EMR
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will understand Amazon EMR and how to launch a Spark cluster.
Topics –

  • EC2 Fundamentals
  • EC2 Security Groups
  • EC2 Key Pair for SSH Connection
  • Introduction to EMR
  • EMR Architecture
  • EMR Types
  • EMR Cluster Lifecycle
  • Optimizing Instance Types
  • Hands-on : Creating EMR Cluster and Connecting via SSH
  • Hands-on : Building Packaging and Deploying Spark App
  • Stream Processing with EMR
  • Setup Glue Catalog Integration with Hive
  • Hands-on Query DynamoDB from EMR using Hive

Streaming and Funneling Data with AWS Amazon Kinesis
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will learn to harness the power of real-time streaming using the Kinesis family of services – Kinesis Data Streams (KDS), Kinesis Data Firehose (KDF), and Kinesis Data Analytics (KDA) – to construct high-throughput, low-latency data pipelines across a variety of architectural components, leading to scalable and loosely coupled systems. A boto3 sketch follows the topic list.
Topics –

  • Introduction to Amazon Kinesis
  • Kinesis Core Services (Streams, Firehose and Analytics)
  • Kinesis Data Streams Components(Producers, Shards, Consumers)
  • Creating Kinesis Data Stream using AWS CLI and AWS Console
  • Publish Records to Kinesis Data Stream
  • Consume Records from Kinesis Data Stream
  • Introduction to Kinesis Firehose
  • Writing Stream Data to S3
  • Connecting Kinesis Firehose with Kinesis DataStream
  • Sending Data through AWS CLI
  • Introduction to Kinesis Data Analytics
  • Running SQL queries to Process Streaming Data
  • Connecting a Destination to SQL Stream Processing
  • Building Data pipeline using Kinesis
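
Publishing to and reading from a Kinesis Data Stream with boto3, sketched below; the stream name, region, and payload are placeholders:

    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    # Publish a record: the partition key determines which shard receives it
    kinesis.put_record(
        StreamName="demo-stream",
        Data=json.dumps({"sensor": "s1", "temp": 21.5}).encode("utf-8"),
        PartitionKey="s1",
    )

    # Consume records from the first shard, starting at the oldest available record
    shards = kinesis.describe_stream(StreamName="demo-stream")["StreamDescription"]["Shards"]
    iterator = kinesis.get_shard_iterator(
        StreamName="demo-stream",
        ShardId=shards[0]["ShardId"],
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    for record in kinesis.get_records(ShardIterator=iterator, Limit=10)["Records"]:
        print(record["SequenceNumber"], record["Data"])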

Building ETL Data Pipeline using AWS Glue & Athena
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will learn to build ETL data pipelines using AWS Glue crawlers, the Glue Data Catalog, Glue jobs, and Amazon Athena; a short query sketch follows the topic list.
Topics –

  • Introduction to ETL
  • AWS Glue Introduction
  • Components of AWS Glue (Crawler, Data catalog)
  • Hands-on Developing Data catalog with Glue crawlers
  • Querying data using Amazon Athena
  • Glue Jobs
  • Hands-on Developing Glue Jobs
  • Running Spark transformation jobs on AWS Glue
  • Creating Developer end point and running spark code
  • Glue Catalog Management
  • Partitioned table creation and maintenance
  • Hive compatible Partitioning
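
Once a Glue crawler has populated the Data Catalog, the resulting table can be queried from Athena. A boto3 sketch follows; the database, table, and S3 output location are placeholders:

    import time
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    execution = athena.start_query_execution(
        QueryString="SELECT city, COUNT(*) AS orders FROM orders_raw GROUP BY city",
        QueryExecutionContext={"Database": "sales_db"},                      # Glue catalog database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},   # placeholder bucket
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query finishes, then print the result rows
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
            print([col.get("VarCharValue") for col in row["Data"]])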

Serverless Architecture using AWS Lambda
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will learn how to build serverless applications using AWS Lambda; a minimal handler sketch follows the topic list.
Topics –

  • What is Serverless Architecture
  • What is AWS Lambda
  • Creating First Lambda Function
    • Passing Arguments to Lambda Functions
    • Passing Environment Variables to Lambda Functions
  • Monitoring Lambda Function using CloudWatch
  • Scheduling Lambda Function using EventBridge
  • Lambda Versioning
  • Managing Aliases
  • S3 Event Notification
  • Customizing Resources
  • Introduction to Cloud9
  • Setup and Develop with Cloud9
  • Import and Invoke Lambda Functions
  • Package and Deploy Lambda
  • Invoke Lambda functions inside API Gateway
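
A minimal Lambda handler reacting to an S3 event notification – an illustrative sketch only; the copy-to-"processed/" action and the TARGET_PREFIX environment variable are assumptions for the example:

    import json
    import os
    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        # Environment variables are configured on the function, not hard-coded
        target_prefix = os.environ.get("TARGET_PREFIX", "processed/")

        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"new object s3://{bucket}/{key}")   # visible in CloudWatch Logs

            # Example action: copy the new object under the "processed/" prefix
            s3.copy_object(
                Bucket=bucket,
                Key=target_prefix + key,
                CopySource={"Bucket": bucket, "Key": key},
            )

        return {"statusCode": 200, "body": json.dumps("ok")}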

Automate AWS Infrastructure
 1 Quiz           1 Assignment

Learning Objectives – In this module, you will create EC2 instances and work with the Kinesis producer and consumer libraries.
Topics –
  • Creating EC2 Instance
  • Introduction to Amazon Kinesis
  • Amazon Kinesis Core Services
    1. Kinesis Streams
    2. Kinesis Firehose
    3. Kinesis Analytics
  • Kinesis Streams
  • Kinesis Streams Key Concepts (Shard, Data Blob, Partition Key etc)
  • Building A Kinesis Data Stream with AWS CLI Data Generator
  • Kinesis Producer Library
  • Hands on – Implementing Kinesis Producer Library
  • Kinesis Consumer Library
  • Hands on – Implementing Kinesis Consumer Library
  • Sending Data to Kinesis Data Stream using Python boto3 library

Funneling Data with Kinesis Firehose + Kinesis Analytics
 3 Hours           2 Assignments

Learning Objectives – In this module, you will learn how to use Amazon’s Kinesis Firehose and Data Analytics.
Topics –

  • Introduction to Kinesis Firehose
  • Writing Stream Data to S3
  • Connecting Kinesis Firehose with Kinesis DataStream
  • Sending Data through AWS CLI
  • Adding Lambda Function
  • Introduction to Kinesis Data Analytics
  • Streaming SQL

Course description: In this course, you will learn one of the most popular web-based notebook environments, which enables interactive data analytics.

Exploring Azure Databricks
 3 Hours           2 Assignments

Learning Objectives – In this module, you will get started with Azure Databricks clusters, notebooks, and mounts.
Topics –

  • Introduction to Azure Databricks
  • Signup for the Azure Account
  • Launch Azure Workspace and Databricks Cluster
  • Upload Data
  • Databricks Clusters
  • Databricks Notebook Introduction (Create, Import and Export)
  • Magic Commands
  • Databricks Mounts
  • Develop Spark Application using Azure Databricks
  • Performing ETL Operations using Azure Databricks
  • Export and Import Databricks Notebooks
  • Getting Started with Azure CLI
  • Create Resource Group using Azure CLI

Accessing Data from Azure Data Lake Storage
 3 Hours           2 Assignments

Learning Objectives – In this module, you will learn how to access data in Azure Data Lake Storage (ADLS) from Azure Databricks; a mount sketch follows the topic list.
Topics –

  • Create and Upload data to an ADLS File System or Container
  • Mount ADLS on to Azure Databricks to access files from Azure Blob Storage
  • Creating an ADLS Gen2 Account
  • Storage Explorer
  • Accessing via Access Keys
  • Accessing via SAS Token
  • Mounting ADLS to DBFS Overview
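
A sketch of mounting a storage container to DBFS from a Databricks notebook using an account key kept in a secret scope. The storage account, container, and secret scope names are placeholders, and dbutils/display are only available inside the Databricks runtime:

    storage_account = "mystorageacct"   # placeholder
    container = "raw"                   # placeholder
    account_key = dbutils.secrets.get(scope="demo-scope", key="storage-key")  # hypothetical scope

    dbutils.fs.mount(
        source=f"wasbs://{container}@{storage_account}.blob.core.windows.net/",
        mount_point=f"/mnt/{container}",
        extra_configs={
            f"fs.azure.account.key.{storage_account}.blob.core.windows.net": account_key
        },
    )

    # Once mounted, files are accessible under /mnt/<container> like any DBFS path
    display(dbutils.fs.ls(f"/mnt/{container}"))
    df = spark.read.option("header", "true").csv(f"/mnt/{container}/orders.csv")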

Exploring Databricks Platform
 3 Hours           2 Assignments

Learning Objectives – In this module, you will understand the Databricks platform.
Topics –

  • Creating Free Community account
  • Creating Spark Cluster
  • Creating workspace
  • Writing applications
  • Uploading dataset 

Candidate Evaluation

We follow an assessment- and project-based approach to maximize your learning. For each module there will be multiple assessments/problem statements.

Each of the assessments in the E-Learning portal helps students grasp the concepts taught in class and apply them to business problem scenarios.

You will have a quiz for each of the modules covered in the previous class/week. These tests are usually 15–20 minutes long.

Each candidate will be given an exercise to solve for evaluation.

You will be assigned computational and theoretical homework assignments to complete.

A coding hackathon will be conducted in the middle of the course. It tests your ability to apply the concepts, tools, and techniques covered so far to a given problem statement, and to solve the problem quickly and accurately.

At the end of each course there will be a real-world capstone project that enables you to build an end-to-end solution to a real-world problem. You will be required to write a project report and present it to the audience.

Training Features

Course Duration

110 hours of extensive classroom training.
36 sessions of 3 hours each. Course duration: 4 months.

Assignment

For each module, multiple hands-on exercises, assignments, and quizzes are provided in Google Classroom.

Project work

We follow the Agile methodology for project development. Each project will have a feature study followed by user stories.

Mock Interview

There will be a dedicated one-to-one interview call between you and a Big Data Architect, so you experience a real mock interview.

Forum

We have a community forum for all our students wherein you can enrich your learning through peer interaction.

Certification

On completion of the project, NPN Training certifies you as a “Big Data Architect” based on the project.

Interview Preparation Kit


Industry Standard Realtime Project

This program (Big Data Architect Masters Program) comes with a portfolio of industry-relevant POCs, use cases, and project work.
Unlike other institutes, we don’t pass off use cases as projects; we clearly distinguish between a use case and a project.

We follow the Agile methodology for project development.

Upcoming Batches

Mar 20th

Batch: Weekend Sat & Sun
Duration: 4 Months
₹ 30,000

Apr 17th

Batch: Weekend Sat & Sun
Duration: 4 Months
₹ 30,000

1. Transfer Rs. 1,000 towards the registration amount to the account details mentioned below.

2. Send a screenshot of the payment to info@www.npntraining.com with the subject “Big Data Masters Program Pre-Registration”.

3. Once we receive the payment, we will acknowledge it through our official email ID.

Account Details

Name: Naveen P.N
Bank Name: State Bank of India
Account No: 64214275988
Account Type: Current Account
IFSC Code: SBIN0040938
Bank Branch: Ramanjaneya Nagar

Send screenshot to: info@www.npntraining.com
Email Subject: Big Data Masters Program Pre-Registration
Registration Fee: Rs. 1,000

Note: Check batch availability with Naveen sir before doing the pre-registration.

Sorry 🙁 Due to the Covid situation, we are not offering classroom training at present.

Register for Free Demo Class

Experience the Quality of Training

Frequently Asked Questions

What is the NPN Training Big Data Masters Program?
The Big Data Architect Masters Program is a structured learning path recommended by leading industry experts that ensures you transform into an expert Big Data Architect. Being a Big Data Architect requires you to master a multitude of skills, and this program aims at providing you with in-depth knowledge of the entire Big Data ecosystem.
Why should you enroll for the Big Data Masters Program?

The Big Data Architect learning track has been curated after thorough research and recommendations from industry experts. It will help you differentiate yourself with multi-platform fluency and gain real-world experience with the most important tools and platforms.

What are the prerequisites for the course ?
  1. Basic Programming: As part of the Big Data Architect Masters Program you will be involved in developing real-time projects built to industry standards, hence having coding knowledge is essential; however, we will be covering Python programming as part of the program.
  2. Linux Basic Commands

 

Who are the Instructor at NPN Training?

All the Big Data classes will be driven by Naveen sir, a working professional with more than 12 years of experience in IT as well as teaching.

Can I attend a demo session before enrollment?

Yes, you can sit in an actual live class and experience the quality of training.

How will I execute the Practicals?

The practical experience at NPN Training is worthwhile and different from that of other training institutes in Bangalore. You gain practical knowledge of Big Data through virtual Big Data software installed on your machine.

Detailed installation guides are provided in the E-Learning portal for setting up the environment.

Do I need to bring my own laptop?

NPN Training will provide students with all the course material in hard copies. However, students should carry their individual laptops for the program. Please find the minimum configuration required:

Windows 7 / Mac OS
8 GB RAM is highly preferred
100 GB HDD
64 bit OS

What If I miss a session?

The course validity is one year, so you can attend the missed session in other batches.

How do I access the E-Learning content for the course?

Once you have registered and paid for the course, you will have 24/7 access to the E-Learning content.

Can I avail an EMI option?

Yes, the total fee can be paid in 2 installments.

Are there any group discounts for classroom training programs?

Yes, we have group discount options for our training programs. Contact us using  the Live Chat link. Our customer service representatives will give you more details.

Certificate of Completion

Earn your certificate

Our specialization is exhaustive, and the certificate awarded by us is proof that you have taken a big leap in the Big Data domain.

Differentiate yourself

The knowledge you have gained from working on projects,
videos, quizzes, hands-on assessments and case studies
gives you a competitive edge.

Share your achievement

Highlight your new skills on your resume and LinkedIn. Tell
your friends and colleagues about it.

Your learning is important. It’s a good idea to go through reviews of previous students to make an informed decision.

Blog Posts

In this blog post, you will learn different ways to handle nulls in Apache Spark.

In this blog post, I will explain different HDFS commands to access HDFS.

In this blog post, we will learn how to convert an RDD to a DataFrame with Spark.
