
Apache Spark and Scala Certification Training

SUPPORT NO. +1 302 956 2015 (USA)

The Apache Spark and Scala Certification Training is designed to provide the knowledge and skills needed to become a successful Spark developer and to prepare you for the Cloudera Certified Associate Spark and Hadoop Developer certification exam (CCA175). By taking this course, you will gain in-depth knowledge of concepts such as HDFS, Flume, Sqoop, RDDs, Spark Streaming, MLlib, Spark SQL, and the Kafka cluster & API.

Why this course?

  • Spark has overtaken Hadoop as the most active open source Big Data framework - Forbes
  • Apache Spark will dominate the Big Data landscape by 2022 - Wikibon
  • The average pay stands at 108,366 USD p.a. - Indeed.com

  • 15K+ satisfied learners.

Enroll now

Instructor-led Sessions

30 hours of online live instructor-led classes. Weekend class: 10 sessions of 3 hours each; weekday class: 15 sessions of 2 hours each.

Real-life Case Studies

Towards the end of the course, you will be working on a real-life project.


Assignments

Each class is followed by practical assignments that can be completed before the next class.

Lifetime Access

You get lifetime access to the Learning Management System (LMS). Class recordings and presentations can be viewed online from the LMS.

24 x 7 Expert Support

We have a 24x7 online support team available to help you with any technical queries you may have during the course.


Certification

Towards the end of the course, you will be working on a project. Our expert certifies you as a Spark expert based on that project.


Community Forum

We have a community forum for all our customers, wherein you can enrich your learning through peer interaction and knowledge sharing.

The Apache Spark Certification Training course is designed to provide the knowledge and skills needed to become a successful Big Data developer.

You will understand the basics of Big Data and Hadoop. You will learn how Spark enables in-memory data processing and runs much faster than Hadoop MapReduce. You will also learn about RDDs and the different APIs that Spark offers, such as Spark Streaming, MLlib, and Spark SQL. This Certhippo course is an integral part of a Big Data developer's career path. It also covers fundamental concepts such as data capture using Flume, data loading using Sqoop, the Kafka cluster, and the Kafka API.

This course is designed to provide the knowledge and skills needed to become a successful Spark and Hadoop developer, and it will help you clear the CCA Spark and Hadoop Developer (CCA175) examination.

The market for Big Data analytics is growing across the world, and this strong growth translates into a great opportunity for IT professionals. Here are a few professional IT groups that are continuously enjoying the benefits of moving into the Big Data domain:

  • Developers and Architects
  • BI /ETL/DW professionals
  • Senior IT Professionals
  • Testing professionals
  • Mainframe professionals
  • Freshers
  • Big Data enthusiasts
  • Software Architects, Engineers and Developers
  • Data Scientists and Analytics professionals

There are no prerequisites for this course as such. Knowledge of Scala will definitely be a plus for learning Spark, but it is not mandatory.

  • Minimum RAM required: 4 GB (8 GB suggested)
  • Minimum free disk space: 25 GB
  • Minimum processor: i3 or above
  • 64-bit operating system
  • Student machines must support a 64-bit VirtualBox guest image.

We will help you set up Certhippo's virtual machine on your system with local access. Detailed installation guides for setting up the environment are provided in the LMS. The Certhippo VM has Spark 2.1 and Hadoop 2.8 installed, along with other tools such as Sqoop, Flume, and Kafka.

For any doubts, the 24x7 support team will promptly assist you. The Certhippo virtual machine can be installed on Mac or Windows machines.

Towards the end of the course, you will work on a live project. The following are a few industry-specific case studies included in our Apache Spark developer certification training.

Project #1: US Election

Industry: Government

Technologies Used:

  • HDFS (for storage)
  • Spark SQL (for transformation)
  • Spark MLlib (for machine learning)
  • Zeppelin (for visualization)

Problem Statement: In the 2016 US primary elections, Hillary Clinton was nominated over Bernie Sanders by the Democrats, while Donald Trump was nominated by the Republican Party to contest for the presidency. As an analyst, you have been tasked with understanding the different demographic factors that led to the wins of Hillary Clinton and Donald Trump in the primaries, in order to plan their next initiatives and campaigns.

Project #2: Design a system to replay real-time transactions in HDFS using Spark.

Technologies Used:

  • Spark Streaming
  • Kafka (for messaging)
  • HDFS (for storage)
  • Core Spark API (for aggregation)

Project #3: Instant Cabs

Industry: Transportation

Technologies Used:

  • HDFS (for storage)
  • Spark SQL (for transformation)
  • Spark MLlib (for machine learning)
  • Zeppelin (for visualization)

Problem Statement: A US cab service start-up (Instant Cabs) wants to meet demand in an optimal manner and maximize profit. They have hired you as a data analyst to interpret the available Uber data set and find the busiest customer pick-up points and peak hours, so that demand can be met profitably.

Project #4: Drop-page of signal during Roaming

Industry: Telecom

Technologies Used:

  • HDFS (for storage)
  • Spark SQL (for transformation)

Problem Statement: Given a CDR (Call Detail Record) file, you need to find the top 10 customers facing frequent call drops while roaming. This is a very important report, which telecom companies use to prevent customer churn by calling those customers back and, at the same time, contacting their roaming partners to improve connectivity issues in specific areas.

Objectives - In this module, you will understand the basics of Scala that are required for programming Spark applications. You will learn about the basic constructs of Scala such as variable types, control structures, collections, and more.


o What is Scala?

o Why Scala for Spark?

o Scala in other frameworks

o Introduction to Scala REPL

o Basic Scala operations

o Variable Types in Scala

o Control Structures in Scala

o Foreach loop, Functions and Procedures

o Collections in Scala - Array

o ArrayBuffer, Map, Tuples, Lists, and more

Hands On:

o Scala REPL Detailed Demo
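
The constructs listed above can be tried interactively in the Scala REPL; here is a minimal sketch of the basics (all names and values are illustrative), runnable as a small program:

```scala
object ScalaBasicsDemo {
  def main(args: Array[String]): Unit = {
    // val is immutable, var is mutable; types are inferred when omitted
    val course: String = "Apache Spark and Scala"
    var sessions = 10

    // control structure: if is an expression in Scala and yields a value
    val batch = if (sessions <= 10) "weekend" else "weekday"

    // collections: Array, List, Map, and a Tuple
    val hours  = Array(3, 3, 3)
    val topics = List("HDFS", "RDD", "Spark SQL")
    val pay    = Map("Spark Developer" -> 108366)
    val pair   = ("CCA", 175) // a Tuple2

    // foreach loop over a collection
    topics.foreach(t => println(t))

    println(s"$course ($batch batch, ${hours.sum} hrs): ${pair._1}${pair._2}")
  }
}
```

Pasting the body of `main` into the REPL line by line shows each value as it is bound, which is the usual way to explore these constructs.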

Objectives - In this module, you will learn about object oriented programming and functional programming techniques in Scala.


o Class in Scala

o Getters and Setters

o Custom Getters and Setters

o Properties with only Getters

o Auxiliary Constructor and Primary Constructor

o Singletons

o Extending a Class

o Overriding Methods

o Traits as Interfaces and Layered Traits

o Functional Programming

o Higher Order Functions

o Anonymous Functions, and more 

Hands On:

o Case Class Demo

o Layered Traits
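
A compact sketch tying several of these ideas together - a class with a custom getter and setter, a companion singleton, a trait mixed in as an interface, a case class, and a higher-order function (all names here are hypothetical):

```scala
// Primary constructor; val generates a getter for name automatically
class Learner(val name: String) {
  private var _hours = 0                                   // backing field
  def hours: Int = _hours                                  // custom getter
  def hours_=(h: Int): Unit = { _hours = math.max(0, h) }  // custom setter, clamps to 0
}

// Trait used as an interface, with a concrete default member
trait Certifiable { def certificate: String = "Spark Certificate" }

// Companion object acting as a singleton with a factory method
object Learner {
  def apply(name: String): Learner = new Learner(name)
}

// Case class: equals, hashCode, and toString come for free
case class Course(title: String, hrs: Int)

object OopFpDemo {
  def main(args: Array[String]): Unit = {
    val l = new Learner("Asha") with Certifiable // mixing in a trait at instantiation
    l.hours = 30
    // map is a higher-order function taking an anonymous function as its argument
    val totals = List(Course("Scala", 6), Course("Spark", 24)).map(c => c.hrs)
    println(s"${l.name}: ${l.hours} hrs, ${totals.sum} total, ${l.certificate}")
  }
}
```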

Objectives - In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, how Hadoop solves it, the Hadoop ecosystem components, the Hadoop architecture, HDFS, rack awareness, and replication. You will learn about the Hadoop cluster architecture and the important configuration files in a Hadoop cluster. You will also get an overview of Apache Sqoop and how it is used to import and export tables between an RDBMS and HDFS.


o What is Big Data?

o Big Data Customer Scenarios

o Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case

o How Hadoop Solves the Big Data Problem

o What is Hadoop?

o Hadoop’s Key Characteristics

o Hadoop Ecosystem and HDFS

o Hadoop Core Components

o Rack Awareness and Block Replication

o Certhippo’s VM Tour

o YARN and Its Advantage

o Hadoop Cluster and Its Architecture

o Hadoop: Different Cluster Modes

o Data Loading using Sqoop


o A Tour of Certhippo’s Hadoop & Spark VM

o Basic Hadoop Commands

o Importing and Exporting Data Using Sqoop
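
For reference, a typical Sqoop import/export pair looks like the sketch below; the JDBC URL, credentials, table names, and HDFS paths are placeholders, and the commands assume a running Hadoop cluster with Sqoop installed:

```shell
# Import a MySQL table into HDFS (-P prompts for the password)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/retail_db \
  --username retail_user -P \
  --table customers \
  --target-dir /user/certhippo/customers \
  --num-mappers 2

# Export processed results from HDFS back into an RDBMS table
sqoop export \
  --connect jdbc:mysql://dbhost:3306/retail_db \
  --username retail_user -P \
  --table customer_reports \
  --export-dir /user/certhippo/reports
```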

Objectives - In this module, you will understand the different frameworks available for Big Data analytics. The module also includes a first-hand introduction to Spark, a demo on building and running a Spark application, and the Spark Web UI.


o Big Data Analytics with Batch & Real-Time Processing

o Why is Spark Needed?

o What is Spark?

o How Does Spark Differ from Its Competitors?

o Spark at eBay 

o Spark’s Place in Hadoop Ecosystem

o Spark Components & Its Architecture

o Running Programs on Scala IDE & Spark Shell

o Spark Web UI

o Configuring Spark Properties

Hands On:

o Building and Running Spark Application

o Spark Application Web UI

o Configuring Spark Properties
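
A minimal self-contained Spark application, sketched in local mode (the class and app names are illustrative, and Spark is assumed to be on the classpath):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HelloSpark {
  def main(args: Array[String]): Unit = {
    // Configure the application name and master; local[*] uses all local cores
    val conf = new SparkConf().setAppName("HelloSpark").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Distribute a local range as an RDD and run an action on it
    val data = sc.parallelize(1 to 100)
    println(s"Sum of 1..100 = ${data.sum().toLong}")

    sc.stop()
  }
}
```

Packaged into a JAR, such an application would typically be launched with `spark-submit --class HelloSpark <jar>`, after which the Spark Web UI (port 4040 by default) shows the running job.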

Objectives - In this module, you will learn about one of the fundamental building blocks of Spark - RDDs - and the related manipulations for implementing business logic (transformations, actions, and functions performed on RDDs). You will also learn how Spark applications are developed and how Spark properties are configured.


o Challenges in Existing Computing Methods

o Probable Solution & How RDD Solves the Problem

o What is an RDD? Its Functions, Transformations & Actions

o Data Loading and Saving Through RDDs

o Key-Value Pair RDDs and Other Pair RDDs

o RDD Lineage 

o RDD Persistence

o WordCount Program Using RDD Concepts

o RDD Partitioning & How It Helps Achieve Parallelization

Hands On:

o Loading data in RDDs

o Saving data through RDDs

o RDD Transformations

o RDD Actions and Functions

o RDD Partitions

o WordCount through RDDs
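
The hands-on items above fit into one small program; here is a sketch of WordCount over RDDs (the input path is a placeholder, and a Spark installation is assumed):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount").setMaster("local[*]"))

    // Transformations are lazy: flatMap, map, and reduceByKey only build the lineage
    val counts = sc.textFile("hdfs:///user/certhippo/input.txt")   // placeholder path
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))                                      // key-value pair RDD
      .reduceByKey(_ + _)

    counts.cache()                                                 // RDD persistence
    counts.collect().foreach { case (w, n) => println(s"$w: $n") } // collect is an action
    sc.stop()
  }
}
```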

Objectives - In this module, you will learn about Spark SQL, which is used to process structured data with SQL queries. You will learn about DataFrames and Datasets in Spark SQL and perform SQL operations on DataFrames.


o Need for Spark SQL

o What is Spark SQL?

o Spark SQL Architecture

o SQL Context in Spark SQL

o Data Frames & Datasets

o Interoperating with RDDs

o JSON and Parquet File Formats

o Loading Data through Different Sources

Hands On: 

o Spark SQL – Creating data frames 

o Loading and transforming data through different sources

o Stock Market Analysis
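
A short sketch of the DataFrame workflow described above (the stock rows are made-up sample data, and Spark 2.x is assumed to be on the classpath):

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkSqlDemo").master("local[*]").getOrCreate()
    import spark.implicits._ // enables toDF on local collections

    // Create a DataFrame from a local collection (sample data for illustration)
    val stocks = Seq(("AAPL", 187.4), ("MSFT", 412.1), ("AAPL", 190.2))
      .toDF("symbol", "close")

    // Register a view and query it with SQL
    stocks.createOrReplaceTempView("stocks")
    spark.sql("SELECT symbol, AVG(close) AS avg_close FROM stocks GROUP BY symbol").show()

    // The same DataFrame could also be written as Parquet or JSON, e.g.:
    // stocks.write.parquet("/tmp/stocks.parquet")
    spark.stop()
  }
}
```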

Objectives – In this module, you will learn about the need for machine learning, the different types of ML techniques, clustering, and MLlib (Spark's machine learning library), along with the various algorithms supported by MLlib, and you will implement K-Means clustering.


o What is Machine Learning?

o Where is Machine Learning Used?

o Different Types of Machine Learning Techniques

o Face Detection: USE CASE

o Understanding MLlib

o Features of MLlib and MLlib Tools

o Various ML algorithms supported by MLlib 

o K-Means Clustering & How It Works with MLlib

o Analysis on US Election Data: K-Means MLlib USE CASE  

Hands On: 

o Machine Learning MLlib

o K-Means Clustering
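
A minimal K-Means sketch with MLlib on toy data (the points are made up purely to show two visible groups; a Spark installation is assumed):

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{SparkConf, SparkContext}

object KMeansDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("KMeansDemo").setMaster("local[*]"))

    // Toy 2-D points forming two separated groups
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.5, 0.3),
      Vectors.dense(9.0, 8.5), Vectors.dense(8.7, 9.2)
    )).cache()

    // Train K-Means with k = 2 clusters and up to 20 iterations
    val model = KMeans.train(points, 2, 20)
    model.clusterCenters.foreach(println)
    println(s"Cluster of (9, 9): ${model.predict(Vectors.dense(9.0, 9.0))}")
    sc.stop()
  }
}
```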

Objectives - In this module, you will understand Kafka and the Kafka architecture. Afterwards, you will go through the details of the Kafka cluster, and you will also learn how to configure different types of Kafka clusters.


o Need for Kafka

o What is Kafka? 

o Core Concepts of Kafka

o Kafka Architecture

o Where is Kafka Used?

o Understanding the Components of Kafka Cluster

o Configuring Kafka Cluster

o Producer and Consumer

Hands On: 

o Configuring Single Node Single Broker Cluster

o Configuring Single Node Multi Broker Cluster
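
For reference, the single-node, single-broker setup practiced above boils down to a few commands (shown for a Kafka version contemporary with the course's Spark 2.1 VM; paths and ports are the defaults):

```shell
# Start ZooKeeper and a single Kafka broker (run from the Kafka install directory)
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

# Create a topic, then produce to and consume from it
bin/kafka-topics.sh --create --topic demo --partitions 1 --replication-factor 1 \
  --zookeeper localhost:2181
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic demo
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic demo --from-beginning

# Multi-broker on one node: copy config/server.properties once per broker and give
# each copy a unique broker.id, listener port, and log.dirs before starting it
```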

Objectives – In this module, you will get an introduction to Apache Flume, its basic architecture, and how it is integrated with Apache Kafka for event processing.


o Need of Apache Flume

o What is Apache Flume?

o Basic Flume Architecture

o Flume Sources

o Flume Sinks

o Flume Channels

o Flume Configuration

o Integrating Apache Flume and Apache Kafka

Hands On: 

o Flume Commands

o Setting up Flume Agent

o Streaming Twitter Data into HDFS
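
A Flume agent is wired together in a properties file; here is a minimal sketch with a netcat source, a memory channel, and an HDFS sink (the agent name, port, and paths are placeholders):

```properties
# Hypothetical agent "a1": one source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs:///user/certhippo/flume/events

# Wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

The agent would then be started with something like `bin/flume-ng agent --name a1 --conf conf --conf-file demo.conf`.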

Objectives – In this module, you will get an opportunity to work with Spark Streaming, which is used to build scalable, fault-tolerant streaming applications. You will learn about DStreams and the various transformations performed on them. You will also get to know the main streaming operators, sliding window operators, and stateful operators.


o Drawbacks in Existing Computing Methods

o Why Streaming is Necessary? 

o What is Spark Streaming?

o Spark Streaming Features

o Spark Streaming Workflow

o How Uber Uses Streaming Data

o Streaming Context & DStreams

o Transformations on DStreams

o WordCount Program using Spark Streaming

o Windowed Operators and Why They are Useful

o Important Windowed Operators

o Slice, Window and ReduceByWindow Operators

o Stateful Operators

o Perform Twitter Sentiment Analysis Using Spark Streaming

Hands On:

• Creating DStreams

• Transformations and Actions performed on DStreams

• Output Operations in DStreams

• Sliding Window Operations

• Stateful Operations

• Twitter Sentiment Analysis
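
The DStream operations above can be sketched as a minimal sliding-window WordCount (the socket host and port are placeholders, and Spark Streaming is assumed to be on the classpath):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches

    // DStream from a socket source; feed it with e.g. `nc -lk 9999`
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count words over a 30-second window that slides every 10 seconds
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    counts.print() // an output operation on the DStream
    ssc.start()
    ssc.awaitTermination()
  }
}
```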

You will never miss a lecture at Certhippo! You can choose either of the two options:

  • View the recorded session of the class, available in your LMS.
  • Attend the missed session in any other live batch.

Certhippo is committed to providing you an awesome learning experience through world-class content and best-in-class instructors. Through this training, we will create an ecosystem that enables you to convert opportunities into job offers by presenting your skills well at interviews. We can assist you with resume building and also share important interview questions once you are done with the training. However, please understand that we are not into job placements.

We limit the number of participants in a live session to maintain quality standards, so unfortunately participation in a live class without enrollment is not possible. However, you can go through the sample class recording; it will give you a clear insight into how classes are conducted, the quality of the instructors, and the level of interaction in a class.

All instructors at Certhippo are practitioners from the industry with a minimum of 10-12 years of relevant IT experience. They are subject matter experts and are trained by Certhippo to provide an awesome learning experience to participants.

    • Once you successfully complete the project (reviewed by a Certhippo expert), you will be awarded Certhippo's Apache Spark certificate.
    • Certhippo certification has industry recognition, and we are the preferred training partner for many MNCs, e.g. Cisco, Ford, Mphasis, Nokia, Wipro, Accenture, IBM, Philips, Citi, Mindtree, BNY Mellon, etc.