Understanding Big Data and Hadoop

  • Limitations of Existing Data Analytics Architecture and Their Solutions
  • Hadoop Features
  • Hadoop Ecosystem
  • Hadoop 2.x core components
  • Hadoop Storage: HDFS
  • Hadoop Processing: MapReduce Framework
  • Different Hadoop Distributions

Hadoop Architecture and HDFS

  • Hadoop 2.x Cluster Architecture
  • Federation and High Availability
  • A Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Single-Node and Multi-Node Cluster Setup, Hadoop Administration

Hadoop MapReduce Framework

  • MapReduce Use Cases
  • Hadoop 2.x MapReduce Architecture
  • YARN MR Application Execution Flow
  • Anatomy of a MapReduce Program
  • Input Splits
  • Relation between Input Splits and HDFS Blocks
  • MapReduce: Combiner & Partitioner
  • Counters, Distributed Cache
  • MRUnit, Reduce Join
  • Custom Input Format
  • Sequence File Input Format
  • XML File Parsing Using MapReduce

Apache Hive

  • Hive Background
  • Hive vs. Pig
  • Hive Architecture and Components
  • Metastore in Hive, Limitations of Hive
  • Comparison with Traditional Databases
  • Hive Data Types and Data Models
  • Partitions and Buckets
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data, Querying Data, Managing Outputs
  • Hive Script, Hive UDF
  • Retail Use Case in Hive
  • Hive Demo on Healthcare Data Set
  • HiveQL: Joining Tables, Dynamic Partitioning
  • Custom Map/Reduce Scripts
  • Hive Indexes and Views, Hive Query Optimizers
  • Hive: Thrift Server, User-Defined Functions

HBase and ZooKeeper

  • Introduction to NoSQL Databases and HBase
  • HBase vs. RDBMS
  • HBase Components, HBase Architecture
  • Run Modes and Configuration
  • HBase Cluster Deployment
  • HBase Data Model
  • HBase Shell
  • HBase Client API
  • Data Loading Techniques
  • ZooKeeper Data Model
  • ZooKeeper Service
  • ZooKeeper, Demos on Bulk Loading
  • Getting and Inserting Data, Filters in HBase

Apache Spark and Scala

  • What is Apache Spark
  • Spark Ecosystem
  • Spark Components
  • Spark as a Polyglot
  • Why Scala
  • SparkContext
  • RDD (Resilient Distributed Dataset)

Apache Pig

  • About Pig
  • MapReduce vs. Pig
  • Programming Structure in Pig
  • Pig Running Modes
  • Pig Components, Pig Execution
  • Pig Latin Program, Data Models in Pig
  • Pig Data Types, Shell and Utility Commands
  • Pig Latin: Relational Operators, File Loaders, GROUP Operator, COGROUP Operator, Joins and COGROUP, UNION, Diagnostic Operators, Specialized Joins in Pig
  • Built-In Functions (Eval, Load and Store, Math, String, and Date Functions)
  • Pig UDF, Piggybank
  • Parameter Substitution (Pig Macros and Pig Parameter Substitution)
  • Pig Streaming
  • Testing Pig Scripts with PigUnit
  • Aviation Use Case in Pig
  • Pig Demo on Healthcare Data Set

Oozie, Sqoop, and Flume

  • Flume and Sqoop
  • Oozie Components, Oozie Workflow
  • Scheduling with Oozie
  • Oozie Coordinator
  • Oozie Commands, Oozie Web Console
  • Oozie for MapReduce, Pig, Hive, and Sqoop
  • Combined Flow of MapReduce, Pig, and Hive in Oozie
  • Hadoop Project Demo
  • Hadoop Integration with Talend