Thursday 26 March 2015

Training content

Hadoop Course Content

(Development and Administration)

 

Introduction to Big Data and Hadoop

v  Big Data
n  What is Big Data?
n  Why are all industries talking about Big Data?
n  What are the issues with Big Data?
§  Storage
§  What are the challenges in storing Big Data?
§  Processing
§  What are the challenges in processing Big Data?
n  What technologies support Big Data?
§  Hadoop
§  Databases
§  Traditional
§  NoSQL
v  Hadoop
n  What is Hadoop?
n  History of Hadoop
n  Why Hadoop?
n  Hadoop Use cases
n  Advantages and Disadvantages of Hadoop
v  Importance of the different Hadoop ecosystem components
v  Importance of integration with other Big Data solutions
v  Big Data real-time use cases

HDFS (Hadoop Distributed File System)

v  HDFS architecture
o    NameNode
§  Importance of the NameNode
§  What are the roles of the NameNode?
§  What are the drawbacks of the NameNode?
o    Secondary NameNode
§  Importance of the Secondary NameNode
§  What are the roles of the Secondary NameNode?
§  What are the drawbacks of the Secondary NameNode?
o    DataNode
§  Importance of the DataNode
§  What are the roles of the DataNode?
§  What are the drawbacks of the DataNode?
v  Data Storage in HDFS
o    How blocks are stored in DataNodes
o    How replication works across DataNodes
o    How to write files to HDFS
o    How to read files from HDFS
v  HDFS Block size
o    Importance of HDFS Block size
o    Why is the block size so large?
o    How it relates to the MapReduce split size
v  HDFS Replication factor
o    Importance of the HDFS replication factor in a production environment
o    Can we change the replication factor for a particular file or folder?
o    Can we change the replication factor for all files and folders?
v  Accessing HDFS
o    CLI (Command Line Interface) using HDFS commands
o    Java-based approach (see the sketch below)
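A minimal sketch of the Java-based approach; the path /user/training/demo.txt is illustrative, and the code assumes core-site.xml is on the classpath so fs.defaultFS points at the cluster:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsJavaAccess {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS from core-site.xml on the classpath
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Write a small file to HDFS (path is illustrative)
            Path file = new Path("/user/training/demo.txt");
            try (OutputStream out = fs.create(file, true)) {
                out.write("hello hdfs\n".getBytes("UTF-8"));
            }

            // Read the same file back line by line
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(fs.open(file)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
            fs.close();
        }
    }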
v  HDFS Commands
o    Importance of each command
o    How to execute the command
o    Explanation of HDFS admin-related commands
v  Configurations
o    Can we change the existing HDFS configurations?
o    Importance of configurations
v  How to overcome the drawbacks in HDFS
o    NameNode failures
o    Secondary NameNode failures
o    DataNode failures
v  Where does HDFS fit, and where doesn't it?
v  Exploring the Apache HDFS Web UI
v  How to configure the Hadoop cluster
o    How to add new nodes (Commissioning)
o    How to remove existing nodes (De-commissioning)
o    How to verify dead nodes
o    How to restart dead nodes
v  Hadoop 2.x.x version features
o    Introduction to NameNode federation
o    Introduction to NameNode High Availability
v  Difference between Hadoop 1.x.x and Hadoop 2.x.x versions

MAPREDUCE

v  MapReduce architecture
o    JobTracker
§  Importance of the JobTracker
§  What are the roles of the JobTracker?
§  What are the drawbacks of the JobTracker?
o    TaskTracker
§  Importance of the TaskTracker
§  What are the roles of the TaskTracker?
§  What are the drawbacks of the TaskTracker?
o    MapReduce job execution flow
v  Data Types in Hadoop
o    What are the data types in MapReduce?
o    Why they are important in MapReduce
o    Can we write custom data types in MapReduce?
v  Input Formats in MapReduce
o    Text Input Format
o    Key Value Text Input Format
o    Sequence File Input Format
o    NLine Input Format
o    Importance of Input Formats in MapReduce
o    How to use Input Formats in MapReduce
o    How to write custom Input Formats and their Record Readers
v  Output Formats in MapReduce
o    Text Output Format
o    Sequence File Output Format
o    Importance of Output Formats in MapReduce
o    How to use Output Formats in MapReduce
o    How to write custom Output Formats and their Record Writers
v  Mapper
o    What is a mapper in a MapReduce job?
o    Why do we need a mapper?
o    What are the advantages and disadvantages of a mapper?
o    Writing mapper programs (see the sketch below)
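As a taste of writing mapper programs, here is a minimal word-count style mapper sketch using the org.apache.hadoop.mapreduce API (class and variable names are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits (word, 1) for every token in each input line
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

In the driver it would be registered with job.setMapperClass(WordCountMapper.class).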
v  Reducer
o    What is a reducer in a MapReduce job?
o    Why do we need a reducer?
o    What are the advantages and disadvantages of a reducer?
o    Writing reducer programs (see the sketch below)
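A matching reducer sketch that sums the counts emitted by the mapper shown above:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sums the per-word counts produced by the mapper
    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

Because this reduce logic is associative and commutative, the same class can usually double as the combiner (job.setCombinerClass(WordCountReducer.class)), which leads directly into the Combiner topic below.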
v  Combiner
o    What is a combiner in a MapReduce job?
o    Why do we need a combiner?
o    What are the advantages and disadvantages of a combiner?
o    Writing combiner programs
v  Partitioner
o    What is a partitioner in a MapReduce job?
o    Why do we need a partitioner?
o    What are the advantages and disadvantages of a partitioner?
o    Writing partitioner programs (see the sketch below)
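A minimal partitioner sketch for the word-count key/value types used above; the routing rule (keys starting with a-m go to the first reducer, everything else to the last one) is purely illustrative:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String k = key.toString();
            // Keys starting with a-m go to the first reducer, the rest to the last one
            if (!k.isEmpty() && Character.toLowerCase(k.charAt(0)) <= 'm') {
                return 0;
            }
            return numPartitions - 1;
        }
    }

It is registered in the driver with job.setPartitionerClass(AlphabetPartitioner.class) and only takes effect when the job runs with more than one reducer.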
v  Distributed Cache
o    What is the Distributed Cache in a MapReduce job?
o    Importance of the Distributed Cache in a MapReduce job
o    What are the advantages and disadvantages of the Distributed Cache?
o    Writing Distributed Cache programs
v  Counters
o    What is a counter in a MapReduce job?
o    Why do we need counters in a production environment?
o    How to write counters in MapReduce programs (see the sketch below)
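A small sketch of writing counters from inside a task; the enum, the comma-separated record format and the field check are all illustrative assumptions:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CountingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        // Custom counter group; the names are illustrative
        public enum RecordQuality { GOOD, MALFORMED }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {
                // Shows up in the job UI and is retrievable from the driver
                context.getCounter(RecordQuality.MALFORMED).increment(1);
                return;
            }
            context.getCounter(RecordQuality.GOOD).increment(1);
            context.write(value, NullWritable.get());
        }
    }

After job.waitForCompletion(true), the driver can read the totals back with job.getCounters().findCounter(RecordQuality.MALFORMED).getValue().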
v  Importance of the Writable and WritableComparable APIs
o    How to write custom MapReduce values using Writable
o    How to write custom MapReduce keys using WritableComparable (see the sketch below)
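A minimal sketch of a custom composite key, assuming a (userId, timestamp) pair is wanted; it implements WritableComparable so the framework can serialize and sort it, whereas a plain value type would only need Writable:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    // Composite key: sorts by userId, then by timestamp (fields are illustrative)
    public class UserEventKey implements WritableComparable<UserEventKey> {
        private long userId;
        private long timestamp;

        public UserEventKey() { }                    // Required no-arg constructor

        public UserEventKey(long userId, long timestamp) {
            this.userId = userId;
            this.timestamp = timestamp;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(userId);
            out.writeLong(timestamp);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            userId = in.readLong();
            timestamp = in.readLong();
        }

        @Override
        public int compareTo(UserEventKey other) {
            int byUser = Long.compare(userId, other.userId);
            return byUser != 0 ? byUser : Long.compare(timestamp, other.timestamp);
        }

        @Override
        public int hashCode() {                      // Used by the default HashPartitioner
            return (int) (userId ^ (userId >>> 32));
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof UserEventKey)) return false;
            UserEventKey other = (UserEventKey) o;
            return userId == other.userId && timestamp == other.timestamp;
        }
    }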
v  Joins
o    Map Side Join
§  What is the importance of a Map Side Join?
§  Where it is used
o    Reduce Side Join
§  What is the importance of a Reduce Side Join?
§  Where it is used
o    What is the difference between a Map Side Join and a Reduce Side Join?
v  Compression techniques
o    Importance of compression techniques in a production environment
o    Compression Types
§  NONE, RECORD and BLOCK
o    Compression Codecs
§  Default, Gzip, Bzip, Snappy and LZO
o    Enabling and disabling these techniques for all jobs
o    Enabling and disabling these techniques for a particular job (see the sketch below)
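A driver-side sketch of enabling output compression for a single job; the job name and output path are illustrative, and Snappy availability depends on how the cluster's native libraries were built:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class CompressedOutputJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "compressed-output-demo");
            job.setJarByClass(CompressedOutputJob.class);

            // Compress only this job's final output; cluster-wide defaults are untouched
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);

            // For SequenceFile output, also choose the type: NONE, RECORD or BLOCK
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);

            // Mapper, reducer and input paths would be configured here as usual
            FileOutputFormat.setOutputPath(job, new Path("/user/training/compressed-out"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }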
v  MapReduce Schedulers
o    FIFO Scheduler
o    Capacity Scheduler
o    Fair Scheduler
o    Importance of schedulers in a production environment
o    How to use schedulers in a production environment
v  MapReduce Programming Model
o    How to write MapReduce jobs in Java
o    Running MapReduce jobs in local mode
o    Running MapReduce jobs in pseudo-distributed mode
o    Running MapReduce jobs in cluster mode
v  Debugging MapReduce Jobs
o    How to debug MapReduce jobs in local mode
o    How to debug MapReduce jobs in remote mode
v  YARN (Next-Generation MapReduce)
o    What is YARN?
o    What is the importance of YARN?
o    Where the concept of YARN is used in real time
o    What is the difference between YARN and MapReduce?
v  Data Locality
o    What is Data Locality?
o    Does Hadoop follow Data Locality?
v  Speculative Execution
o    What is Speculative Execution?
o    Does Hadoop perform Speculative Execution?
v  MapReduce Commands
o    Importance of each command
o    How to execute the command
o    Explanation of MapReduce admin-related commands
v  Configurations
o    Can we change the existing MapReduce configurations?
o    Importance of configurations
v  Writing Unit Tests for MapReduce Jobs
v  Configuring a Hadoop development environment using Eclipse
v  Use of Secondary Sorting and how to implement it using MapReduce
v  How to identify performance bottlenecks in MR jobs and tune them
v  MapReduce Streaming and Pipes with examples
v  Exploring the Apache MapReduce Web UI

Apache PIG

v  Introduction to Apache Pig
v  Map Reduce Vs Apache Pig
v  SQL Vs Apache Pig
v  Different data types in Pig
v  Modes Of Execution in Pig
o    Local Mode
o    Map Reduce Mode
v  Execution Mechanism
o    Grunt Shell
o    Script
o    Embedded
v  UDFs
o    How to write UDFs in Pig (see the sketch below)
o    How to use UDFs in Pig
o    Importance of UDFs in Pig
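A minimal Pig eval UDF sketch in Java; the class name and behaviour (upper-casing a chararray) are illustrative:

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Upper-cases its single chararray argument; returns null for empty input
    public class ToUpper extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return ((String) input.get(0)).toUpperCase();
        }
    }

After packaging it into a jar, a Pig script would REGISTER the jar and call the function by its fully qualified class name (or give it a short alias with DEFINE).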
v  Filters
o    How to write Filters in Pig
o    How to use Filters in Pig
o    Importance of Filters in Pig
v  Load Functions
o    How to write Load Functions in Pig
o    How to use Load Functions in Pig
o    Importance of Load Functions in Pig
v  Store Functions
o    How to use Store Functions in Pig
o    Importance of Store Functions in Pig
v  Transformations in Pig
v  How to write complex Pig scripts
v  How to integrate Pig and HBase


Apache HIVE

v  Hive Introduction
v  Hive architecture
o    Driver
o    Compiler
o    Semantic Analyzer
v  Hive Integration with Hadoop
v  Hive Query Language (HiveQL)
v  SQL vs HiveQL
v  Hive Installation and Configuration
v  Hive, MapReduce and Local Mode
v  Hive DDL and DML Operations
v  Hive Services
o    CLI
o    HiveServer
o    HWI (Hive Web Interface)
v  Metastore
o    Embedded metastore configuration
o    External metastore configuration
v  UDFs
o    How to write UDFs in Hive (see the sketch below)
o    How to use UDFs in Hive
o    Importance of UDFs in Hive
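A minimal Hive UDF sketch in Java using the classic UDF base class; the class name and behaviour (trimming a string column) are illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Strips leading and trailing whitespace from a string column
    public class TrimUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim());
        }
    }

It would typically be loaded with ADD JAR and exposed through CREATE TEMPORARY FUNCTION before being called in a query.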
v  UDAFs
o    How to use UDAFs in Hive
o    Importance of UDAFs in Hive
v  UDTFs
o    How to use UDTFs in Hive
o    Importance of UDTFs in Hive
v  How to write complex Hive queries
v  What is the Hive Data Model?
v  Partitions
o    Importance of Hive partitions in a production environment
o    Limitations of Hive partitions
o    How to create partitions
v  Buckets
o    Importance of Hive buckets in a production environment
o    How to create buckets
v  SerDe
o    Importance of Hive SerDes in a production environment
o    How to write SerDe programs
v  How to integrate Hive and HBase


Apache Zookeeper

v  Introduction to ZooKeeper
v  Pseudo mode installation
v  ZooKeeper cluster installation
v  Basic commands execution

Apache HBase

v  HBase introduction
v  HBase use cases
v  HBase basics
o    Column families
o    Scans
v  HBase installation
o    Local mode
o    Pseudo mode
o    Cluster mode
v  HBase Architecture
o    Storage
o    Write-Ahead Log
o    Log-Structured Merge Trees
v  MapReduce integration
o    MapReduce over HBase
v  HBase Usage
o    Key design
o    Bloom Filters
o    Versioning
o    Coprocessors
o    Filters
v  HBase Clients
o    REST
o    Thrift
o    Hive
o    Web Based UI
v  HBase Admin
o    Schema definition
o    Basic CRUD operations (see the sketch below)
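A minimal sketch of basic CRUD with the HBase 1.x Java client; the table 'users' and column family 'info' are illustrative and assumed to exist already:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseCrudDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("users"))) {

                byte[] row = Bytes.toBytes("user1");
                byte[] cf  = Bytes.toBytes("info");            // column family must exist

                // Create / update: a Put writes one or more cells
                Put put = new Put(row);
                put.addColumn(cf, Bytes.toBytes("name"), Bytes.toBytes("Alice"));
                table.put(put);

                // Read: fetch the row and extract a single cell value
                Result result = table.get(new Get(row));
                System.out.println(Bytes.toString(result.getValue(cf, Bytes.toBytes("name"))));

                // Delete: remove the whole row
                table.delete(new Delete(row));
            }
        }
    }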

Apache SQOOP

v  Introduction to Sqoop
v  MySQL client and Server Installation
v  Sqoop Installation
v  How to connect to a relational database using Sqoop
v  Sqoop commands, with examples of the import and export commands

Apache FLUME

v  Introduction to Flume
v  Flume installation
v  Flume agent usage and execution of Flume examples

Apache OOZIE

v  Introduction to Oozie
v  Oozie installation
v  Executing Oozie workflow jobs
v  Monitoring Oozie workflow jobs

 

Apache Mahout

v  Introduction to Mahout
v  Mahout installation
v  Mahout examples

Apache Cassandra

v  Introduction to Cassandra
v  Cassandra examples

 

Storm

v  Introduction to Storm
v  Storm examples

MongoDB

v  Introduction to MongoDB
v  MongoDB installation
v  MongoDB examples

Apache Nutch

v  Introduction to Nutch
v  Nutch Installation
v  Nutch Examples

 

Cloudera Distribution

v  Introduction to Cloudera
v  Cloudera Installation
v  Cloudera Certification details
v  How to use Cloudera Hadoop
v  What are the main differences between Cloudera and Apache Hadoop?

 

Hortonworks Distribution

v  Introduction to Hortonworks
v  Hortonworks Installation
v  Hortonworks Certification details
v  How to use Hortonworks Hadoop
v  What are the main differences between Hortonworks and Apache Hadoop?

 

Amazon EMR

v  Introduction to Amazon EMR and Amazon EC2
v  How to use Amazon EMR and Amazon EC2
v  Why use Amazon EMR and why it is important


Advanced and new technologies: architectural discussions

v  Mahout (Machine Learning Algorithms)
v  Storm (Real time data streaming)
v  Cassandra (NoSQL database)
v  MongoDB (NoSQL database)
v  Solr (Search engine)
v  Nutch (Web Crawler)
v  Lucene (Indexing data)
v  Ganglia, Nagios (Monitoring tools)
v  Cloudera, Hortonworks, MapR, Amazon EMR (Distributions)
v  How to crack the Cloudera certification questions

Prerequisites for this Course

·         Java basics such as OOP concepts, interfaces, classes and abstract classes (free Java classes included as part of the course)
·         Basic SQL knowledge (free SQL classes included as part of the course)
·         Basic Linux commands (provided on our blog)

 

Administration topics:

·         Hadoop Installations
o    Local mode (hands-on installation on your laptop)
o    Pseudo mode (hands-on installation on your laptop)
o    Cluster mode (hands-on 20-node cluster setup in our lab)
o    Node commissioning and de-commissioning in a Hadoop cluster
o    Job monitoring in a Hadoop cluster
o    Fair Scheduler (hands-on installation on your laptop)
o    Capacity Scheduler (hands-on installation on your laptop)
·         Hive Installations
o    Local mode (hands-on installation on your laptop)
§  With internal Derby
o    Cluster mode (hands-on installation on your laptop)
§  With external Derby
§  With external MySQL
o    Hive Web Interface (HWI) mode (hands-on installation on your laptop)
o    Hive Thrift Server mode (hands-on installation on your laptop)
o    Derby Installation (hands-on installation on your laptop)
o    MySQL Installation (hands-on installation on your laptop)
·         Pig Installations
o    Local mode (hands-on installation on your laptop)
o    MapReduce mode (hands-on installation on your laptop)
·         HBase Installations
o    Local mode (hands-on installation on your laptop)
o    Pseudo mode (hands-on installation on your laptop)
o    Cluster mode (hands-on installation on your laptop)
§  With internal ZooKeeper
§  With external ZooKeeper
·         ZooKeeper Installations
o    Local mode (hands-on installation on your laptop)
o    Cluster mode (hands-on installation on your laptop)
·         Sqoop Installations
o    Sqoop installation with MySQL (hands-on installation on your laptop)
o    Sqoop with Hadoop integration (hands-on installation on your laptop)
o    Sqoop with Hive integration (hands-on installation on your laptop)
·         Flume Installation
o    Pseudo mode (hands-on installation on your laptop)
·         Oozie Installation
o    Pseudo mode (hands-on installation on your laptop)
·         Mahout Installation
o    Local mode (hands-on installation on your laptop)
o    Pseudo mode (hands-on installation on your laptop)
·         MongoDB Installation
o    Pseudo mode (hands-on installation on your laptop)
·         Nutch Installation
o    Pseudo mode (hands-on installation on your laptop)
·         Cloudera Hadoop Distribution installation
o    Hadoop
o    Hive
o    Pig
o    HBase
o    Hue
·         Hortonworks Hadoop Distribution installation
o    Hadoop
o    Hive
o    Pig
o    HBase
o    Hue

Hadoop Ecosystem Integrations:

o    Hadoop and Hive Integration
o    Hadoop and Pig Integration
o    Hadoop and HBase Integration
o    Hadoop and Sqoop Integration
o    Hadoop and Oozie Integration
o    Hadoop and Flume Integration
o    Hive and Pig Integration
o    Hive and HBase Integration
o    Pig and HBase Integration
o    Sqoop and RDBMS Integration
o    Mahout and Hadoop Integration

What we offer you:

·         Hands-on MapReduce programming with 20+ programs; these will make you proficient in MapReduce both conceptually and programmatically
·         5 hands-on POCs will be provided (these POCs will help you master Hadoop and its ecosystem)
·         Hands-on 20-node cluster setup in our lab
·         Hands-on installation of Hadoop and all ecosystem components on your laptop
·         Well-documented Hadoop material covering all the topics in the course
·         Well-documented Hadoop blog containing frequently asked interview questions with answers and the latest updates on Big Data technology
·         Real-time project explanations will be provided
·         Mock interviews will be conducted on a one-to-one basis
·         Hadoop interview questions discussed on a daily basis
·         Resume preparation with POCs or projects based on your experience
