Hadoop Online Training

  • Understanding Big Data
    • What is Big Data?
    • Big Data characteristics
  • Hadoop Distributions
    • Hortonworks
    • Cloudera
    • Pivotal HD
    • Greenplum
  • Introduction to Apache Hadoop
    • Flavors of Hadoop: BigInsights, Google Query, etc.
  • Hadoop Eco-system components: Introduction
    • MapReduce
    • HDFS
    • Apache Pig
    • Apache Hive
    • HBase
    • Apache Oozie
    • Flume
    • Sqoop
    • Apache Mahout
    • Kiji
    • Lucene
    • Solr
    • KiteSDK
    • Impala
    • Chukwa
    • Shark
    • Cascading
  • Understanding Hadoop Cluster
  • Hadoop Core-Components
    • NameNode
    • JobTracker
    • TaskTracker
    • DataNode
    • SecondaryNameNode
  • HDFS Architecture
    • Why 64 MB blocks?
    • Why blocks?
    • Why replication factor 3?
  • Discuss NameNode and DataNode
  • Discuss JobTracker and TaskTracker
  • Typical workflow of Hadoop application
  • Rack Awareness
    • Network Topology
    • Assignment of Blocks to Racks and Nodes
    • Block Reports
    • Heartbeat
    • Block Management Service
  • Anatomy of File Write
  • Anatomy of File Read
  • Heartbeats and Block Reports
    • Discuss Secondary NameNode
    • Usage of FsImage and Edits log
      • Map Reduce Overview
      • Best practices for setting up a Hadoop cluster
      • Cluster Configuration
        • core-default.xml
        • hdfs-default.xml
        • mapred-default.xml
        • hadoop-env.sh
        • slaves
        • masters
      • Need for *-site.xml
      • Map Reduce Framework
      • Why Map Reduce?
      • Use cases where Map Reduce is used
      • Hello world program with Weather Use Case
        • Setup environment for the programs
        • Possible ways of writing a MapReduce program, with sample code; compare the versions and discuss which is best
        • Configured, Tool, GenericOptionsParser, and queue usage
        • Demo: calculating maximum and minimum temperature
      • Limitations of traditional way of solving word count with large dataset
      • Map Reduce way of solving the problem
      • Complete overview of MapReduce
      • Split Size
      • Combiners
      • Multi Reducers
      • Parts of Map Reduce
      • Algorithms
      • Apache Hadoop Single Node Installation Demo
      • Namenode format
      • Apache Hadoop Multi Node Installation Demo
      • Add nodes dynamically to a cluster with Demo
      • Remove nodes dynamically from a cluster with Demo
      • Safe Mode
      • Hadoop cluster modes
        • Standalone Mode
        • Pseudo-distributed Mode
        • Fully distributed Mode
      • Revision
      • HDFS Practicals (HDFS Commands)
      • Map Reduce Anatomy
        • Job Submission
        • Job Initialization
        • Task Assignments
        • Task Execution
      • Schedulers
      • Quiz
      • Map Reduce Failure Scenarios
      • Speculative Execution
      • Sequence File
      • Input File Formats
      • Output File Formats
      • Writable DataTypes
      • Custom Input Formats
      • Custom keys and values using Writables
      • Walk through the installation process using Cloudera Manager
      • Example list: show the list of sample examples for the installation
      • Demo on teragen, wordcount, and inverted index examples
      • Debugging Map Reduce Programs
      • MapReduce Advanced Concepts
      • Partitioning and Custom Partitioner
      • Joins
      • Multi outputs
      • Counters
      • MRUnit test cases
      • MR Design patterns
      • Distributed Cache
        • Command line implementation
      • MapReduce API implementation
      • MapReduce Advanced Concepts examples
      • Introduction to course Project
      • Data loading techniques
        • Hadoop Copy commands
          • put, get, copyFromLocal, copyToLocal, mv, chmod, rmr, rmr -skipTrash, distcp, ls, lsr, df, du, cp, moveFromLocal, moveToLocal, text, touchz, tail, mkdir, help
        • Flume
        • Sqoop
      • Demo for Hadoop Copy Commands
      • Sqoop Theory
      • Demo for Sqoop
      • Why is Pig needed?
      • Why was Pig created?
      • Introduction to skew Join
      • Why go for Pig when Map Reduce is there?
      • Pig use cases
      • Pig built-in operators
      • Pig store schema
      • Operators
        • Load
        • Store
        • Dump
        • Filter
        • Distinct
        • Group
        • CoGroup
        • Join
        • Stream
        • Foreach Generate
        • Parallel
        • Limit
        • ORDER
        • CROSS
        • UNION
        • SPLIT
        • Sampling
      • Dump Vs Store
      • DataTypes
        • Complex
          • Bag
          • Tuple
          • Atom
          • Map
        • Primitives
          • Integers
          • Float
          • Chararray
          • Bytearray
          • Double
      • Diagnostic Operators
        • Describe
        • Explain
        • Illustrate
      • UDFs
        • Filter Function
        • Eval Function
        • Macros
        • Demo
      • Storage Handlers
      • Pig Practicals and Use Cases
      • Demo using schema
      • Demo without schema
      • Hive Background
      • What is Hive?
      • Pig Vs Hive
      • Where to Use Hive?
      • Hive Architecture
      • Metastore
      • Hive execution modes
      • External, Managed, Native and Non-native tables
      • Hive Partitions
        • Dynamic Partitions
        • Static Partitions
      • Buckets
      • Hive DataModel
      • Hive DataTypes
        • Primitive
        • Complex
      • Queries
        • Create Managed Table
        • Load Data
        • Insert overwrite table
        • Insert into Local directory
        • CTAS
        • Insert Overwrite table select
      • Joins
        • Inner Joins
        • Outer Joins
        • Skew Joins
      • Multi-table Inserts
      • Multiple files, directories, table inserts
      • Serde
      • View
      • Index
      • UDF
      • UDAF
      • Hive Practicals
      • Oozie Architecture
      • Workflow designing in Oozie
      • Oozie practicals
      • YARN Architecture
      • Hadoop Classic vs YARN
      • YARN Demo
      • Flume Architecture
      • Flume Practicals
      • ZooKeeper
      • Introduction to NoSQL Databases
      • NoSQL Landscapes
      • Introduction to HBase
      • HBase vs RDBMS
      • Create a table in HBase using the HBase shell
      • Where to use HBase?
      • Where not to use HBase?
      • Write Files to HBase
      • Major Components of HBase
        • HBase Master
        • HRegionServer
        • HBase Client
        • ZooKeeper
        • Region
      • HBase Practicals
      • HBase -ROOT- Catalog table
      • CAP Theorem
      • Compaction
      • Sharding
      • Sparse Datastore
      • Cassandra Architecture
      • Big Table and Dynamo
      • Distributed Hash Table, P2P Fault Tolerant
      • Data Modelling
      • Column Families
      • Installation Demo on Cassandra
      • Practicals
      • Real time Project Analysis
      • Design
      • Implementation
      • Execution
      • Debugging
      • Optimization Techniques
      • Which one to use where
      • Amazon Web Services (Hadoop on Cloud) – Multi-Node Installation
      • EMR and S3
      • Storm Architecture
      • Real time use case with Storm
      • Spark
        • What is Spark?
        • Understanding Spark
        • Spark Architecture
        • RDD
        • Hadoop RDD
        • RDDs Partitioning
        • Lazy Evaluation
        • Caching
        • Spark Context
        • Map, flatMap, filter
        • Actions
        • Serialization
        • Scala
        • Scala Features
        • Scala Functions
        • Collections and Combiners
        • Spark with Scala
        • Spark with Yarn
        • Spark on Cluster mode
        • Spark CLI
        • Spark programming with Java API
        • Spark Streaming
        • Spark SQL
        • Spark SQL Context
        • Spark SQL with Hive
        • Spark MLlib Algorithms (K-Means, Clustering, ...)
        • Spark GraphX Overview
        • Hands-On and Use Cases
      • Impala Architecture
      • Impala Practicals
      • Adhoc Querying in Impala
      • Compression Techniques
        • Snappy
        • LZO
        • Bgzip
      • Image processing in Hadoop
      • Certification Preparation Guidelines
      • Best practices for setting up a Hadoop cluster
      • Commissioning and Decommissioning Nodes
      • Benchmarking the Hadoop cluster
      • Admin monitoring tools
      • Routine Admin tasks
      • Kafka Architecture
      • Kafka Use Case Execution