Big Data Training
By
MGTechnosoft

Course Info

Course Description:

 

Topic 1: Understanding Hadoop

  • What is Big Data?
  • The Three V’s of Big Data
  • 6 Key Hadoop Data Types
  • What is Hadoop?
  • What is Hadoop 2.0?
  • Relational Databases vs. Hadoop
  • The Hadoop Ecosystem

Topic 2: Hadoop Installation

  • Hadoop Installation on a Local Machine
  • Hadoop Configurations

Topic 3: HDFS (Hadoop Distributed File System)

  • What is HDFS?
  • HDFS Components
  • Understanding Block Storage
  • The NameNode
  • The DataNode
  • DataNode Failure
  • What are Federated NameNodes?
  • Multiple Namespaces
  • Overview of HDFS High Availability
  • Quorum Journal Manager
  • Configuring Automatic Failover
  • HDFS commands
  • HDFS File Permissions

Topic 4: Inputting Data into HDFS

  • Options for Data Input
  • The Hadoop Client
  • WebHDFS
  • Overview of Flume
  • A Flume Example
  • Overview of Sqoop
  • The Sqoop Import Tool
  • Importing a Table
  • Importing Specific Columns
  • Importing from a Query
  • The Sqoop Export Tool
  • Exporting to a Table
  • Importing RDBMS Data into HDFS
  • Exporting HDFS Data to an RDBMS

Topic 5: MapReduce Framework and YARN

  • Overview of MapReduce
  • WordCount in MapReduce
  • Understanding MapReduce
  • The Map Phase
  • The Reduce Phase
  • The Key/Value Pairs of MapReduce
  • What is YARN?
  • The Components of YARN
  • Lifecycle of a YARN Application
  • Running a MapReduce Job
  • Running a YARN Application

Topic 6: Introduction to Pig

  • What is Pig?
  • Pig Latin
  • The Grunt Shell
  • Understanding Pig
  • Pig Data Types
  • Pig Complex Types
  • Defining a Schema
  • The GROUP Operator
  • GROUP ALL
  • Relations without a schema
  • The FOREACH GENERATE Operator
  • Specifying Ranges in FOREACH
  • FOREACH with Groups
  • The FILTER Operator
  • The LIMIT Operator

Topic 7: Advanced Pig Programming

  • The ORDER BY Operator
  • Binary Condition Operator
  • Parameter Substitution
  • The DISTINCT Operator
  • Using PARALLEL
  • The FLATTEN Operator
  • Nested FOREACH
  • Performing an Inner Join
  • Performing an Outer Join
  • Replicated Joins
  • The COGROUP Operator
  • Pig UDF
  • A UDF Example
  • Invoking a UDF

Topic 8: Hive Programming

  • What is Hive?
  • Comparing Hive to SQL
  • Hive Architecture
  • Submitting Hive Queries
  • Defining a Hive-Managed Table
  • Defining an External Table
  • Defining a Table Location
  • Loading Data into a Hive Table
  • Performing Queries
  • Hive Partitions
  • Hive Buckets
  • Sorting Data
  • Using Distribute By
  • Storing Results to a File
  • Hive Joins
  • Shuffle Joins
  • Map Joins
  • Sort-Merge-Bucket Joins
  • Hive UDF
  • Invoking a Hive UDF

Topic 9: Advanced Hive Programming

  • Performing a Multi-Table/File Insert
  • Understanding Views
  • Defining Views
  • Using Views
  • Overview of Indexes
  • Defining Indexes
  • The Over Clause
  • Hive File Formats
  • Hive ORC Files
  • Using HiveServer2
  • Understanding Hive on Tez
  • Using Tez for Hive Queries
  • Hive Optimization Tips

Topic 10: Defining Workflows with Oozie

  • Overview of Oozie
  • Defining an Oozie Workflow
  • Pig Action
  • Hive Action
  • MapReduce Actions
  • Submitting a Workflow Job
  • Making Decisions
  • Defining an Oozie Coordinator Job

2 Live Projects Covering the Entire Hadoop Ecosystem
Interview Questions

Overview of Spark/HBase

Topics covered:

Hadoop Big Data

Institute Info

Faculty : ------
Duration : 60 Days
Course Fee : 22,000
Training Type : Classroom
