Big Data Hadoop (Hadoop)

Due to the advent of new technologies, devices, and means of communication, the amount of data produced by mankind is growing rapidly every year. Big data refers to collections of large datasets that cannot be processed using traditional computing techniques. The challenges include capture, curation, storage, search, transfer, analysis, visualization, querying, updating, and information privacy.

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle a virtually limitless number of concurrent tasks or jobs. Hadoop uses a distributed computing architecture consisting of multiple servers built from commodity hardware, making it relatively inexpensive to scale and well suited to extremely large data stores.

Big data is becoming critical to everyday life, and it is emerging as one of the most important technologies in the modern world. Big data technologies enable more accurate analysis, which can lead to more concrete decision-making and, in turn, greater operational efficiency, cost reductions, and reduced risk for the business.

This course provides the knowledge to use new Big Data tools and teaches ways of storing information that allow for efficient processing and analysis for informed business decision-making. You will also learn to store, manage, process, and analyze massive amounts of unstructured data.

  • Category
    Big Data
  • Duration
    60 Hours
  • Level
    Beginner

What Will I Learn?

YOU WILL LEARN HOW TO: 
  • Unleash the power of Big Data for competitive advantage 
  • Select and apply the correct Big Data stores for disparate data sets 
  • Leverage Hadoop to process large data sets to benefit decisions 
  • Query large data sets in near real-time 
  • Integrate components of the Hadoop ecosystem to form a coherent business solution 
HANDS-ON EXPERIENCE:
  • You are provided with an in-class computer dedicated for your sole use. 
  • Integrating key Big Data components to create a Big Data platform 
  • Loading unstructured data into Hadoop Distributed File System 
  • Querying Hadoop MapReduce jobs using Hive 
  • Simplifying Big Data processing and communicating with Pig Latin 
  • Extracting value in real-time with Impala 
  • Implementing a targeted Big Data strategy 
  • Use of AdaptaLearn exercises, plus access to a computing sandbox 

Prerequisite Knowledge

  • Knowledge of Big Data concepts
RELATED COURSES
  • Hadoop Development for Big Data Solutions 
  • Hadoop Architecture & Administration for Big Data Solutions 
  • Introduction to Data Science for Big Data Analytics 
  • Extracting Business Value from Big Data with Pig, Hive & Impala 
  • Agile Business Analysis 
  • Amazon Web Services (AWS) 
  • Essentials of Cloud Security Management 
  • Java Programming Introduction 
  • C# Programming 
  • Python Programming Introduction 
  • Cloud Computing Technologies Introduction

Who Can Benefit?

Anyone, including managers, programmers, architects and administrators, who wants a foundational overview of the key components of Big Data and how they can be integrated to provide suitable solutions for their organization. No programming experience is required. Programmers should be aware that the exercises in this course are intended to give attendees high-level exposure to the capabilities of the Big Data technologies, and not a deep dive. 


Opportunity Scope

The mentor will discuss this in the classroom.

Modules / Chapters

Module 1: Introductory class [2 hrs]

  • Challenges of processing big data
  • Technologies that support big data
  • What is Hadoop?
  • Why Hadoop?
  • History of Hadoop
  • Use cases of Hadoop
  • RDBMS vs Hadoop
  • When to use and when not to use Hadoop
  • Ecosystem tour
  • Vendor comparison
  • Hardware Recommendations & Statistics

Module 2: HDFS: Hadoop Distributed File System [6 hrs]

  • Significance of HDFS in Hadoop
  • Features of HDFS
  • 5 daemons of Hadoop
    • Name Node and its functionality
    • Data Node and its functionality
    • Secondary Name Node and its functionality
    • Job Tracker and its functionality
    • Task Tracker and its functionality
  • Data Storage in HDFS
    • Introduction about Blocks
    • Data replication
  • Accessing HDFS
    • CLI (Command Line Interface) and admin commands
    • Java Based Approach
  • Fault tolerance
  • Download Hadoop
  • Installation and set-up of Hadoop
    • Start-up & Shut down process
  • HDFS Federation
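
As a taste of the Java-based approach to accessing HDFS covered in this module, here is a minimal sketch that writes a small file and reads it back through the Hadoop FileSystem API. The NameNode address and the file path are placeholders, not values from the course environment.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; a real cluster takes this from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/demo/hello.txt");

            // Write a small file; HDFS splits large files into blocks and replicates them.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
            }

            // Read it back through the same FileSystem handle.
            try (FSDataInputStream in = fs.open(file);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());
            }
            fs.close();
        }
    }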

Module 3: Map Reduce [6 hrs]

  • Introduction to MapReduce
  • Architecture of MapReduce
  • Architectural Flow of MapReduce Processing
  • Blocks & Input Splits: Concepts and Their Relationship
  • Data Locality Optimisation
  • Programming with MapReduce
  • MapReduce Program LifeCycle
  • Combiners & Partitioners
  • Side Data Distribution
  • Map and Reduce Side joins
  • Counters
  • MapReduce Realtime Programming
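
To illustrate the MapReduce programming model outlined above, here is a minimal word-count sketch written against the Hadoop Java API; the input and output paths are supplied on the command line and are placeholders.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in the input split.
        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts for each word; also usable as a combiner.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);   // combiner cuts shuffle traffic
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }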

Module 4: YARN [4 hrs]

  • Challenges of Hadoop First Generation
  • Necessity of Upgrading Hadoop
  • Hadoop 2.0 Architecture
  • NameNode High Availability
  • HDFS Federation
  • Failover and Fencing Mechanism
  • MapReduce-2
  • Core Concepts of YARN
  • Upgrading Existing MRv1 to MRv2
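
As a brief illustration of the YARN concepts in this module, here is a sketch using the Hadoop 2.x YarnClient API to list the applications known to the ResourceManager; it assumes the cluster's yarn-site.xml is on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.client.api.YarnClient;

    public class ListYarnApps {
        public static void main(String[] args) throws Exception {
            // Reads the ResourceManager address and other settings from yarn-site.xml.
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(new Configuration());
            yarnClient.start();

            // Print every application the ResourceManager knows about, with its state.
            for (ApplicationReport report : yarnClient.getApplications()) {
                System.out.println(report.getApplicationId() + "\t"
                        + report.getName() + "\t"
                        + report.getYarnApplicationState());
            }
            yarnClient.stop();
        }
    }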

Module 5: HIVE [10 hrs]

  • Introduction to Hive
  • Hive Architecture
  • Configuring Hive on Hadoop Cluster
  • Running Hive and Executing Hive Queries
  • Concepts of the Hive Execution Engine
  • Comparison with Traditional Databases
  • Hive Query Language: Datatypes, Operators and Functions
  • Types of Hive Tables: Managed, External, Temporary
  • Optimisation Techniques in Hive
  • Partitioning Concepts & Types of Partitioning
  • Bucketing Concepts
  • Realtime Use Case Using Hive
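
To show what executing Hive queries can look like from Java, here is a minimal sketch using the HiveServer2 JDBC driver; the connection URL, credentials, and the page_views table (partitioned and bucketed purely for illustration) are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC driver; hive-jdbc must be on the classpath.
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Placeholder connection URL: HiveServer2 host, port, and database.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "hive", "");
                 Statement stmt = conn.createStatement()) {

                // Hypothetical partitioned, bucketed table used only for illustration.
                stmt.execute("CREATE TABLE IF NOT EXISTS page_views ("
                        + "user_id STRING, url STRING) "
                        + "PARTITIONED BY (view_date STRING) "
                        + "CLUSTERED BY (user_id) INTO 8 BUCKETS "
                        + "STORED AS ORC");

                // A simple aggregate query; Hive compiles it into execution-engine jobs.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT view_date, COUNT(*) FROM page_views GROUP BY view_date")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                    }
                }
            }
        }
    }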

Module 6: PIG [8 hrs]

  • Introduction to Apache Pig
  • MapReduce vs. Apache Pig
  • SQL vs. Apache Pig
  • Different data types in Pig
  • Modes of Execution in Pig
  • Grunt shell
  • Loading data
  • Exploring Pig Latin commands
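
Here is a small sketch of running Pig Latin outside the Grunt shell, via Pig's embedded PigServer API in local mode; the input file and its field layout are placeholders.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigEmbeddedExample {
        public static void main(String[] args) throws Exception {
            // Local mode runs against the local file system; use ExecType.MAPREDUCE on a cluster.
            PigServer pig = new PigServer(ExecType.LOCAL);

            // Placeholder input: tab-separated lines of (name, age).
            pig.registerQuery("people = LOAD 'people.txt' AS (name:chararray, age:int);");
            pig.registerQuery("adults = FILTER people BY age >= 18;");

            // Writes the filtered relation to the given output directory.
            pig.store("adults", "adults_out");
            pig.shutdown();
        }
    }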

Module 7: HBASE [8 hrs]

  • Architecture and schema design
  • HBase vs. RDBMS
  • HMaster and Region Servers
  • Column Families and Regions
  • Write pipeline
  • Read pipeline
  • HBase commands
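
To make the HBase commands concrete, here is a minimal Java client sketch that writes and reads a single cell; the users table with an info column family is assumed to exist already, and connection settings are taken from hbase-site.xml on the classpath.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            // Picks up the ZooKeeper quorum and other settings from hbase-site.xml.
            org.apache.hadoop.conf.Configuration conf = HBaseConfiguration.create();

            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("users"))) {

                // Put one cell: row key "user1", column family "info", qualifier "email".
                Put put = new Put(Bytes.toBytes("user1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
                        Bytes.toBytes("user1@example.com"));
                table.put(put);

                // Read the same cell back.
                Get get = new Get(Bytes.toBytes("user1"));
                Result result = table.get(get);
                byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
                System.out.println(Bytes.toString(email));
            }
        }
    }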

Module 8: Sqoop [8 hrs]

  • Introduction to Sqoop
  • Sqoop Architecture
  • Downloading and Configuring Sqoop
  • Exploring Sqoop Tools
  • 30 Basic Sqoop Import Export Cases
  • Incremental Import
  • Importing Data by joining multiple tables
  • Using Custom Boundaries Queries in Sqoop
  • Types of Export
  • Use cases of Export
  • Importing Data Directly into HIVE and HBASE
  • Sqoop Integration with Hadoop Eco System Components
  • Query Scheduling and Automation

Module 9: Flume [8 hrs]

  • Introduction to Flume
  • Flume components: Source, Channel, Sink, Channel Selector, Agent, Event, Sink Processor
  • Flume Architecture
  • Downloading and configuring Flume
  • Topology Design Considerations
  • Flume API Concepts
  • Ingesting Data into HDFS from different Sources
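
As one illustration of the Flume API concepts listed above, here is a short sketch that uses Flume's RPC client to hand events to a running agent; the agent address and port are placeholders, and the agent's own source, channel, and sink would be defined in its configuration file.

    import java.nio.charset.StandardCharsets;

    import org.apache.flume.Event;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class FlumeRpcExample {
        public static void main(String[] args) throws Exception {
            // Placeholder address of a Flume agent whose Avro source listens on port 41414.
            RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
            try {
                // Build and send one event; the agent's channel and sink handle the rest.
                Event event = EventBuilder.withBody("hello flume", StandardCharsets.UTF_8);
                client.append(event);
            } finally {
                client.close();
            }
        }
    }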

Module 10: Single Node Configuration [4 hrs]

  • Introduction to Different Modes of Hadoop Configuration
  • Choosing an OS for the Hadoop Cluster
  • SSH Concepts & Configurations
  • Installing Java
  • Creating Hadoop User
  • Understanding Hadoop Configuration Files
  • Setting Up The Single Node Cluster
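
To connect the Hadoop configuration files to working code, here is a small sketch that sets the two properties a pseudo-distributed (single-node) setup usually defines and opens the file system with them; in a real installation these values live in core-site.xml and hdfs-site.xml rather than in code.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class SingleNodeCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Normally placed in core-site.xml and hdfs-site.xml; set here only so the
            // sketch is self-contained.
            conf.set("fs.defaultFS", "hdfs://localhost:9000"); // NameNode address (placeholder)
            conf.set("dfs.replication", "1");                  // one replica suffices on one node

            // Open the file system with these settings and confirm which cluster we reached.
            try (FileSystem fs = FileSystem.get(conf)) {
                System.out.println("Connected to " + fs.getUri());
            }
        }
    }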

Module 11: Multi Node Configuration [2 hrs] 

  • Choosing the Hadoop Cluster Hardware
  • Choosing Hadoop Distribution
  • Cluster Planning
  • Setting Up a Multi-Node Cluster
  • Hadoop Security


Contact Information

  • Address

    Anamnagar - 32 Kathmandu, Nepal

  • Email

    info@labanepal.com

  • Phone

    +977-1-4102721, 4102722, 4244804

  • Opening Hours

    10 AM - 5 PM
