Big Data Hadoop (Hadoop)

Due to the advent of new technologies, devices, and means of communication, the amount of data produced by mankind is growing rapidly every year. Big data refers to collections of large datasets that cannot be processed using traditional computing techniques. The challenges include capture, curation, storage, search, transfer, analysis, visualization, querying, updating, and information privacy.

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle a virtually limitless number of concurrent tasks or jobs. Hadoop uses a distributed computing architecture consisting of multiple servers built from commodity hardware, making it relatively inexpensive to scale and well suited to extremely large data stores.

Big data is becoming critical to everyday life, and it is emerging as one of the most important technologies in the modern world. Big data technologies enable more accurate analysis, which can lead to more concrete decision-making and, in turn, greater operational efficiency, cost reductions, and reduced risk for the business.

This course provides the knowledge to use new Big Data tools and teaches ways of storing information that allow for efficient processing and analysis for informed business decision-making. You will also learn to store, manage, process, and analyze massive amounts of unstructured data.

  • Category
    Big Data
  • Duration
    60 Hours
  • Level
    Beginner

What Will I Learn?

YOU WILL LEARN HOW TO: 
  • Unleash the power of Big Data for competitive advantage 
  • Select and apply the correct Big Data stores for disparate data sets 
  • Leverage Hadoop to process large data sets to benefit decisions 
  • Query large data sets in near real-time 
  • Integrate components of the Hadoop ecosystem to form a coherent business solution 
HANDS-ON EXPERIENCE:
  • You are provided with an in-class computer dedicated for your sole use. 
  • Integrating key Big Data components to create a Big Data platform 
  • Loading unstructured data into Hadoop Distributed File System 
  • Querying Hadoop MapReduce jobs using Hive 
  • Simplifying Big Data processing and communicating with Pig Latin 
  • Extracting value in real-time with Impala 
  • Implementing a targeted Big Data strategy 
  • Use of AdaptaLearn exercises, plus access to a computing sandbox 

Prerequisite Knowledge

  • Knowledge of Big Data concepts
RELATED COURSES
  • Hadoop Development for Big Data Solutions 
  • Hadoop Architecture & Administration for Big Data Solutions 
  • Introduction to Data Science for Big Data Analytics 
  • Extracting Business Value from Big Data with Pig, Hive & Impala 
  • Agile Business Analysis 
  • Amazon Web Services (AWS) 
  • Essentials of Cloud Security Management 
  • Java Programming Introduction 
  • C# Programming 
  • Python Programming Introduction 
  • Cloud Computing Technologies Introduction

Who Can Benefit?

Anyone, including managers, programmers, architects and administrators, who wants a foundational overview of the key components of Big Data and how they can be integrated to provide suitable solutions for their organization. No programming experience is required. Programmers should be aware that the exercises in this course are intended to give attendees high-level exposure to the capabilities of the Big Data technologies, and not a deep dive. 


Opportunity Scope

The mentor will discuss this in the classroom.

Modules / Chapters

Module 1: Introductory class [2 hrs]

  • Challenges of processing big data
  • Technologies that support big data
  • What is Hadoop?
  • Why Hadoop?
  • History of Hadoop
  • Use cases of Hadoop
  • RDBMS vs Hadoop
  • When to use and when not to use Hadoop
  • Ecosystem tour
  • Vendor comparison
  • Hardware Recommendations & Statistics

Module 2: HDFS: Hadoop Distributed File System [6 hrs]

  • Significance of HDFS in Hadoop
  • Features of HDFS
  • 5 daemons of Hadoop
    • Name Node and its functionality
    • Data Node and its functionality
    • Secondary Name Node and its functionality
    • Job Tracker and its functionality
    • Task Tracker and its functionality
  • Data Storage in HDFS
    • Introduction about Blocks
    • Data replication
  • Accessing HDFS
    • CLI (Command Line Interface) and admin commands
    • Java Based Approach
  • Fault tolerance
  • Download Hadoop
  • Installation and set-up of Hadoop
    • Start-up & Shut down process
  • HDFS Federation
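
As a taste of the Java-based approach to accessing HDFS covered in this module, here is a minimal sketch that writes a small file and reads it back through the Hadoop FileSystem API. The NameNode address and the file path are placeholders, not values from the course environment.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; a real cluster takes this from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/demo/hello.txt");

            // Write a small file; HDFS splits large files into blocks and replicates them.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
            }

            // Read it back through the same FileSystem handle.
            try (FSDataInputStream in = fs.open(file);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());
            }
            fs.close();
        }
    }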

Module 3: Map Reduce [6 hrs]

  • Introduction to MapReduce
  • Architecture of MapReduce
  • Architectural Flow of MapReduce Processing
  • Blocks & Input Splits: Concepts and Their Relationship
  • Data Locality Optimisation
  • Programming with MapReduce
  • MapReduce Program LifeCycle
  • Combiners & Partitioners
  • Side Data Distribution
  • Map and Reduce Side joins
  • Counters
  • MapReduce Realtime Programming
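
To illustrate the MapReduce programming model outlined above, here is a minimal word-count sketch written against the Hadoop Java API; the input and output paths are supplied on the command line and are placeholders.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in the input split.
        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts for each word; also usable as a combiner.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);   // combiner cuts shuffle traffic
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }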

Module 4: YARN [4 hrs]

  • Challenges of Hadoop First Generation
  • Necessity of Upgrading Hadoop
  • Hadoop 2.0 Architecture
  • NameNode High Availability
  • HDFS Federation
  • Failover and Fencing Mechanism
  • MapReduce-2
  • Core Concepts of YARN
  • Upgrading Existing MRv1 to MRv2
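
As a brief illustration of the YARN concepts in this module, here is a sketch using the Hadoop 2.x YarnClient API to list the applications known to the ResourceManager; it assumes the cluster's yarn-site.xml is on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.client.api.YarnClient;

    public class ListYarnApps {
        public static void main(String[] args) throws Exception {
            // Reads the ResourceManager address and other settings from yarn-site.xml.
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(new Configuration());
            yarnClient.start();

            // Print every application the ResourceManager knows about, with its state.
            for (ApplicationReport report : yarnClient.getApplications()) {
                System.out.println(report.getApplicationId() + "\t"
                        + report.getName() + "\t"
                        + report.getYarnApplicationState());
            }
            yarnClient.stop();
        }
    }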

Module 5: HIVE [10 hrs]

  • Introduction to Hive
  • Hive Architecture
  • Configuring Hive on Hadoop Cluster
  • Running Hive and Executing Hive Queries
  • Concepts of the Hive Execution Engine
  • Comparison with Traditional Databases
  • Hive Query Language: Datatypes, Operators and Functions
  • Types of Hive Tables: Managed, External, Temporary
  • Optimisation Techniques in Hive
  • Partitioning Concepts & Types of Partitioning
  • Bucketing Concepts
  • Realtime Use Case Using Hive
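
To show what executing Hive queries can look like from Java, here is a minimal sketch using the HiveServer2 JDBC driver; the connection URL, credentials, and the page_views table (partitioned and bucketed purely for illustration) are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC driver; hive-jdbc must be on the classpath.
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Placeholder connection URL: HiveServer2 host, port, and database.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "hive", "");
                 Statement stmt = conn.createStatement()) {

                // Hypothetical partitioned, bucketed table used only for illustration.
                stmt.execute("CREATE TABLE IF NOT EXISTS page_views ("
                        + "user_id STRING, url STRING) "
                        + "PARTITIONED BY (view_date STRING) "
                        + "CLUSTERED BY (user_id) INTO 8 BUCKETS "
                        + "STORED AS ORC");

                // A simple aggregate query; Hive compiles it into execution-engine jobs.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT view_date, COUNT(*) FROM page_views GROUP BY view_date")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                    }
                }
            }
        }
    }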

Module 6: PIG [8 hrs]

  • Introduction to Apache Pig
  • MapReduce vs. Apache Pig
  • SQL vs. Apache Pig
  • Different data types in Pig
  • Modes of Execution in Pig
  • Grunt shell
  • Loading data
  • Exploring Pig Latin commands
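
Here is a small sketch of running Pig Latin outside the Grunt shell, via Pig's embedded PigServer API in local mode; the input file and its field layout are placeholders.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigEmbeddedExample {
        public static void main(String[] args) throws Exception {
            // Local mode runs against the local file system; use ExecType.MAPREDUCE on a cluster.
            PigServer pig = new PigServer(ExecType.LOCAL);

            // Placeholder input: tab-separated lines of (name, age).
            pig.registerQuery("people = LOAD 'people.txt' AS (name:chararray, age:int);");
            pig.registerQuery("adults = FILTER people BY age >= 18;");

            // Writes the filtered relation to the given output directory.
            pig.store("adults", "adults_out");
            pig.shutdown();
        }
    }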

Module 7: HBASE [8 hrs]

  • Architecture and schema design
  • HBase vs. RDBMS
  • HMaster and Region Servers
  • Column Families and Regions
  • Write pipeline
  • Read pipeline
  • HBase commands
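
To make the HBase commands concrete, here is a minimal Java client sketch that writes and reads a single cell; the users table with an info column family is assumed to exist already, and connection settings are taken from hbase-site.xml on the classpath.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            // Picks up the ZooKeeper quorum and other settings from hbase-site.xml.
            org.apache.hadoop.conf.Configuration conf = HBaseConfiguration.create();

            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("users"))) {

                // Put one cell: row key "user1", column family "info", qualifier "email".
                Put put = new Put(Bytes.toBytes("user1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
                        Bytes.toBytes("user1@example.com"));
                table.put(put);

                // Read the same cell back.
                Get get = new Get(Bytes.toBytes("user1"));
                Result result = table.get(get);
                byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
                System.out.println(Bytes.toString(email));
            }
        }
    }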

Module 8: Sqoop [8 hrs]

  • Introduction to Sqoop
  • Sqoop Architecture
  • Downloading and Configuring Sqoop
  • Exploring Sqoop Tools
  • 30 Basic Sqoop Import Export Cases
  • Incremental Import
  • Importing Data by joining multiple tables
  • Using Custom Boundaries Queries in Sqoop
  • Types of Export
  • Use cases of Export
  • Importing Data Directly into HIVE and HBASE
  • Sqoop Integration with Hadoop Eco System Components
  • Query Scheduling and Automation

Module 9: Flume [8 hrs]

  • Introduction to Flume
  • Flume components: Source, Channel, Sink, Channel Selector, Agent, Event, Sink Processor
  • Flume Architecture
  • Downloading and configuring Flume
  • Topology Design Considerations
  • Flume API Concepts
  • Ingesting Data into HDFS from different Sources
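
As one illustration of the Flume API concepts listed above, here is a short sketch that uses Flume's RPC client to hand events to a running agent; the agent address and port are placeholders, and the agent's own source, channel, and sink would be defined in its configuration file.

    import java.nio.charset.StandardCharsets;

    import org.apache.flume.Event;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class FlumeRpcExample {
        public static void main(String[] args) throws Exception {
            // Placeholder address of a Flume agent whose Avro source listens on port 41414.
            RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
            try {
                // Build and send one event; the agent's channel and sink handle the rest.
                Event event = EventBuilder.withBody("hello flume", StandardCharsets.UTF_8);
                client.append(event);
            } finally {
                client.close();
            }
        }
    }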

Module 10: Single Node Configuration [4 hrs]

  • Introduction to Different Modes of Hadoop Configuration
  • Choosing an OS for the Hadoop Cluster
  • SSH Concepts & Configurations
  • Installing Java
  • Creating Hadoop User
  • Understanding Hadoop Configuration Files
  • Setting Up The Single Node Cluster
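
To connect the Hadoop configuration files to working code, here is a small sketch that sets the two properties a pseudo-distributed (single-node) setup usually defines and opens the file system with them; in a real installation these values live in core-site.xml and hdfs-site.xml rather than in code.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class SingleNodeCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Normally placed in core-site.xml and hdfs-site.xml; set here only so the
            // sketch is self-contained.
            conf.set("fs.defaultFS", "hdfs://localhost:9000"); // NameNode address (placeholder)
            conf.set("dfs.replication", "1");                  // one replica suffices on one node

            // Open the file system with these settings and confirm which cluster we reached.
            try (FileSystem fs = FileSystem.get(conf)) {
                System.out.println("Connected to " + fs.getUri());
            }
        }
    }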

Module 11: Multi Node Configuration [2 hrs] 

  • Choosing the Hadoop Cluster Hardware
  • Choosing Hadoop Distribution
  • Cluster Planning
  • Setting Up a Multi-Node Cluster
  • Hadoop Security


Contact Information

  • Address

    Anamnagar - 32 Kathmandu, Nepal

  • Email

    info@labanepal.com

  • Phone

    +977-1-4102721, 4102722, 4244804

  • Opening Hours

    10 AM - 5 PM
