HDFS File System Architecture



Introduction to HDFS and HDFS architecture:

  • Hadoop mainly consists of two components: a distributed file system known as the Hadoop Distributed File System (HDFS) and a distributed processing framework named MapReduce (now supported by a component called YARN, which combines a resource manager and a task scheduler).
  • HDFS is the storage component. It provides a distributed architecture for exceedingly large-scale storage, which can easily be extended by scaling out.
HDFS has a master/slave architecture:
  • 3 Master Daemons
  1. Name Node: stores and maintains the metadata for HDFS.
  2. Secondary Name Node: performs housekeeping functions for the Name Node.
  3. Job Tracker: manages MapReduce jobs and distributes individual tasks to machines running the Task Tracker.
  • 2 Slave Daemons
  1. Data Node: stores the actual HDFS data blocks.
  2. Task Tracker: responsible for instantiating and monitoring individual Map and Reduce tasks.
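The division of labor above can be made concrete with a small sketch. This is a toy Python model of the metadata the Name Node keeps (a file-to-blocks map and a block-to-DataNode map), not Hadoop's actual implementation; the file, block, and node names are hypothetical.

```python
# Toy model of the Name Node's metadata (illustration only; the real
# Name Node keeps this in memory as Java objects plus an edit log).

# file name -> ordered list of block IDs
file_to_blocks = {
    "Sample.log": ["a", "b", "c", "d", "e", "f"],
}

# block ID -> DataNodes holding a replica of that block
block_locations = {
    "a": ["datanode1", "datanode2", "datanode3"],
    "b": ["datanode1", "datanode2", "datanode3"],
}

def locate(filename):
    """Return (block_id, replica_locations) pairs for a file, roughly
    what a client learns from the Name Node before reading."""
    return [(blk, block_locations.get(blk, []))
            for blk in file_to_blocks.get(filename, [])]
```

Because the Name Node holds all of this in memory, the amount of metadata (not the amount of data) bounds how many files a cluster can track.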
HDFS has a master/slave architecture
Data Storage in HDFS:
  • Data storage in HDFS differs from the way files are stored on Windows or Linux.
  • HDFS breaks a file down into a set of individual blocks and then stores those blocks across the various slave nodes of the Hadoop cluster.
  • The default block size is 64 MB (128 MB from Hadoop 2.x onward) and is configurable; every block of a file is full-sized except the last, which may be smaller.
Example: a file (Sample.log) divided into blocks of data (a, b, c, d, e, f):

 

 https://www.cs.rutgers.edu/%7Epxk/417/notes/images/dfs-1-hdfs-chunks.png
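The splitting step can be sketched as follows. This is an illustrative Python sketch, not HDFS code: a 64-byte block stands in for the 64 MB default block size, and it shows that every block is full-sized except possibly the last.

```python
# Illustrative sketch of HDFS block splitting (not Hadoop code).
# A 64-byte block stands in for the 64 MB default block size.
BLOCK_SIZE = 64

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a byte string into block-sized chunks; as in HDFS, only
    the final block may be smaller than the block size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

# A 200-byte "file" yields three full blocks plus an 8-byte tail block.
blocks = split_into_blocks(b"x" * 200)
print([len(b) for b in blocks])  # prints [64, 64, 64, 8]
```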
Replicating data blocks:

  • HDFS was designed to store data on inexpensive commodity hardware, which can be unreliable. To survive hardware failures, HDFS replicates the data blocks stored in the system.
  • Note: the norm is three copies of each data block, and the copies may be placed on the Data Nodes in any order. However, no two replicas of the same block are ever stored on the same Data Node (e.g. a1 and a2 never share Data Node 1).
In the example above, the file (Sample.log) is divided into blocks (a, b, c, d, e, f), which are in turn replicated three times: (a1, b1, c1, d1, e1, f1), (a2, b2, c2, d2, e2, f2) and (a3, b3, c3, d3, e3, f3).
(Figure: the three replica sets distributed across DataNode 1, DataNode 2 and DataNode 3, with no two replicas of the same block on one DataNode.)
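The replication rule above (three copies, and never two replicas of one block on the same Data Node) can be sketched as a toy placement routine. This is an illustration only: the real Name Node's placement policy is also rack-aware, which this sketch ignores, and all names are hypothetical.

```python
# Toy replica placement honoring "no two replicas of the same block on
# one DataNode" (illustration only; the real placement policy is also
# rack-aware, which this sketch ignores).
def place_replicas(blocks, datanodes, replication=3):
    """Assign `replication` copies of each block to distinct DataNodes,
    rotating the starting node so load spreads across the cluster."""
    assert replication <= len(datanodes), "need at least as many nodes as replicas"
    placement = {node: [] for node in datanodes}
    for index, blk in enumerate(blocks):
        for r in range(replication):
            node = datanodes[(index + r) % len(datanodes)]
            placement[node].append(f"{blk}{r + 1}")  # e.g. "a1", "a2", "a3"
    return placement

cluster = place_replicas(list("abcdef"), ["datanode1", "datanode2", "datanode3"])
```

With three Data Nodes and a replication factor of three, each node ends up with exactly one replica of every block, so any single node can fail without losing data.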
Name Node Memory Concerns:
Secondary Name Node:
Anatomy of File Read:

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3-X86CezdsO5l97gevthcyLt3CtB41n6uZh98A1qmm8U_GSiIjyORZPeTDvqmtg2GllqJhreNTRbO-ntZNIsbK2Z5nLkXwkcGqV8HxqreS0JdIYszY9noQUC_zqdt0oh3g_PFD24q8gTa/s640/HDFS_Client_Read_File.png
Anatomy of File Write:

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmwbwHauqPkPdZZS7VESQcnx_AIwrNiSLl_qcokZN1btlVeBhsm9ChpdvfOLkbq1M7FpWjcSRFJzQ7t9PmGa3rBrjUMb1S0djBc5p3FNu8VwK3Yn-uhNKH9CfqrtvoAfyr39odA6JCLl2j/s640/HDFS_Client_Write_File.png
HDFS Commands:
The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that Hadoop supports.
The FS shell can be invoked by: $ bin/hdfs dfs <args>
Most of the commands in FS shell behave like corresponding Unix commands.
$cat  Syntax: hdfs dfs -cat <<file path>>
Usage: Displays the contents of the file.
$chgrp  Syntax: hdfs dfs -chgrp [-R] <<group name>> <<file path>>
Usage: Changes the group association of files. With -R, makes the change recursively through the directory structure.
$chmod  Syntax: hdfs dfs -chmod [-R] <<octal number>> <<file path>>
Usage: Changes the permissions of files. With -R, makes the change recursively through the directory structure. The user must be the owner of the file, or else a super-user.
$chown  Syntax: hdfs dfs -chown [-R] <<owner name>> <<file path>>
Usage: Changes the owner of files. With -R, makes the change recursively through the directory structure. The user must be a super-user.
$copyFromLocal Syntax: hdfs dfs -copyFromLocal <<local source path>> <<destination hdfs directory path>>
Note: one can use put instead of copyFromLocal
Usage: Copies a file from the local file system to a destination HDFS directory.
$copyToLocal Syntax: hdfs dfs -copyToLocal <<source hdfs path>> <<local destination path>>
Note: one can use get instead of copyToLocal
Usage: Copies a file from HDFS to a local destination directory.
$count  Syntax: hdfs dfs -count <<directory path>>
Usage: Counts the number of directories, files and bytes under the paths that match the specified file pattern.
$cp  Syntax: hdfs dfs -cp <<source hdfs path>> <<destination hdfs path>>
Usage: Copies files from a source HDFS path to a destination HDFS path.
$du  Syntax: hdfs dfs -du <<directory path>>
Usage: Displays the sizes of files and directories contained in the given directory, or the length of a file if the path is just a file.
$mkdir Syntax:  hdfs dfs -mkdir <<directory name>>
Usage: Creates a directory
$ls  Syntax: hdfs dfs -ls <<directory or file>>
Usage: Returns the statistics for a file or directory.
$moveFromLocal Syntax: hdfs dfs -moveFromLocal <<local file path>> <<destination hdfs path>>
Usage: Similar to the put command, except that the source file is deleted after it is copied.
$moveToLocal Syntax: hdfs dfs -moveToLocal <<source hdfs path>> <<local file path>>
Usage: Displays a "Not implemented yet" message.
$mv Syntax: hdfs dfs -mv <<source file path>> <<destination file path>>
Usage: Moves files from source to destination. This command also allows multiple sources, in which case the destination needs to be a directory. Moving files across file systems is not permitted.
$rm Syntax: hdfs dfs -rm <<file path>>
Usage: Deletes the specified file.
$setrep Syntax: hdfs dfs -setrep [-R] <<replication factor>> <<file path>>
Usage: Changes the replication factor of a file. The -R option recursively changes the replication factor of files within a directory.
$fsck Syntax: hdfs fsck <<file path>> -blocks
Usage: fsck is the file system check utility; the -blocks option prints the block report.
HDFS administrative commands 
Hadoop in safe mode means:
  • the Name Node is in safe mode and HDFS is in read-only mode.
There are four important HDFS administrative commands:
  1. $hdfs dfsadmin -safemode get
Usage: Reports whether the HDFS system is operating in safe mode (i.e. in the on or off state).
  2. $hdfs dfsadmin -safemode enter
Usage: Puts Hadoop into safe mode (on state).
  3. $hdfs dfsadmin -safemode leave
Usage: Takes Hadoop out of safe mode (off state).
  4. $hdfs dfsadmin -report
Usage: Gives a report on the state of the Hadoop system.




