In this blog post I will explain the different HDFS commands that are commonly used to access HDFS while working as a Big Data Developer.
- Hadoop provides a command line interface to access HDFS.
- Most of the commands are similar to UNIX file system commands.
[npntraining@centos8 Desktop]$ hdfs -help
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
where COMMAND is one of:
dfs run a filesystem command on the file systems supported in Hadoop.
classpath prints the classpath
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
journalnode run the DFS journalnode
zkfc run the ZK Failover Controller daemon
datanode run a DFS datanode
dfsadmin run a DFS admin client
haadmin run a DFS HA admin client
fsck run a DFS filesystem checking utility
balancer run a cluster balancing utility
jmxget get JMX exported values from NameNode or DataNode.
mover run a utility to move block replicas across
storage types
oiv apply the offline fsimage viewer to an fsimage
oiv_legacy apply the offline fsimage viewer to a legacy fsimage
oev apply the offline edits viewer to an edits file
fetchdt fetch a delegation token from the NameNode
getconf get config values from configuration
groups get the groups which users belong to
snapshotDiff diff two snapshots of a directory or diff the
current directory contents with a snapshot
lsSnapshottableDir list all snapshottable dirs owned by the current user
Use -help to see options
portmap run a portmap service
nfs3 run an NFS version 3 gateway
cacheadmin configure the HDFS cache
crypto configure HDFS encryption zones
storagepolicies list/get/set block storage policies
version print the version
List of all HDFS Commands
[naveen@npntraining ~]$ hdfs dfs -help
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
-ls
This command is used for listing the directories and files present under a specific directory in HDFS.
Usage : hdfs dfs [generic options] -ls [-d] [-h] [-R] [<path> ...]
[npntraining ~]$ hdfs dfs -ls /user/$USER/hdfs_commands
Note
- -d is used to list the directories as plain files.
- -h is used to print file size in human readable format.
- -R is used to recursively list the content of the directories.
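The flags can be combined; a minimal sketch, assuming the /user/$USER/hdfs_commands directory from the example above exists:
[npntraining ~]$ hdfs dfs -ls -h -R /user/$USER/hdfs_commands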
-mkdir
This command is similar to that of Unix mkdir and is used to create a directory in HDFS.
Usage :
hdfs dfs -mkdir [-p] /hdfs-path
Options | Description
---|---
-p | Do not fail if the directory already exists; create intermediate directories as needed.
[npntraining ~]$ hdfs dfs -mkdir /user/$USER/hdfs_commands
Note
- If the directory already exists, or if intermediate directories don't exist, the command throws an error. To overcome this we use -p (parents), which not only ignores an already existing directory but also creates the intermediate directories if they don't exist.
[npntraining ~]$ hdfs dfs -ls /
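To see -p in action, a minimal sketch (the nested path is assumed for illustration):
[npntraining ~]$ hdfs dfs -mkdir -p /user/$USER/hdfs_commands/input/raw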
-cat
This command is used for displaying the contents of a file on the console.
Usage : hdfs dfs -cat [-ignoreCrc] <src> ...
Note
- -ignoreCrc option will disable the checksum verification.
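A minimal example, assuming file1.txt has already been uploaded to the directory created above:
[npntraining ~]$ hdfs dfs -cat /user/$USER/hdfs_commands/file1.txt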
-copyFromLocal
This command is used to copy files from the local file system to the HDFS file system.
Usage
hdfs dfs [generic options] -copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>
- -f overwrites the destination if it already exists.
- -p preserves access and modification times, ownership, and the permissions.
- -l allows the DataNode to lazily persist the file to disk; forces a replication factor of 1.
- -d skips creation of the temporary file with the suffix .COPYING.
Using -copyFromLocal
[npntraining ~]$ hdfs dfs -copyFromLocal <localfile_path1> <localfile_path2> /<hdfs-path>
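If the destination already exists, the copy fails unless -f is supplied; a sketch with an assumed local file name:
[npntraining ~]$ hdfs dfs -copyFromLocal -f file1.txt /<hdfs-path>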
-put
Usage :
hdfs dfs -put [-f] [-p] [-l] [-d] [- | <localsrc> ...] <dst>
- -f overwrites the destination if it already exists.
- -p preserves access and modification times, ownership, and the permissions.
- -d skips creation of the temporary file with the suffix .COPYING.
- -l allows the DataNode to lazily persist the file to disk; forces a replication factor of 1.
[npntraining~]$> hdfs dfs -put file1.txt hdfs://localhost.localdomain:9000/file1.txt
Important Note:
The fundamental difference between -copyFromLocal and -put is that -put can take its input from stdin (pass - as the source).
[npntraining]$> echo "Hello Naveen" | hdfs dfs -put - /file1.txt
-moveFromLocal
This command will move the file from local file system to HDFS. It ensures that the local copy is deleted.
Usage
hdfs dfs -moveFromLocal <localsrc> ... <dst>
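A minimal sketch, assuming file1.txt exists in the local working directory:
[npntraining]$> hdfs dfs -moveFromLocal file1.txt /<hdfs-path>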
-mv
Similar to -moveFromLocal, this command follows a cut-and-paste approach, but it moves files within the HDFS file system.
Usage
hdfs dfs -mv <src> ... <dst>
[npntraining]$> hdfs dfs -mv /<hdfs-path> /<hdfs-path>
[npntraining]$ > hdfs dfs -mv file1.txt /file1.txt
-get
This command is used to copy files from HDFS to the local file system.
Usage
hdfs dfs [generic options] -get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>
[npntraining]$> hdfs dfs -get /<hdfs-path> <localfile_path1>
-copyToLocal
This command is similar to the -get command except that the destination is restricted to the local file system. It is used to copy a file from HDFS to the local file system.
Usage
hdfs dfs -copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>
[npntraining]$> hdfs dfs -copyToLocal /<hdfs-path> <localfile_path1>
-rm
This command is used to remove a file or directory from HDFS.
Usage
hdfs dfs -rm [-f] [-r |-R] [-skipTrash] [-safely] URI [URI …]
- -rm removes only files; directories cannot be deleted without -r.
- -skipTrash bypasses the trash and immediately deletes the source.
- -f suppresses the error message and non-zero exit code if the file does not exist.
- -r recursively deletes directories.
- -safely requires a safety confirmation before deleting a directory whose total number of files is greater than hadoop.shell.delete.limit.num.files (in core-site.xml, default: 100). It can be used with -skipTrash to prevent accidental deletion of large directories.
[npntraining]$> hdfs dfs -rm /file.txt
[npntraining]$> hdfs dfs -rm -r /directory
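To delete a directory immediately without moving it to the trash, the flags can be combined; the directory name here is assumed for illustration:
[npntraining]$> hdfs dfs -rm -r -skipTrash /old_directory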
-setrep
This command changes the replication factor of a file. If the path is a directory, it recursively changes the replication factor of all files under the directory tree.
Usage
hdfs dfs -setrep [-R] [-w] <numReplicas> <path>
- -w waits for the replication process to complete; this can take a very long time.
- -R option is accepted for backwards compatibility. It has no effect.
[npntraining]$ > hdfs dfs -setrep 4 /file1.txt
Whenever we change the replication factor, the NameNode automatically schedules the creation or deletion of block replicas across the cluster to match the new factor.
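To block until the new replication factor has actually been applied, add -w; a sketch reusing the file above:
[npntraining]$ > hdfs dfs -setrep -w 4 /file1.txt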
-touchz
This command is used to create a file of zero length. An error is returned if the file exists with non-zero length.
Usage
hdfs dfs -touchz URI [URI …]
[npntraining]$ > hdfs dfs -touchz /file1.txt
-stat
Print statistics about the file/directory at the given path in the specified format. The format may contain the following options:
- %b: file size in bytes.
- %F: file type.
- %g: group name of owner.
- %n: name of the file.
- %o: block size.
- %r: number of replicas.
- %u: user name of owner.
- %y: modification date.
[npntraining]$ > hdfs dfs -stat "%n %o %u %g %r" /file.txt
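The file type, size, and modification time can be printed the same way; a minimal sketch against the same file:
[npntraining]$ > hdfs dfs -stat "%F %b %y" /file.txt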
-getmerge
Takes a source directory and a destination file as input and concatenates the files in the source directory into a single local destination file.
- -nl adds a newline character at the end of each file.
[npntraining]$ > hdfs dfs -getmerge /<hdfsdir> file.txt
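With -nl, each concatenated file is terminated by a newline, which keeps records from running together; the local file name here is assumed:
[npntraining]$ > hdfs dfs -getmerge -nl /<hdfsdir> merged.txt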
-appendToFile
Appends the content of one or more local files to a file in HDFS. It can also read from stdin when - is given as the source.
[npntraining]$ > hdfs dfs -appendToFile /<local_file> /<local_file> /<hdfs-file>
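Because - reads from stdin, shell output can be appended directly; a minimal sketch:
[npntraining]$ > echo "appended line" | hdfs dfs -appendToFile - /<hdfs-file>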
-text
Takes a source file and outputs the file in text format, decompressing or decoding it when necessary (for example gzip-compressed files and SequenceFiles).
Usage
hdfs dfs -text [-ignoreCrc] <src> ...
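A minimal sketch, assuming a gzip-compressed file exists in HDFS:
[npntraining]$ > hdfs dfs -text /<hdfs-path>/file1.txt.gz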
-test
The -test command is used for file test operations:
- -d: if the path is a directory, returns 0.
- -e: if the path exists, returns 0.
- -f: if the path is a file, returns 0.
- -s: if the path is not empty, returns 0.
- -z: if the file is zero length, returns 0.
[npntraining]$ > hdfs dfs -test -e /file.txt
[npntraining]$ > echo $?
Here echo $? prints the exit status of the previous command; 0 means the test succeeded.
-du
This command displays the sizes of the files and directories contained in the given directory, or the length of a file if the path is just a file.
Usage
hdfs dfs -du [-s] [-h] <path> ...
- -s shows an aggregate summary of file lengths rather than the individual files.
- -h prints sizes in a human-readable format.
[npntraining]$ > hdfs dfs -du -h /
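To get just the total size of a directory tree, the options combine; a sketch against the directory created earlier:
[npntraining]$ > hdfs dfs -du -s -h /user/$USER/hdfs_commands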