HDFS in Practice

Explore how to interact with the Hadoop Distributed File System (HDFS) through practical command line operations. Understand how to create directories, upload files, list contents, and retrieve data from HDFS, gaining hands-on experience with file management in Big Data environments.

So far, we have covered the theory behind HDFS: its components and, at a high level, how it works. Now we’ll get hands-on by interacting with HDFS in a pseudo-distributed cluster running inside a Docker container. The PATH environment variable has already been set, so you can invoke the hdfs executable directly. The hdfs command-line utility exposes three kinds of commands:

  • Admin commands
  • Client commands
  • Daemon commands

Client commands are the most commonly used; admin and daemon commands are typically used by Hadoop administrators. This overview isn’t a comprehensive tour of every command and feature the hdfs utility offers. Instead, it aims to give you enough familiarity to find your way around and carry out common tasks. Let’s start!
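
To make the three categories concrete, the sketch below wraps one representative of each in a shell function, so nothing runs until you call it. Only hdfs dfs -ls / comes from this lesson; dfsadmin -report, the Hadoop 3.x --daemon option, and all function names are our own illustrative choices, and the admin command typically requires HDFS superuser rights.

```shell
#!/usr/bin/env bash
# Illustrative only: one example of each kind of hdfs command.
# Function names are hypothetical; the hdfs subcommands assume Hadoop 3.x.

# Client command: list the root of HDFS (the same call as step 2).
client_example() {
  hdfs dfs -ls /
}

# Admin command: report capacity and DataNode status
# (usually needs HDFS superuser privileges).
admin_example() {
  hdfs dfsadmin -report
}

# Daemon command: ask whether a NameNode daemon is running on this node.
daemon_example() {
  hdfs --daemon status namenode
}

# Built-in help for any dfs command, e.g. dfs_help ls
dfs_help() {
  hdfs dfs -help "$@"
}
```

For example, dfs_help du prints the options accepted by -du, which is a quick way to explore commands this lesson doesn’t cover.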

Shell
# The commands are reproduced below for easy copy/pasting into the terminal
# Start the Hadoop cluster
./DataJek/startHadoop.sh
# listing directories and files
hdfs dfs -ls /
# create a directory in hdfs
hdfs dfs -mkdir -p /MyDirectory
# upload a file to /MyDirectory
hdfs dfs -copyFromLocal /DataJek/helloWorld.txt /MyDirectory
# verify helloWorld.txt was uploaded to /MyDirectory
hdfs dfs -ls /MyDirectory
# check the size of the directory
hdfs dfs -du -s -h -v /MyDirectory
# view the contents of the file
hdfs dfs -cat /MyDirectory/helloWorld.txt
# or
hdfs dfs -text /MyDirectory/helloWorld.txt
# download the previously uploaded file
hdfs dfs -copyToLocal /MyDirectory/helloWorld.txt /Downloads/
# find operation
hdfs dfs -find / -iname "hello*"
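
The commands above run independently, so a failed upload is easy to miss, and running -copyFromLocal a second time fails because the destination file already exists. Below is a minimal sketch that makes the sequence re-runnable and checks the upload. It assumes the cluster is already started and that /DataJek/helloWorld.txt exists locally, as in the lesson; run_walkthrough, LOCAL_FILE and HDFS_DIR are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: the copy/paste commands as one re-runnable function.
# Assumes a running cluster and the lesson's local file; names are hypothetical.

LOCAL_FILE="/DataJek/helloWorld.txt"
HDFS_DIR="/MyDirectory"

run_walkthrough() {
  local remote_file="$HDFS_DIR/${LOCAL_FILE##*/}"   # /MyDirectory/helloWorld.txt

  # -p: no error if the directory already exists.
  hdfs dfs -mkdir -p "$HDFS_DIR" || return 1

  # -f: overwrite the destination so a second run does not fail.
  hdfs dfs -copyFromLocal -f "$LOCAL_FILE" "$HDFS_DIR" || return 1

  # -test -e exits 0 only if the path exists in HDFS.
  if ! hdfs dfs -test -e "$remote_file"; then
    echo "upload failed: $remote_file not found in HDFS" >&2
    return 1
  fi

  hdfs dfs -du -s -h -v "$HDFS_DIR"
  hdfs dfs -cat "$remote_file"
}
```

Calling run_walkthrough twice in a row should succeed both times, which is the main reason for using -f here.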
  1. Start by executing hdfs on the command line. Take a minute to observe the output.

    hdfs
    

    You’ll see a long list of commands and their usage. In this lesson, we’ll focus on the dfs subcommand, listed under the client commands, which runs filesystem operations such as listing, copying, and reading files on HDFS.

  2. We’ll start by listing the root of HDFS. Execute:

    hdfs dfs -ls /
    

    The output shows that there’s only the tmp directory at the root of HDFS.

  3. Let’s create a directory using the following command:

    hdfs dfs -mkdir -p /MyDirectory
    
  4. Next, we’ll upload a file from the local filesystem of the node we’re working on to HDFS.

    hdfs dfs -copyFromLocal /DataJek/helloWorld.txt /MyDirectory
    

    The -copyFromLocal command tells hdfs to read the file at the local path /DataJek/helloWorld.txt and upload it into the HDFS directory /MyDirectory.

  5. Let’s verify that the file has been uploaded.

    hdfs dfs -ls /MyDirectory
    

    Our helloWorld.txt has been successfully uploaded!

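  6. Next, we can check the size of our directory /MyDirectory as follows:

    hdfs dfs -du -s -h -v /MyDirectory

    The -s flag prints one aggregate total for the directory rather than a line per file, -h formats sizes in a human-readable form (for example, 64.0m instead of 67108864), and -v adds a header line naming the columns.
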
  7. We can view the contents of the helloWorld.txt file as follows:

    hdfs dfs -cat /MyDirectory/helloWorld.txt
    
  8. Alternatively, the -text command also displays the contents of helloWorld.txt. Unlike -cat, -text can decode zip and TextRecordInputStream (SequenceFile) input into plain text.

    hdfs dfs -text /MyDirectory/helloWorld.txt
    
  9. Next, we’ll learn how to download a file from HDFS to the local filesystem.

    hdfs dfs -copyToLocal /MyDirectory/helloWorld.txt /Downloads/
    
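  10. We can also search the HDFS namespace with -find. The -iname option matches the name pattern case-insensitively, so this finds every path under / whose name starts with "hello" in any letter case: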

    hdfs dfs -find / -iname "hello*"
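
Steps 6 to 9 print results for a person to read. When scripting the same checks, it helps to get the size as a plain byte count and to confirm that a downloaded copy matches the original. A sketch under those assumptions: hdfs_size_bytes and roundtrip_matches are hypothetical helper names, and mktemp and cmp are standard local utilities.

```shell
#!/usr/bin/env bash
# Sketch: scriptable variants of the size and download steps.
# Helper names are hypothetical; assumes hdfs is on the PATH.

# Size of an HDFS path in bytes. Without -h the first column of
# `hdfs dfs -du -s` is a raw byte count; without -v there is no header.
hdfs_size_bytes() {
  hdfs dfs -du -s "$1" | awk 'NR == 1 { print $1 }'
}

# Download an HDFS file into a fresh temporary directory and compare it
# byte for byte with a local file. Returns 0 only if they are identical.
roundtrip_matches() {
  local hdfs_path="$1" local_file="$2" tmpdir status
  tmpdir="$(mktemp -d)" || return 1
  if ! hdfs dfs -copyToLocal "$hdfs_path" "$tmpdir/"; then
    rm -rf "$tmpdir"
    return 1
  fi
  # cmp -s prints nothing and exits 0 only when the files are identical.
  cmp -s "$local_file" "$tmpdir/${hdfs_path##*/}"
  status=$?
  rm -rf "$tmpdir"
  return "$status"
}
```

For example, roundtrip_matches /MyDirectory/helloWorld.txt /DataJek/helloWorld.txt && echo identical. Downloading into a temporary directory avoids clashing with a copy left in /Downloads/ by an earlier run.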
    

Namenode Webserver

The Namenode serves a web UI at the address set by the dfs.namenode.http-address property. By default, this is 0.0.0.0:9870, so the UI listens on port 9870:

http://localhost:9870/dfshealth.html
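
Port 9870 is only the default. Below is a sketch that derives the UI URL from the live configuration with hdfs getconf and waits until the page responds, assuming curl is installed. The names namenode_ui_url, print_namenode_ui_url and wait_for_ui are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: locate the Namenode web UI from configuration instead of assuming 9870.
# Assumes hdfs and curl are on the PATH; function names are hypothetical.

# Convert a host:port value into the dfshealth.html URL.
# The wildcard bind address 0.0.0.0 is not browsable, so use localhost.
namenode_ui_url() {
  local addr="$1"
  local host="${addr%:*}"   # text before the last colon
  local port="${addr##*:}"  # text after the last colon
  if [[ -z "$host" || "$host" == "0.0.0.0" ]]; then
    host="localhost"
  fi
  printf 'http://%s:%s/dfshealth.html\n' "$host" "$port"
}

# Read dfs.namenode.http-address from the cluster configuration.
print_namenode_ui_url() {
  namenode_ui_url "$(hdfs getconf -confKey dfs.namenode.http-address)"
}

# Poll the URL until it answers with a non-error HTTP status.
# Returns 1 if it is still unreachable after the given number of attempts.
wait_for_ui() {
  local url="$1" attempts="${2:-30}" i
  for ((i = 0; i < attempts; i++)); do
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    sleep 2
  done
  return 1
}
```

For example: url="$(print_namenode_ui_url)" && wait_for_ui "$url" && echo "UI is up at $url". Polling helps because the Namenode UI can take a while to become available after the cluster starts.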

A screenshot of the UI appears below:
