Filesystem

This lesson discusses the filesystem and its related entities.

Filesystem

A file system is a mechanism for storing, organizing, and retrieving data on a storage medium. The fundamental unit of a file system is a file: a named collection of logically related data. The structures and rules used to manage files and their names constitute a file system.

Types of file systems

There are different types of file systems, each with different characteristics with respect to reliability, security, and performance. Examples are:

  • Disk File Systems
  • Tape File Systems
  • Network File Systems
  • Special Purpose File Systems

OS and file systems

Operating systems generally support more than one file system. For instance, Apple’s macOS uses APFS (Apple File System), which replaced an earlier file system called HFS+. Windows supports FAT and its variants as well as NTFS, and Linux supports the ext family of file systems.

Distributed file systems

In the context of Hadoop, we use distributed file systems (DFS). A DFS uses the network to send/receive data and creates the illusion of a local file system for clients. Access to files is provided from multiple hosts. Files may be stored on a central server or across multiple servers. There are several distributed file systems in operation today, including Lustre, Google File System (GFS), and Andrew File System.

Next, we’ll examine the basic building block of a filesystem.

Disk block

A disk block is the smallest unit that a disk can read or write. Everything a file system does is composed of operations executed on disk blocks. However, file systems don’t usually read or write individual disk blocks. Rather, they read and write several disk blocks together for efficiency. This abstraction over the physical disk blocks is called the filesystem block. A filesystem block is always the same size as the disk block or an integer multiple of it.
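The relationship between disk blocks and filesystem blocks can be illustrated with a little arithmetic. The sketch below assumes a 512-byte disk block and a filesystem block made of 8 disk blocks; both sizes, and the helper name `blocks_needed`, are illustrative choices rather than values taken from any particular file system.

```python
# Illustrative sizes only; real disks and file systems vary.
DISK_BLOCK_SIZE = 512          # assumed disk block size in bytes
DISK_BLOCKS_PER_FS_BLOCK = 8   # assumed integer multiple
FS_BLOCK_SIZE = DISK_BLOCK_SIZE * DISK_BLOCKS_PER_FS_BLOCK  # 4096 bytes


def blocks_needed(file_size: int, block_size: int = FS_BLOCK_SIZE) -> int:
    """Return how many filesystem blocks are needed to hold file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Ceiling division: a partially filled block still occupies a whole block.
    return -(-file_size // block_size)


file_size = 10_000
count = blocks_needed(file_size)
allocated = count * FS_BLOCK_SIZE
unused = allocated - file_size

print(f"{file_size} bytes occupy {count} filesystem blocks "
      f"({allocated} bytes allocated, {unused} bytes unused)")
```

Because space is handed out in whole filesystem blocks, the last block of a file is usually only partly used. Larger blocks mean fewer operations per file but more unused space at the end of each file.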

File metadata

The name of a file is metadata because it is information about the file that is not part of the stream of bytes making up the file. There are several other pieces of metadata about a file, such as the owner, security access controls, date of last modification, creation time, and size. Generally, the file system stores file metadata in an i-node (index node). Examples of information stored in i-nodes are the last access time of the file, its type, its creator, a version number, and a reference to the directory that contains the file.
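Some of the metadata described above can be inspected from a program. The following is a minimal sketch using Python's standard `os.stat` call on a temporary file; the helper name `describe` and the choice of fields are illustrative, and the exact values (and the meaning of some fields, such as the i-node number) depend on the operating system and file system.

```python
import os
import stat
import tempfile
import time


def describe(path: str) -> dict:
    """Collect a few pieces of file metadata reported by the operating system."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),           # the file name
        "size_bytes": st.st_size,                  # length of the byte stream
        "owner_uid": st.st_uid,                    # numeric owner id
        "permissions": stat.filemode(st.st_mode),  # e.g. '-rw-------'
        "modified": time.ctime(st.st_mtime),       # last modification time
        "inode": st.st_ino,                        # i-node number
    }


# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    path = tmp.name

try:
    info = describe(path)
    for key, value in info.items():
        print(f"{key:>12}: {value}")
finally:
    os.remove(path)
```

None of these values appear in the file's contents. The file holds only the 11 bytes written to it, while everything printed here is kept by the file system alongside the data.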

In Hadoop, the file’s metadata is kept separate from the file data, as we’ll later learn.

i-Node

In traditional filesystems (like those of Unix and its variants), the i-node (index node) is a data structure that stores a file’s metadata along with information about where the file’s data is located on the storage medium. An i-node refers to the contents of the file by tracking a list of blocks on the disk that belong to the file. A file appears as a continuous stream of bytes at higher levels, but the blocks that contain the file data may not be contiguous on disk. An i-node contains the information the file system uses to map from a logical position in a file (for example, byte offset 11,239) to a physical position on disk.
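The mapping from a logical byte offset to a physical location can be sketched as follows. This is a deliberately simplified model: it assumes a 4096-byte filesystem block and an i-node that holds a flat list of block numbers, whereas real file systems typically use more elaborate structures such as indirect blocks or extents. The class `Inode`, its fields, and the block numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

BLOCK_SIZE = 4096  # assumed filesystem block size in bytes


@dataclass
class Inode:
    """Toy i-node: file size plus physical block numbers in logical order."""
    size: int
    blocks: List[int]

    def locate(self, offset: int) -> Tuple[int, int]:
        """Map a byte offset in the file to (physical block, offset in block)."""
        if not 0 <= offset < self.size:
            raise ValueError("offset is outside the file")
        # Which block of the file holds the byte, and where inside that block?
        logical_block, offset_in_block = divmod(offset, BLOCK_SIZE)
        return self.blocks[logical_block], offset_in_block


# A 12,000-byte file stored in three non-contiguous blocks (made-up numbers).
inode = Inode(size=12_000, blocks=[120, 87, 301])

physical_block, offset_in_block = inode.locate(11_239)
print(f"byte 11,239 is in physical block {physical_block}, "
      f"at offset {offset_in_block} within that block")
```

With these assumed values, byte offset 11,239 falls in the third block of the file (logical block 2), which the i-node maps to physical block 301, at offset 3,047 within that block. The caller sees one continuous stream of bytes even though the blocks are scattered on disk.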

We’ll see a parallel of the i-node in HDFS too.
