External Sort (External Merge-Sort)

Let’s discuss the external sort and its derivatives.

We'll cover the following

Introduction

When we have to sort a huge amount of data (data large enough that it can’t be all loaded into RAM), we use external sort. An example is the external merge sort algorithm.

External sort algorithm

  • Data is first picked in chunks and sorted in memory. The data is then written back to storage once it has been sorted. Merge sort is used to sort the data in parts. Now, we put these sorted chunks together to make the final sorted data.

  • After that, we establish data queues that will read from the sorted chunks. We’ll have a separate queue for each portion. We’ll remove an element from these queues that are in charge of reading pieces that have been sorted. For example, assume we have k separate sorted data chunks, each of length m.

  • In the next step we’ll use a min-heap to take data from each of these queues as input. Each queue will have one element taken from it. Finally, we’ll take the heap’s lowest value and append it to the final output.

  • The queue from which this minimal value element was added to the heap will be dequeued once again, and another element from this queue will be added to the heap.

  • Finally, after a queue’s data has been consumed, that queue is deleted from the input list. Eventually, a sorted data set will emerge from the heap.

Note: We may improve this process even more by including an output buffer, which will store data from heap and only perform a limited number of write operations on the final disk space.

The illustration below shows how the external sort algorithm works.

Level up your interview prep. Join Educative to access 70+ hands-on prep courses.