The Model of Computation
Explore the word-RAM model used to analyze data structure operations' running times and memory usage. Understand key concepts of correctness, time, and space complexity. Learn the differences among worst-case, amortized, and expected running times through practical examples, enhancing your ability to evaluate data structure performance.
We can analyze the theoretical running times of operations on the data structures. To do this precisely, we need a mathematical model of computation. For this, we use the -bit word-RAM model.
Computer memory
RAM stands for Random Access Machine. In this model, we have access to a random access memory consisting of cells, each of which stores a w–bit word. This implies that a memory cell can represent, for example, any integer in the set .
In the word–RAM model, basic operations on words take constant time. This includes arithmetic operations , comparisons , and bitwise boolean operations (bitwise–AND, OR, and exclusive–OR).
Any cell can be read or written in constant time. A computer’s memory is managed by a memory management system from which we can allocate or deallocate a block of memory of any size we would like. Allocating a block of memory of size takes time and returns a reference (a pointer) to the newly-allocated memory block. This reference is small enough to be represented by a single word.
The word–size is a very important parameter of this model. The only assumption we will make about is the lower–bound , where is the number of elements stored in any of our data structures. This is a fairly modest assumption, since otherwise a word is not even big enough to count the number of elements stored in the data structure.
Space is measured in words, so that when we talk about the amount of space used by a data structure, we are referring to the number of words of memory used by the structure. All of our data structures store values of a generic type , and we assume an element of type occupies one word of memory. (In reality, we are storing references to objects of type , and these references occupy only one word of memory.)
The data structures normally don’t use any special tricks that are not implementable on the JVM and most other architectures.
Correctness, time complexity, and space complexity
When studying the performance of a data structure, there are three things that matter most:
- Correctness: The data structure should correctly implement its interface.
- Time complexity: The running times of operations on the data structure should be as small as possible.
- Space complexity: The data structure should use as little memory as possible.
In this introductory text, we will take correctness as a given; we won’t consider data structures that give incorrect answers to queries or don’t perform updates properly. We will, however, see data structures that make an extra effort to keep space usage to a minimum. This won’t usually affect the (asymptotic) running times of operations, but can make the data structures a little slower in practice.
Types of running time in data structures
When studying running times in the context of data structures we tend to come across three different kinds of running time guarantees:
-
Worst-case running times: These are the strongest kind of running time guarantees. If a data structure operation has a worst–case running time of , then one of these operations never takes longer than time.
-
Amortized running times: If we say that the amortized running time of an operation in a data structure is , then this means that the cost of a typical operation is at most . More precisely, if a data structure has an amortized running time of , then a sequence of operations takes at most time. Some individual operations may take more than time but the average, over the entire sequence of operations, is at most .
-
Expected running times: If we say that the expected running time of an operation on a data structure is , this means that the actual running time is a random variable and the expected value of this random variable is at most . The randomization here is with respect to random choices made by the data structure.
To understand the difference between worst–case, amortized, and expected running times, it helps to consider a financial example. Consider the cost of buying a house.
Worst-case versus amortized cost
Suppose that a home costs . In order to buy this home, we might get a month ( year) mortgage with monthly payments of per month. In this case, the worst-case monthly cost of paying this mortgage is per month.
If we have enough cash on hand, we might choose to buy the house outright, with one payment of . In this case, over a period of years, the amortized monthly cost of buying this house is
This is much less than the per month we would have to pay if we took out a mortgage.
Worst-case versus expected cost
Next, consider the issue of fire insurance on our home. By studying hundreds of thousands of cases, insurance companies have determined that the expected amount of fire damage caused to a home like ours is per month. This is a very small number, since most homes never have fires, a few homes may have some small fires that cause a bit of smoke damage, and a tiny number of homes burn right to their foundations. Based on this information, the insurance company charges per month for fire insurance.
Now it’s decision time. Should we pay the worst-case monthly cost for fire insurance, or should we gamble and self–insure at an expected cost of per month? Clearly, the per month costs less in expectation, but we have to be able to accept the possibility that the actual cost may be much higher. In the unlikely event that the entire house burns down, the actual cost will be .
These financial examples also offer insight into why we sometimes settle for an amortized or expected running time over a worst-case running time. It is often possible to get a lower expected or amortized running time than a worst-case running time. At the very least, it is very often possible to get a much simpler data structure if one is willing to settle for amortized or expected running times.