How to implement linear hashing in Python
Hashing addresses the need to quickly locate or store an item in a collection. Hashing is a method for increasing productivity by effectively filtering the search.
What is hashing?
The process of translating keys and values into a hash table using a hash function is known as hashing. We use hashing for quicker access to elements.
The creation of hash tables is the most common application of hashing. It involves turning a string of characters into a key or a fixed-length value, which typically reflects the longer string but is shorter. Since it takes less time to discover an item using the shorter hashed key than it does to find it using the original value, databases employ hashing to index and retrieve data.
What is hashing used for?
Obtaining data: When looking for objects on an object data map, a hash may be utilized to focus our search.
For instance, developers store data in the form of key and value pairs in hash tables, which may include customer records. The hash code or the integer is then translated to a predetermined size, and the key serves as an input to the hashing method to help identify the contents. The supported function by hash tables includes: insert (key, value) (key, value), get (key) (key), and delete (key) (key).Electronic signatures: Hashing aids in the encryption and decryption of digital signatures need to authenticate message senders and recipients in addition to enabling quick data retrieval. In this case, the digital signature is changed by a hash function before the hashed value, and the signature is transmitted separately to the recipient.
What is linear hashing?
A disk-based index structure called linear hashing, dynamically updates, grows, or reduces one bucket at a time (or you can call it a dynamic hashing scheme). The index is used to locate the record that corresponds to a certain key or to make exact match searches easier.
The method is called linear hashing because the number of buckets grows or shrinks linearly. The maximum number of hashing functions that the system can use at once varies dynamically.
The distinction between linear hashing and other hashing
There is no required directory in linear hashing.
It is capable of handling long overflow chains.
It is more flexible with respect to the timing of buckets splits
It allows us to grow one slot at a time.
Linear hashing terminology
Terminologies in Linear Hashing
Terminology | Description |
h0, h1,h2 | A family of hash functions, where each funtion's range it twice of its predecessor |
N | Initial number of buckets |
d0 | The number of bit to represent N |
Level | Iindicate the number of spilt cycle completed, initially 0 |
Next | Pointer to the next bucket inline to be split |
Linear hashing math properties
Linear Hashing Math Properties
Properties | |
N | = 2d0, for some d0 |
di | d0 + i |
hi(Key) | h(key) mod(2iN) |
As mentioned earlier, linear hashing is flexible with respect to the timing of the bucket split, and it allows us to choose from two splitting criteria.
Load factor: it is calculated by diving the number of entries by the number of buckets multiplied by bucket capacity. We can set the split trigger to any load factor value we want, giving us greater control over the space utilization of the hashing table.
Load factor = 4/3*3 = 40%Overflow: it is very simple to understand, once we detect an overflow after an incursion, we trigger the split.
Linear hashing implementation
Let's walk through an example to see how linear hashing works:
Linear Hashing Implementation
N | = 4 |
Level | = 0 |
Split by Overflow | |
Next | = 0 (the first bucket) |
Insert 37
Since we are at level 0, size level = 0, h0(37) = 100101, di = d0 = 2, then we only need to look at the last 2 bits of H(37) = 01. The table will now look like this:
Inserting 37
h0 | ||||
00 | 32 (100000) | 44 (101100) | 36 (100100) | |
01 | 9 (1001) | 25 (11001) | 5 (0101) | 37 (100101) |
10 | 14 (1110) | 18 (10010) | 10 (1010) | 30 (11110) |
11 | 31 (11111) | 35 (100011) | 7 (0111) | 11 (1011) |
Insert 43
We are still at level 0 therefore, the size level = 0, h0(43) = 101011, di - d0 = 2. We only need to look at the last 2 bits of H(43) = 11. We create a new column and insert an overflow of 43(101011), since the hash table to set to trigger on overflow is now set as split, the table looks like this now:
Inserting 43
h0 | Overflow | ||||
00 | 32 (100000) | 44 (101100) | 36 (100100) | ||
01 | 9 (1001) | 25 (11001) | 5 (0101) | 37 (100101) | |
10 | 14 (1110) | 18 (10010) | 10 (1010) | 30 (11110) | |
11 | 31 (11111) | 35 (100011) | 7 (0111) | 11 (1011) | 43 (101011) |
Let's demonstrate this by splitting:
Splitting
N | = 4 |
Level | = 0 |
Split by Overflow | |
Next | = 1 |
Split is done on the bucket (00) where the pointer points to not the overflow bucket. So we split the next bucket and increment the pointer to the next bucket in the circle.
Now we need to relocate entries originally in the bucket(00) with respect to the new hlevel+1 and we need to look at the last 3 bits of each entry.
Splitting table
h1 | h0 | Overflow | ||||
00 | 32 (100000) | 44 (101100) | 36 (100100) | |||
01 | 9 (1001) | 25 (11001) | 5 (0101) | 37 (100101) | ||
10 | 14 (1110) | 18 (10010) | 10 (1010) | 30 (11110) | ||
11 | 31 (11111) | 35 (100011) | 7 (0111) | 11 (1011) | 43 (101011) | |
100 | 00 | 44 (101100) | 36 (100100) |
Code example
Let's see how it works with an executable code:
#list definitioninit_list = [120 , 111 , 80 , 260 , 118 , 110 , 100 , 97 , 85]HashValues = []#define a function to accept the data from the listdef linear_hash_function(init_list):#another list with a none datatypesecond_list = [None for i in range(10)]for i in init_list:#let's append the values from the second list to the HashTableHashValues.append(i % len(second_list))second_list[i % len(second_list)] = iprint(second_list)print(init_list)print(linear_hash_function)print(linear_hash_function(init_list))
Code explanation
Line 2: We define a list with the name
init_listand add nine values to the list.Line 5: Another list
HashValuesis defined so as to save the final values after the linear hashing is done.Line 8: We define a function with
init_listas the parameter and went ahead to define a local list within the function name, which has a None datatype because we want to handle null datatypes in the list within the range of10.Line 11: We create a
forloop. We append the values from thesecond_listto the HashTable. Then for all values(i) insecond_listfind the modulo and set it equal toi.Line 23: Then, we print the function
linear_hash_function(init_list).
The output:
When we print the output, we get a
Noneoutput because it contains null values or no values at all.
Note:
Noneis not the same as an empty string, 0 or False, it is a datatype of its own (NoneType).
Conclusion
We learned about linear hashing in this Answer. We talked about what hashing is, what linear hashing itself means, how it is different from other hashing techniques, terminologies, and its implementation.
Free Resources